AI Has Solved the 50-Year-Old Problem of Protein Folding

December 10, 2020 - 7 minutes read

Many remember DeepMind as the creator of AlphaGo, the artificial intelligence (AI) program that in October 2015 became the first in the world to beat a professional Go player. Since then, the company has been working on several research projects, including a hyper-ambitious one aimed at predicting how proteins fold. Protein folding is central to understanding biology and how proteins function in the body, but it’s incredibly difficult for humans to understand and predict.

DeepMind’s latest version of AlphaFold, however, has made waves in the field of biology. With AlphaFold, scientists say, the driving forces behind some diseases could be explained, and we could start creating designer medicines, crops with increased nutrition, and enzymes that break down plastic pollution. AlphaFold has the potential to change the world in more ways than one.

The Challenge of Protein Folding

DeepMind had a novel idea when it first created AlphaGo: use machine learning techniques that let the program teach and correct itself, improving its performance over time. Developing a game-playing algorithm wasn’t unique, but the broader goal of using gameplay as a training ground for programs that could eventually be applied to real-world problems definitely turned heads. When the company announced it was going to tackle protein folding, many biology researchers thought it was a nice idea but probably impossible to solve in their lifetimes.

For 50 years, biologists have poured blood, sweat, and tears into protein folding. It matters because proteins play a large part in the biological pathways behind metabolism and other cellular processes. A protein’s shape determines its function, and if that shape can be predicted ahead of time, synthesizing new proteins for specific uses would become easier and faster. For example, protein shape explains how antibodies fight diseases in the body and how insulin controls blood sugar levels.

Another issue that biologists face is that there are over 200 million known proteins, but we only know the folding structures for a small fraction of them. It can take years to elucidate the 3D structure of a folded protein using meticulous lab work, so there is not a lot of data for a computer science research team to work with.

Training AlphaFold

DeepMind researchers trained the AlphaFold algorithm on a large database of 170,000 protein sequences and their 3D folded shapes. Training took a few weeks, with the algorithm running on about 100 to 200 graphics processing units. To test the algorithm’s performance, the team entered AlphaFold into CASP (Critical Assessment of Protein Structure Prediction), a biennial “protein olympics” competition.

During the competition, teams are given the sequences of about 100 proteins and asked to predict their folded structures. The predictions are then compared against structures that were determined through lab work. AlphaFold came out on top of the other teams, and its accuracy was comparable to that of the lab methods, which are much more laborious and time-consuming.

Across all proteins analyzed, AlphaFold earned a median score of 92.5 out of 100, where a score of about 90 is considered comparable to lab methods. For the most difficult proteins, its score was 87 out of 100. According to John Jumper, a researcher at London-headquartered DeepMind, the team wasn’t aware of how far they’d pushed the field “until we saw the CASP.”
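To give a feel for what a score out of 100 can mean here, CASP results are commonly reported using the Global Distance Test (GDT), which rewards a prediction for placing each residue close to its position in the lab-determined structure. The sketch below is a simplified illustration, not DeepMind’s or CASP’s actual scoring code: the function name `gdt_ts`, the toy coordinates, and the assumption that the two structures are already superimposed are all ours. The real measure also searches for the superposition that maximizes the score.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score on a 0-100 scale.

    Assumes both structures are already superimposed and that each list
    holds one (x, y, z) coordinate per residue, in the same order.
    The function name and inputs are hypothetical, for illustration only.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and the same length")

    # Distance (in angstroms) between each predicted residue and its
    # counterpart in the reference structure.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average the fractions and rescale to 0-100.
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example with made-up coordinates.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]
score = gdt_ts(predicted, reference)
print(f"Toy score: {score:.2f} out of 100")
```

A perfect prediction would score 100 under this sketch, and a prediction with a few badly placed residues loses points only for those residues rather than being penalized as a whole.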

Applying AlphaFold

Demis Hassabis, DeepMind’s founder and chief executive, says the company has already begun working out the logistics of giving other researchers access to AlphaFold to help with their scientific research. DeepMind has said it would dedicate itself to research focused on sleeping sickness, leishmaniasis (a parasitic disease), and malaria. A handful of research groups have already used AlphaFold in their medical work.

For example, according to Andrei Lupas, the director of the Max Planck Institute for Developmental Biology in Germany, the program has already been used to solve a protein structure that scientists had been working on for a decade. For Hassabis, these results mark “an exciting moment for the field” because the “algorithms are now becoming mature enough and powerful enough to be applicable to really challenging scientific problems.” Even among biology, bioengineering, and bioinformatics researchers who haven’t used AlphaFold yet, there is a lot of excitement, along with new ideas, about its seemingly limitless applications.

Janet Thornton, the director emeritus of EMBL’s European Bioinformatics Institute, was not involved in DeepMind’s work. Nevertheless, she was thrilled to hear about AlphaFold’s performance at the CASP competition. She said, “This is a problem that I was beginning to think would not get solved in my lifetime.” Conquering protein folding, says Thornton, will “really help us to understand how human beings operate and function, how we work.”

The Future of Biology

The president of the Royal Society, Venki Ramakrishnan, has also praised AlphaFold’s performance and DeepMind’s success, calling the work “a stunning advance” that came “decades before many people in the field would have predicted.” The algorithm’s potential to save lives, improve treatments, and enable new technologies could have a lasting effect on humanity.

This isn’t the end of AlphaFold. Protein folding gets extremely complex as more protein structures are involved, so the research team still has a lot to tackle. But there’s hope that AlphaFold will keep improving its accuracy while taking on larger and larger protein sequences.
