AI computing firm Nvidia (Nasdaq:NVDA) and Chicago-based biotech startup Evozyne have teamed up to develop an AI model for designing therapeutic proteins. Using Nvidia’s BioNeMo language model service, the companies developed the Protein Transformer Variational AutoEncoder (ProT-VAE) model.
The Evozyne research team initially focused on the PAH gene, which encodes the enzyme phenylalanine hydroxylase. Mutations in the PAH gene cause phenylketonuria, a rare condition characterized by elevated levels of phenylalanine in the blood that can lead to neurological problems.
With the goal of developing a novel treatment for phenylketonuria, the researchers used ProT-VAE to create new synthetic PAH variants that they predicted would have “super-natural” functionality. Lab tests later showed that some of the variants were up to 2.5 times more effective than natural PAH. Such proteins could potentially serve as the basis for gene therapies that replace the deficient enzyme.
ChatGPT for proteins
The generative AI model behind the PAH breakthrough has some similarities to ChatGPT, according to Kimberly Powell, vice president of healthcare at Nvidia. “Biology, whether it’s a molecule, protein or DNA, can be represented in a sequence just like characters in a sentence,” Powell said. “We can apply these large language models, generative AI models, ChatGPT-like models to biology.”
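To make Powell’s analogy concrete, here is a minimal Python sketch, assuming a simple character-level vocabulary over the 20 standard amino acids. The vocabulary layout, the reserved padding id and the helper names `encode` and `decode` are illustrative assumptions, not BioNeMo’s actual tokenizer.

```python
# The 20 standard amino acids, one letter each: the "alphabet" of the protein "language".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Map each residue letter to an integer id, the way a character-level language model
# maps characters to vocabulary indices. Id 0 is reserved for padding (an assumption).
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
INV_VOCAB = {i: aa for aa, i in VOCAB.items()}


def encode(sequence):
    """Turn a protein sequence into token ids.

    Raises KeyError for letters outside the 20 standard residues.
    """
    return [VOCAB[aa] for aa in sequence.upper()]


def decode(token_ids):
    """Turn token ids back into a protein sequence, skipping padding (id 0)."""
    return "".join(INV_VOCAB[t] for t in token_ids if t != 0)


if __name__ == "__main__":
    # An arbitrary illustrative fragment, not a real protein sequence.
    fragment = "MKTAY"
    ids = encode(fragment)
    print(ids)          # [11, 9, 17, 1, 20]
    print(decode(ids))  # MKTAY
```

Once a sequence is a list of ids, it can be fed to the same kind of transformer machinery used for text, which is the point of Powell’s comparison.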
There are roughly 20,000 human proteins, not counting modified forms. One reason AI is a hot topic in protein research is that it can be used to identify large numbers of novel proteins. “To give you a basis, there are more potential proteins than particles in the universe,” Powell said. “When you are in nature, as soon as something works, you just move on to the next thing instead of exploring all of the possibilities of things that could work.”
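The scale Powell describes can be checked with quick arithmetic. With 20 standard amino acids, a protein of length n has 20^n possible sequences. The Python sketch below assumes the commonly cited rough estimate of about 10^80 particles in the observable universe and finds the shortest sequence length whose count exceeds it.

```python
NUM_AMINO_ACIDS = 20     # standard amino acids
PARTICLE_EXPONENT = 80   # assumption: ~10**80 particles in the observable universe (rough estimate)


def num_sequences(length):
    """Count distinct sequences of `length` residues drawn from 20 amino acids."""
    return NUM_AMINO_ACIDS ** length


def shortest_length_exceeding(exponent):
    """Smallest sequence length whose sequence count exceeds 10**exponent."""
    length = 1
    while num_sequences(length) <= 10 ** exponent:
        length += 1
    return length


if __name__ == "__main__":
    print(shortest_length_exceeding(PARTICLE_EXPONENT))  # 62
    # Digits in the count for a 100-residue protein (20**100 is about 1.27 x 10**130).
    print(len(str(num_sequences(100))))  # 131
```

Under that assumption, a chain of just 62 residues already has more possible sequences than the estimated particle count, and most natural proteins are considerably longer.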
Given the sheer number of possible sequences involved in protein design, a clear focus becomes vital. “You know at the get-go that there is a family of proteins with a given characteristic you care about,” Powell said.
By homing in on the PAH family of proteins, Evozyne could train a model on that family’s characteristics and use it to generate a large number of new proteins. “These generated proteins could be used for a variety of purposes, such as developing new medicines or vaccines,” Powell said. “The model can synthesize proteins that have never existed in nature before and possess the same characteristics as the target family.”
The ProT-VAE framework can generate millions of sequences in a matter of seconds.
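The general recipe described here is to learn a compressed latent space from a protein family, then sample new points in that space and decode them into sequences. The Python sketch below illustrates only that idea with a toy, untrained decoder. It is not ProT-VAE’s architecture; the latent size, sequence length, random linear decoder, family “center” vector and the function names `make_toy_decoder` and `sample_variants` are all hypothetical. In a trained VAE, the decoder would be learned and the center would come from encoding known family members.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LATENT_DIM = 8   # hypothetical latent-space size
SEQ_LEN = 30     # hypothetical fixed sequence length


def make_toy_decoder(seed=42):
    """Hypothetical stand-in for a trained VAE decoder: a fixed random linear map
    from a latent vector to per-position amino-acid scores (logits)."""
    rng = random.Random(seed)
    # weights[pos][aa] is a LATENT_DIM-long weight vector for residue `aa` at position `pos`.
    weights = [
        [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in AMINO_ACIDS]
        for _ in range(SEQ_LEN)
    ]

    def decode(z):
        residues = []
        for pos in range(SEQ_LEN):
            # Score every amino acid at this position as a dot product with z.
            logits = [
                sum(w * zk for w, zk in zip(weights[pos][aa], z))
                for aa in range(len(AMINO_ACIDS))
            ]
            # Greedy decoding: keep the highest-scoring residue.
            best = max(range(len(AMINO_ACIDS)), key=logits.__getitem__)
            residues.append(AMINO_ACIDS[best])
        return "".join(residues)

    return decode


def sample_variants(decode, center, n, scale=0.5, seed=0):
    """Sample n latent points from a Gaussian around `center` and decode each one."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        z = [c + rng.gauss(0.0, scale) for c in center]
        variants.append(decode(z))
    return variants


if __name__ == "__main__":
    decode = make_toy_decoder()
    # Hypothetical "family center"; a trained model would derive it by encoding family members.
    center_rng = random.Random(1)
    center = [center_rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    variants = sample_variants(decode, center, n=1000)
    print(len(set(variants)), "distinct sequences out of", len(variants))
    print(variants[0])
```

Because each new sequence costs only one decoder pass, sampling is cheap, which is broadly why latent-space generators can produce very large candidate sets; the hard part, as the PAH lab tests show, is finding which candidates actually work.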
Evozyne and Nvidia summarized the research in a preprint on bioRxiv. The paper noted that the model could create a “preponderance of possible proteins” that provides “a large palette with which to discover novel proteins with ‘super-natural’ or even non-natural function.”
Tapping supercomputing resources
BioNeMo, a cloud service built on Nvidia’s NeMo Megatron framework, can train and deploy large biomolecular transformer AI models using supercomputing resources.
“In order for you to take essentially the world’s knowledge — everything on the entire internet and have a model able to encode and learn it — it has to be gigantic,” Powell said. “NeMo Megatron is the tool that allows you to train these gigantic models at supercomputing scale.”
Evozyne believes the ProT-VAE framework highlights the potential for creating protein sequences that outperform those found in nature and, possibly, for developing novel therapies for previously untreatable diseases.
The analyst firm Gartner has reached similar conclusions regarding the use of AI in drug discovery and development. Gartner predicts that generative AI techniques will be used in the discovery of more than 30% of new drugs and materials by 2025.