Jensen Huang at GTC

In a packed panel discussion at GTC, moderated by NVIDIA Founder and CEO Jensen Huang, the architects of the groundbreaking transformer model gathered to explore their creation’s potential. The panel featured seven of the eight authors of the seminal “Attention Is All You Need” paper, which introduced transformers – a type of neural network designed to handle sequential data, like text or time series, with far more parallelism than previous architectures such as recurrent neural networks (RNNs). Transformers accomplish this through a mechanism called “attention,” which lets the model differentially weigh the importance of different parts of the input data.
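For the curious, the attention computation itself is compact. Below is a minimal NumPy sketch of scaled dot-product attention – shapes and names are chosen for brevity and are not drawn from any production library – showing how each position’s output becomes a weighted average over the entire input:

```python
# Minimal sketch of scaled dot-product attention in plain NumPy.
# Illustrative only; real implementations add learned projections,
# multiple heads, masking, etc.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_model)."""
    d_k = Q.shape[-1]
    # Similarity of every position to every other position.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into weights that sum to 1 per position.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V

# Toy example: a "sequence" of 4 tokens with 8-dimensional embeddings.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # (4, 8)
```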

The transformer architecture powers large language models like GPT-4 and has ignited widespread interest in AI applications across industries, including biology, where much data can be represented as long sequences. Transformers also play a supporting role in the recently introduced NVIDIA Inference Microservices (NIMs) and in NVIDIA’s Blackwell GPU architecture, which is built around a second-generation transformer engine “purpose-built for accelerated computing and generative AI.”

Programming RNA molecules like biological systems

Jakob Uszkoreit, former senior staff software engineer at Google and co-founder/CEO of the techbio startup Inceptive, shared his personal motivation for exploring the potential of transformer models in biology. “In ’21, I cofounded Inceptive with the realization that there can be a much more direct impact on improving people’s lives with this technology,” Uszkoreit said. “My first child was born during the pandemic, which gave me a newfound appreciation for the fragility of life.”

Uszkoreit’s interest in applying transformers to molecular biology was further piqued by two key developments. The first was the success of AlphaFold 2 in the CASP14 protein structure prediction competition: AlphaFold 2 incorporated transformer-based architectures, which contributed to its improved performance over its predecessor. “It became really clear that this stuff is ready for primetime in molecular biology,” Uszkoreit said.

The second was the release of mRNA COVID vaccine efficacy results, which further strengthened that thesis. “It became very clear that you can do anything in life with RNA, but there was no data for the longest time. In a certain sense, it was the neglected stepchild of molecular biology,” Uszkoreit said. “It just seemed like almost a moral obligation. This has to happen.”

Uszkoreit envisions a future where RNA molecules are programmed like biological systems. “It starts as a program that you’ve compiled into something that could run on a GPU,” Uszkoreit said. “In our case, the life of a piece of biological software starts with specifying the desired behaviors, such as producing a specific protein in a cell at a certain level. Then, we learn how to translate that specification using deep learning into RNA molecules that, once in cells, exhibit those behaviors. The process goes beyond just translating, say, English into computer code; it also involves translating the specifications of medicines and data into actual molecules.”
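As a thought experiment, the workflow Uszkoreit describes might look something like the sketch below. This is purely hypothetical – Inceptive has not published such an interface, and every name here (Spec, design_rna, trained_model) is invented for illustration:

```python
# Purely hypothetical sketch of the "biological software" workflow
# described above. Inceptive has not published such an API; Spec,
# design_rna, and trained_model are invented names for illustration.
from dataclasses import dataclass

@dataclass
class Spec:
    target_protein: str      # protein the RNA should produce
    cell_type: str           # where it should be expressed
    expression_level: float  # desired relative expression level

def design_rna(spec: Spec, model) -> str:
    """Translate a behavioral specification into an RNA sequence,
    with `model` standing in for a learned deep generative model."""
    return model.generate(spec)

# Usage would look something like:
#   rna = design_rna(Spec("erythropoietin", "hepatocyte", 0.8), trained_model)
```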

Combining cutting-edge modeling with hands-on lab work

The process involves more than computation. “You have to run experimentation against nature,” Uszkoreit said. “You really have to verify this because the data does not yet exist.” While there is “a ton of extremely valuable genomic data that you can download, largely available openly and publicly because it’s generally still largely publicly funded,” he continued, it does not cover everything. To close that gap, Uszkoreit’s team combines cutting-edge modeling with hands-on lab work to generate data “tailored to the specific phenomena you’re trying to model.” This is crucial in areas like codon expression for mRNA vaccines, where they are breaking new ground. The team pairs machine learning experts with traditional biologists – robots and lab coats side by side. “We think of ourselves as pioneers of something new,” Uszkoreit said.
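One way to picture that combination of modeling and lab work is as an iterative loop. The sketch below is a generic “lab in the loop” pattern, not Inceptive’s actual pipeline; all function names are placeholders:

```python
# Generic "lab in the loop" pattern, assuming the caller supplies a
# model with a fit() method, a candidate-design function, and a
# wet-lab measurement function. All names are placeholders.
def lab_in_the_loop(model, design_batch, run_wet_lab, dataset, rounds=5):
    for _ in range(rounds):
        model.fit(dataset)                 # train on all data so far
        candidates = design_batch(model)   # propose sequences to test
        results = run_wet_lab(candidates)  # measure real behavior in cells
        dataset.extend(results)            # data tailored to the phenomenon
    return model
```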

Transformers finding use in drug discovery

Uszkoreit’s vision is just one example of how the transformer architecture is finding use in drug discovery. A growing number of other organizations are also experimenting with transformers to transform drug discovery. Exscientia, a UK-based AI drug discovery company, is developing transformers to automate retrosynthesis and guide the synthesis of new drug molecules. Similarly, Insilico Medicine’s Chemistry42 platform integrates transformers with other generative approaches to design novel compounds for drug targets. In addition, AstraZeneca trained a version of its MolBART transformer model on a large database of chemical compounds using NVIDIA’s Megatron framework for training large language models. The aim is for these models to learn relationships between atoms in molecules, much as language models learn relationships between words, to aid in drug discovery.
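The “molecules as language” analogy is quite literal: a molecule’s SMILES string can be split into atom- and bond-level tokens and fed to a transformer much like words. The sketch below is illustrative only – it is not AstraZeneca’s MolBART code – and uses a commonly seen regex pattern for SMILES tokenization:

```python
# Illustrative sketch of the "molecules as language" idea: tokenize a
# SMILES string the way an LLM tokenizes text, so a transformer can
# model relationships between atoms. Not MolBART code; the regex is a
# commonly used SMILES tokenization pattern.
import re

SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|@@?|=|#|\(|\)|\d|%\d{2}|[+-])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    return SMILES_TOKEN.findall(smiles)

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
# ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', ...]
```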

Academic researchers, invoking the “Attention Is All You Need” tagline, are also exploring a variety of transformer architectures to predict drug-drug interactions, cancer drug sensitivity and protein-ligand affinity.

Overcoming challenges of graph transformer models

Other efforts aim to develop techniques that overcome challenges associated with graph transformer models. One key challenge is determining whether two graphs are isomorphic (structurally identical), which is crucial for accurately representing and reasoning about molecular structures. Demis Hassabis, CEO and co-founder of DeepMind, alluded to this challenge when introducing Isomorphic Labs, an autonomous Alphabet subsidiary spun off from DeepMind that focuses on AI-enabled drug discovery and aims to develop techniques that can capture the complex, messy nature of biology.

“Biology is likely far too complex and messy to ever be encapsulated as a simple set of neat mathematical equations,” Hassabis has said. “But just as mathematics turned out to be the right description language for physics, biology may turn out to be the perfect type of regime for the application of AI.”
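To make the isomorphism challenge concrete: standard message-passing graph networks are at most as expressive as the 1-dimensional Weisfeiler-Leman (WL) color-refinement test, so graph pairs that 1-WL cannot separate look identical to them. The sketch below implements that test and shows a classic failure case; it is a textbook illustration, not code from Isomorphic Labs:

```python
# Minimal 1-dimensional Weisfeiler-Leman (WL) color refinement, the
# classic test that bounds the expressiveness of standard
# message-passing graph networks.
from collections import Counter

def wl_signature(adj: dict[int, list[int]], rounds: int = 3) -> Counter:
    """adj maps each node to its neighbor list."""
    colors = {v: 0 for v in adj}  # start with uniform colors
    for _ in range(rounds):
        # Refine each node's color from its own color plus the
        # multiset of its neighbors' colors.
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())

# Two triangles vs. a 6-cycle: not isomorphic, yet both are 2-regular,
# so 1-WL cannot tell them apart and the signatures match.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(wl_signature(two_triangles) == wl_signature(six_cycle))  # True
```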

The panelists also discussed the future of reasoning in AI systems. Llion Jones, another co-author of the transformer paper, highlighted that the next frontier is to enable AI to reason more powerfully by learning and searching for the right architectures rather than relying on hand-engineered approaches. “I think the next big thing that’s coming is reasoning, but I think a lot of people don’t realize this and a lot of people are working on it,” Jones said. He emphasized the need to explore the space of possible architectures and learn how to wire them together to achieve more powerful reasoning capabilities.