Conceptual visualization of a molecular structure, not representative of a specific molecule. [Image courtesy of Adobe Stock]

During the COVID-19 pandemic in late 2020, the storied Silicon Valley institution SRI International secured a $4.3 million DARPA contract to develop a tool for generating therapeutic small molecules to combat biological threats. Not just known for innovations like the computer mouse and Siri, SRI International is also responsible for Synfini, a multimodal chemistry model akin to large language models like ChatGPT. In September 2023, the company spun out Synfini as an independent entity.

“The core group and project started about seven years ago at SRI International, and began as part of the DARPA Make-It program,” said Peter Madrid, co-founder and head of scientific development at Synfini. DARPA’s Make-It initiative aimed to automate small molecule discovery and synthesis. Thanks to the support, Synfini developed AI-based approaches to plan and optimize the production of synthetic molecules.

Through the work with DARPA, SRI International developed several key technologies. One was a synthetic planning tool called SynRoute, which combines a large database of chemistry data with AI to propose the reaction steps and a strategic plan for synthesizing molecules. Another was AutoSyn, a flow chemistry hardware platform for reliable multi-step synthesis that successfully produced grams of materials. The technology suite also includes SynBuild and SynPlan for efficient molecular design and synthesis planning, and SynDB, a comprehensive and evolving chemistry data repository at the core of the system.

Peter Madrid

Before Synfini could spread its wings as an independent entity, its technology had already attracted the attention of two Big Pharma giants, Sanofi and a J&J subsidiary. In 2021, Synfini entered into a research collaboration with Sanofi to tap the Synfini platform in the discovery and development of candidates across multiple high-profile drug-discovery programs. A year later, it inked a deal with Janssen Pharmaceutica NV with the aim of tapping the AI-guided, automated synthetic chemistry system for small molecule drug discovery.

Crouching data, hidden patterns

Imagine a chemist in the near future instructing a generative AI lab automation platform, “I want a drug candidate that binds to target A with potency X but does not bind to target B.” The seemingly straightforward request launches a sophisticated process to explore the molecular landscape, seeking compounds with the specified characteristics. The result saves the chemist hours or days of work.

Nathan Collins

“Large language models have generated incredible excitement in the AI field, primarily because you have this wealth of data available through the internet and texts throughout history,” said Nathan Collins, head of strategic alliances and development at Synfini. “The concept with large chemistry models is to use similar models but incorporate different modalities, such as the fairly limited sets of chemistry data available, and methods to rapidly and automatically generate data to fill those holes.” This approach aims to provide more data for improved AI models and better predictions on the next round of chemistry.

If the overarching goal is getting a drug to the clinic, breaking that process into bite-sized chunks can help drive efficiency. “I think AI has the ability to fill the stop gap between finding hits against targets and getting to preclinical candidates,” Collins said.

The goal is to streamline scientific workflows. “For AI to be genuinely effective and useful, it must assist scientists in a hands-on and interactive way in a way that they understand, enhancing their ingenuity rather than attempting to replace them,” Collins said. “Ultimately, our approach is a chemistry-first approach, not an AI-first approach,” Madrid said. The company touts a “chemist-first” user interface designed to provide an intuitive and collaborative environment to drive efficient scientific exploration.

Enhancing efficiency in drug discovery with AI

The benefits of the system need to be apparent almost immediately. “It has to be something that delights them in terms of feedback on the tasks they have right in front of them,” Collins said.

The initial stages of drug discovery are traditionally tedious, involving hit identification, lead optimization, and preclinical testing. These phases require meticulous laboratory work and extensive data analysis to refine potential drug candidates before advancing to clinical trials. “Our goal is to use AI to bridge this gap effectively, in a way that augments the work of professionals in the field, not trying to replace them,” Collins said.

“Many existing AI tools have been trying to get past the way people work,” Collins said. “Our approach, with Peter and myself bringing our chemistry perspective and our experience at SRI, working with experts in this field, is to create tools that leverage us to get things done in ways that the pharmaceutical industry can understand and show that there’s a real advantage to it.”

Madrid said the company’s technology, much like LLMs, can be “really useful as long as the person receiving the results can judge their correctness.” A chemist might look at an AI-proposed solution and dismiss it as absurd in a second. But through a few rapid iterations, the researcher might continue exploring and find some powerful ideas. “Many AI solutions currently take days or even a week to generate solutions,” Madrid said. “If a chemist is going to dismiss many of those, it’s not really improving efficiency. So, I believe the interactivity and the timeliness of these tools are critical aspects that are not addressed by many of the current solutions.”

Nuts and bolts of a large chemistry model

Synfini’s primary AI is a pre-trained transformer model, similar to many of the popular LLMs. “One unique aspect though is the neuro-symbolic AI, where we can code medicinal chemistry concepts on shape and drug interactions with a protein receptor, and measure those within our model,” Madrid said. The company uses this concept to rapidly filter out many of the false designs that a traditional generative AI might produce. “This addresses the challenge where generative AI models can sometimes be overly creative, suggesting designs that medicinal chemists would instantly dismiss and not want to make in the lab,” he added. The company can incorporate logical assertions about the structures into its model as part of the generative process.

Madrid notes that the approach can create a sub-neural network that measures a feature of a compound, typically involving how it interacts with its receptor, or it could be a physical property. “Along with predictions for just the pure biological activity, it will predict whether or not it’s going to interact in the way that the medicinal chemist wants it to,” Madrid said. “That’s part of the overall scoring and function for prioritizing the candidates.”
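
As a rough illustration of the idea, and not Synfini's actual implementation, the Python sketch below treats logical assertions as simple pass/fail rules and blends two predicted scores into a single priority. Every name here (Candidate, prioritize, activity_weight), the 0-to-1 score scale and the molecular weight cutoff are assumptions made for the example. The sketch also applies the rules after generation, whereas the company describes building them into the generative process itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    predicted_activity: float   # hypothetical model output, scaled 0 to 1
    interaction_score: float    # hypothetical "binds the way the chemist wants" score, 0 to 1
    molecular_weight: float     # example physical property


# A rule is any function that returns True when a candidate is acceptable.
Rule = Callable[[Candidate], bool]


def priority(c: Candidate, activity_weight: float = 0.6) -> float:
    """Weighted blend of predicted activity and predicted interaction quality."""
    return activity_weight * c.predicted_activity + (1 - activity_weight) * c.interaction_score


def prioritize(candidates: List[Candidate], rules: List[Rule],
               activity_weight: float = 0.6) -> List[Candidate]:
    """Drop candidates that violate any rule, then rank the rest by priority."""
    kept = [c for c in candidates if all(rule(c) for rule in rules)]
    return sorted(kept, key=lambda c: priority(c, activity_weight), reverse=True)


# Made-up candidates: cand-3 scores well but breaks the weight rule.
candidates = [
    Candidate("cand-1", 0.90, 0.20, 420.0),
    Candidate("cand-2", 0.70, 0.80, 380.0),
    Candidate("cand-3", 0.95, 0.90, 750.0),
]
rules = [lambda c: c.molecular_weight <= 500.0]  # illustrative assertion only

ranked = prioritize(candidates, rules)
print([c.name for c in ranked])  # cand-2 ranks ahead of cand-1; cand-3 is filtered out
```

In this toy example, the candidate with the highest raw activity among the survivors does not win, because its predicted interaction with the receptor is poor.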

Synfini taps embeddings and ML to enhance drug synthesis capabilities

Madrid elaborated on how the system uses embeddings, a machine learning technique in which complex data types are mapped into a defined space of continuous vectors. A vector, in this context, is essentially a list of numbers that represents various characteristics of the data. “We actually use a 3D embedding space, which also allows us to capture parts of structure that 2D methods just can’t but each point in that 3D volume, it’s kind of a voxel and it’s represented as a vector,” he said. Because vectors are often long strings of numbers, they can translate to substantial file sizes. “The datasets are huge,” Madrid said. The company has spent considerable time through trial and error determining which vector sizes are optimal.
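
To make the voxel idea concrete, here is a minimal Python sketch under stated assumptions. It is not Synfini's embedding: the three element channels, the 1-unit voxel size and the atom coordinates are all hypothetical, and a learned embedding would hold far richer vectors than simple atom counts. The second function shows why vector size matters for storage.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, float, float, float]   # (element, x, y, z), units are illustrative
VoxelKey = Tuple[int, int, int]

CHANNELS = ["C", "N", "O"]               # hypothetical per-voxel vector layout


def voxelize(atoms: List[Atom], voxel_size: float = 1.0) -> Dict[VoxelKey, List[float]]:
    """Map atoms onto a sparse 3D grid; each occupied voxel holds a vector of element counts."""
    grid: Dict[VoxelKey, List[float]] = {}
    for element, x, y, z in atoms:
        if element not in CHANNELS:
            continue  # elements outside the assumed channel list are ignored
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        vector = grid.setdefault(key, [0.0] * len(CHANNELS))
        vector[CHANNELS.index(element)] += 1.0
    return grid


def dense_bytes(side: int, vector_len: int, bytes_per_value: int = 4) -> int:
    """Storage for a dense cubic grid with one vector per voxel."""
    return side ** 3 * vector_len * bytes_per_value


atoms = [("C", 0.2, 0.1, 0.0), ("O", 0.9, 0.4, 0.3), ("N", 1.5, 0.0, 0.0), ("C", -0.5, 0.0, 0.0)]
grid = voxelize(atoms)
print(grid[(0, 0, 0)])                   # carbon and oxygen share this voxel
print(dense_bytes(64, 16))               # bytes for a 64x64x64 grid of 16-number vectors
```

With these made-up settings, a single dense 64-by-64-by-64 grid of 16-number vectors already takes about 16.8 million bytes, and doubling the vector length doubles that figure, which gives a sense of why the choice of vector size is worth tuning.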

The company also has developed a machine learning algorithm for predicting the synthesizability of molecules. “This is something else we’re able to incorporate into this process,” Collins said. “So, we’re not only generating molecules, but also generating molecules with predetermined properties that our medicinal chemists would like to see. Further, we’re able to incorporate synthesizability into that, again, using AI tools based on data generated with our system. This ensures that we can actually make these molecules on our automated platforms.”
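
One simple way to picture that combination is a selection step that keeps only molecules whose predicted properties fall inside the chemist's target windows and whose predicted synthesizability clears a threshold. The Python sketch below assumes a hypothetical data layout, a single logP window and a 0-to-1 synthesizability score; none of these details come from Synfini.

```python
from typing import Any, Dict, List, Tuple

PropertyWindow = Tuple[float, float]     # (low, high) bounds, inclusive


def within_targets(properties: Dict[str, float], targets: Dict[str, PropertyWindow]) -> bool:
    """True when every targeted property is present and inside its window."""
    return all(
        name in properties and low <= properties[name] <= high
        for name, (low, high) in targets.items()
    )


def select_makeable(molecules: List[Dict[str, Any]], targets: Dict[str, PropertyWindow],
                    min_synthesizability: float = 0.5) -> List[str]:
    """Return IDs of molecules that meet the property targets and look synthesizable."""
    return [
        m["id"] for m in molecules
        if within_targets(m["properties"], targets)
        and m["synthesizability"] >= min_synthesizability
    ]


# Made-up predictions: mol-b is hard to make, mol-c misses the property window.
targets = {"logp": (1.0, 3.0)}
molecules = [
    {"id": "mol-a", "properties": {"logp": 2.1}, "synthesizability": 0.9},
    {"id": "mol-b", "properties": {"logp": 2.5}, "synthesizability": 0.2},
    {"id": "mol-c", "properties": {"logp": 4.2}, "synthesizability": 0.8},
]

selected = select_makeable(molecules, targets)
print(selected)  # only mol-a passes both checks
```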