A vision for data-driven drug development in oncology

[Image: AI-powered drug discovery, a virtual representation of molecules being screened by AI algorithms for potential drug candidates. Credit: Adobe Stock]

When Peter Carr, principal software architect of Lantern Pharma, stepped into his full-time role in September 2020, the company was on the cusp of a transformation. While AI had been a focus for a number of years, a fresh infusion of cash made it possible to expand the company’s AI and machine learning capabilities and drive down the cost of drug development in oncology.

Founded in 2012, the company went public in June 2020, raising $26 million. By the time he officially joined, Carr was already familiar with its operations, having previously worked as a consultant in 2019 to help set up the infrastructure. Carr joined full-time to help the company “expand their use of AI and machine learning for target discovery and patient stratification,” he recalled.

The challenge: Siloed research

While the company had experience in using AI tools to slash the cost of getting oncology drugs to the clinic, Carr noticed there was room for improvement in how the company’s scientific teams collaborated. “One thing scientists don’t normally think about is how they can produce outputs that are usable by the company both next week, next month, and for years down the road,” he said.

While Lantern had already focused on using AI and machine learning in cancer research for several years, “it was done in silos by individual researchers,” Carr said. “The main push I made was to institutionalize knowledge because if a researcher’s laptop disappears, we lose all of that knowledge.” Carr also underscored the importance of a collaborative approach, with version control and findable datasets.

Piecing together a strategy

To facilitate seamless collaboration, Carr tapped AWS to host a shared environment in the cloud and relied on Docker containers and Git for sharing work. Docker containers offer a standardized environment for running applications, while Git allows for version control and code collaboration. Even so, moving from scientific research to software engineering still required a significant context switch.

In a prior position, Carr served as a senior software engineer at the Broad Institute, which used GenePattern, an open-source tool for reproducible research in genomics. “When I came to this company, I thought, ‘Oh, we need something like that,’” Carr recalled. “But GenePattern is not a commercial product, so it lacks a bit of polish.” This led Carr to explore other options, ultimately landing on Code Ocean as the best fit for Lantern Pharma.

The impact: Improved efficiency and collaboration among scientific teams

The Code Ocean platform was designed to streamline collaboration and ensure reproducibility in computational experiments. It offers a collaborative research experience with an integrated development environment, a secure repository, and a technology known as the “Compute Capsule,” which encapsulates the computing environment, code, and data, ensuring that computational experiments are self-contained, reusable, and reproducible. Code Ocean also helps reduce the barriers associated with big data analysis, as it facilitates large-scale, high-throughput collaborative research.

After implementing Code Ocean, Lantern Pharma was able to allow researchers to use tools they were already familiar with, such as RStudio and Jupyter Notebooks, in a seamless environment. Such uniformity played a role in promoting efficiency and collaboration among its scientific teams. “Similarly, as an engineer, I can use tools I’m familiar with, such as the Bash command line, all in the same environment,” Carr said. “This was a huge win in rapidly improving efficiency, and I give credit to the RADR team in combination with the Code Ocean-supplied infrastructure and UI.”

RADR is Lantern Pharma’s proprietary AI platform, which the company uses for discovering biomarker signatures. RADR makes use of more than 25 billion oncology-focused data points and a library of more than 200 advanced ML algorithms to address problems in oncology drug development. The signatures within RADR help the company identify patients most likely to respond to the genomically targeted therapeutics that the company develops. By analyzing genomic profiles, the RADR platform aims to match drugs to the patients most likely to benefit from them.

The synergy of RADR and Code Ocean’s tech

Lantern Pharma’s recent drug programs have moved from initial AI-driven analyses of genomic, proteomic, and clinical data to first-in-human clinical trials in two to three years, at a cost of approximately $1 million to $2 million per program, a fraction of traditional drug development benchmarks.

“A lot of this improvement came from building reusable pipelines to solve problems,” Carr said. One of the company’s core activities is building predictive models to identify cancer targets. “With the RADR team, we developed pipelines that allowed us to build 1000 models overnight, rank them, and find the 10 best-performing ones,” Carr said. The RADR scientific team doesn’t just develop a black-box predictive model, but one that is explainable. “This helped us identify potential mechanisms of action or pathways of interest, which fed into our feedback loop: data produces models, models produce insights, and insights produce new experiments in the lab,” Carr said.
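The build, rank, and select pattern Carr describes can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Candidate` record, the `train_and_score` stand-in (which returns a seeded random number instead of training anything), and the hyperparameter grid are hypothetical and are not Lantern Pharma's or RADR's actual code.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """One hypothetical model configuration and its validation score."""
    learning_rate: float
    depth: int
    score: float


def train_and_score(learning_rate: float, depth: int, seed: int) -> float:
    # Placeholder for real training and validation: a seeded random score
    # keeps the sketch deterministic and runnable without any data.
    return random.Random(seed).random()


def build_and_rank(grid, top_k=10):
    """Score every configuration in the grid and return the top_k by score."""
    candidates = [
        Candidate(lr, depth, train_and_score(lr, depth, seed))
        for seed, (lr, depth) in enumerate(grid)
    ]
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]


# 100 learning rates x 10 depths = 1000 configurations (illustrative values).
learning_rates = [round(0.001 * i, 3) for i in range(1, 101)]
depths = range(1, 11)
grid = list(product(learning_rates, depths))

best = build_and_rank(grid, top_k=10)
for candidate in best:
    print(candidate)
```

In a real pipeline each call to `train_and_score` would be an independent job, which is what makes this kind of search straightforward to spread across many workers.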

As the company aims to build a new, massively scaled pipeline by modeling directly from chemical compounds and identifying novel features, it has built a predictive model to identify from the Simplified Molecular Input Line Entry System (SMILES) structure whether a given compound is likely to cross the blood-brain barrier. “We have a pretty good model,” Carr said. “But we have our special sauce to make it better, which requires maybe 20 times more data.” To do so, the company used Code Ocean’s Pipelines tool. “They built the UI for us, and we were able to write the code to design the pipeline,” Carr said.

“Code Ocean’s Pipelines can easily handle hundreds or thousands of concurrent jobs,” said Iain Buchanan, customer success lead at the company. The company has also created functionality in areas such as data management to make it simpler to ensure data remains useful.

Lantern Pharma’s AI-driven strategy is built on collaboration

While the topic of AI in drug discovery and development continues to garner significant attention, the technology alone is just part of the equation. “The biggest win at Lantern is the pipeline we developed, and the value of Code Ocean was that it allowed the whole team to work together,” Carr said. Promoting collaboration was not a problem that could be solved by a single individual with a laptop. “Instead, we were able to have a shared environment for collaboration, which was a huge win that Code Ocean enabled us to achieve,” Carr said.

In addition, the technology makes it easier to onboard new employees. “No matter where we are in our tech stack, when we onboard a new member of the team, whether it’s a scientist or bioinformatician or software engineer, I set it up so they’re producing value within two weeks,” Carr said. “So the other thing that Code Ocean has done for us is it enables a new scientist on our team to get up to speed very quickly.”

What’s next

Lantern Pharma breaks down the process of data prep and curation into checkpoints: acquire the raw data, make it findable, and curate it into a machine learning-ready data asset. The actual curation process, which involves looking at the files and figuring out how to harmonize data, is not necessarily straightforward. “In our world, it’s very difficult to even know what something is a measurement of. Is it a measurement of RNA expression? Is it from a single-cell experiment? Or is it coming from a collection that spans 10 years?” Carr asked.
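One way to picture these checkpoints is as an ordered set of states that a dataset moves through, with the metadata Carr asks about (what is being measured, and where it came from) recorded before curation begins. The sketch below is a hypothetical illustration in Python; the `Checkpoint` and `DataAsset` names, the metadata fields, and the example values are assumptions, not a description of Lantern Pharma's system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Checkpoint(Enum):
    ACQUIRED = 1   # raw data has been obtained
    FINDABLE = 2   # descriptive metadata has been recorded
    ML_READY = 3   # data has been harmonized for modeling


@dataclass
class DataAsset:
    """Hypothetical record tracking one dataset through the checkpoints."""
    name: str
    checkpoint: Checkpoint = Checkpoint.ACQUIRED
    metadata: dict = field(default_factory=dict)

    def make_findable(self, measurement: str, source: str) -> None:
        # Record what the data measures and where it came from.
        if self.checkpoint is not Checkpoint.ACQUIRED:
            raise ValueError("asset must be at the ACQUIRED checkpoint")
        self.metadata.update(measurement=measurement, source=source)
        self.checkpoint = Checkpoint.FINDABLE

    def curate(self, harmonized_columns) -> None:
        # Curation is only allowed once the asset is described and findable.
        if self.checkpoint is not Checkpoint.FINDABLE:
            raise ValueError("asset must be at the FINDABLE checkpoint")
        self.metadata["columns"] = list(harmonized_columns)
        self.checkpoint = Checkpoint.ML_READY


asset = DataAsset("expression_study_a")
asset.make_findable(measurement="RNA expression", source="single-cell experiment")
asset.curate(["sample_id", "gene", "expression"])
print(asset.checkpoint.name, asset.metadata)
```

Enforcing the order in code means a dataset cannot be marked ready for machine learning until someone has answered the basic question of what it measures.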

While answering such questions remains a challenge, Lantern Pharma has developed a roadmap for improving data curation and management. One overarching goal is to treat data as a valuable product that scientists and researchers can easily access and use. “Now, we have defined our goal: to treat data as a product,” Carr said.

Weaving a data mesh, a decentralized approach for data accessibility

One philosophy that aligns with this goal is the concept of a data mesh, a term author Zhamak Dehghani coined in 2019. A data mesh refers to a decentralized approach to data architecture and platform thinking that handles data as a product, with cross-functional teams in charge of their respective data domains. The philosophy intends to improve the scalability and efficiency of data platforms to make data more accessible and usable for different parts of the organization.
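As a rough illustration of the idea, the Python sketch below models each dataset as a product owned by a domain team and published to a shared catalog that anyone can search. The `DataProduct` and `Catalog` classes, the domain and team names, and the tags are all hypothetical; a real data mesh involves far more, including governance, access control, and infrastructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A dataset published by the domain team that owns it (hypothetical)."""
    name: str
    domain: str
    owner_team: str
    tags: frozenset


class Catalog:
    """Shared registry that lets any team discover published data products."""

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        if product.name in self._products:
            raise ValueError(f"{product.name} is already published")
        self._products[product.name] = product

    def search(self, tag=None, domain=None):
        # Self-service lookup: filter by tag, by domain, or by both.
        return [
            p for p in self._products.values()
            if (tag is None or tag in p.tags)
            and (domain is None or p.domain == domain)
        ]


catalog = Catalog()
catalog.publish(DataProduct("tumor_expression", "genomics", "genomics-team",
                            frozenset({"rna", "expression"})))
catalog.publish(DataProduct("trial_outcomes", "clinical", "clinical-team",
                            frozenset({"response", "outcomes"})))

for product in catalog.search(tag="rna"):
    print(product.name, "owned by", product.owner_team)
```

The key design choice is that ownership stays with the domain team while discovery is centralized, so a scientist can find a dataset without knowing in advance which team produced it.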

The company estimates that its AI-driven pipeline of product candidates has a combined annual market potential of more than $15 billion. The company has also established a subsidiary known as Starlight Therapeutics to focus exclusively on the clinical execution of Lantern Pharma’s therapies for CNS and brain cancers. Data will play a pivotal role for both Lantern Pharma and Starlight, acting as the foundation for their development activities.

“This whole data mesh philosophy makes sense,” Carr said. While acknowledging that the concept remains somewhat theoretical, Carr appreciates the idea that the framework could facilitate data access. “If we organize the data well, then the scientists can find data in a self-service way and do data curation without a lot of wrangling,” he said.