Clinical trials form the cornerstone of evidence-based medicine and are essential to establishing the safety and efficacy of new drugs. However, only some of the information in clinical trial reports is well-structured and searchable via keywords; much of the information is buried in unstructured text.

In the past, uncovering actionable insights from this unstructured text meant that documents such as clinical trial reports had to be searched and read individually, a process that can be time-consuming and subject to human error. It is estimated that 80% of clinical data is unstructured and difficult to analyze.

To overcome these limitations, life sciences companies are using natural language processing (NLP), an artificial intelligence-based technology that extracts and synthesizes high-value information hidden in unstructured text. NLP-based text mining solutions can analyze unstructured text to identify the key facts, interpret the meaning, and extract and present facts in a structured form for researchers to review, analyze and summarize.
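To make the idea concrete, here is a minimal, illustrative sketch of rule-based extraction in Python. The patterns and output fields (phase, enrollment, primary endpoint) are assumptions chosen for illustration only; they are not how any particular commercial NLP platform works.

```python
import re

# Illustrative only: a toy rule-based extractor that turns one free-text trial
# summary into a structured record. Real NLP platforms use far richer
# linguistic processing and curated ontologies.

TEXT = (
    "This Phase II, double-blind study enrolled 142 patients with type 2 "
    "diabetes. The primary endpoint was change in HbA1c at 26 weeks."
)

PATTERNS = {
    "phase": re.compile(r"\bPhase\s+(I{1,3}V?|\d)\b", re.IGNORECASE),
    "enrollment": re.compile(r"\benrolled\s+(\d+)\s+(?:patients|subjects|participants)\b", re.IGNORECASE),
    "primary_endpoint": re.compile(r"primary endpoint was\s+([^.]+)\.", re.IGNORECASE),
}

def extract(text: str) -> dict:
    """Return a structured record containing whatever the patterns can find."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    return record

print(extract(TEXT))
# {'phase': 'II', 'enrollment': '142', 'primary_endpoint': 'change in HbA1c at 26 weeks'}
```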

NLP derives insights from unstructured text in clinical trial reports, enabling researchers to answer critical questions such as:

  • What clinical endpoints would be appropriate to measure for a particular disease?
  • Which investigators are expert in running clinical trials for a particular condition?
  • Who else has drugs in clinical trials for a specific indication?
  • What potential due diligence information is available for in-licensing opportunities for a particular disease area?

NLP offers notable value in assisting with the design of clinical trials, delivering actionable information about worldwide clinical development activities that enables researchers to select trial sites more effectively, find precedents for study design, understand inclusion/exclusion criteria used for different indications, and search for principal investigators more effectively.

Similarly, life sciences companies are using NLP to accelerate the collection of competitive intelligence, helping researchers monitor the status or progress of known competitors’ trials, evaluate a competitor’s trial design, verify trial sites of competing trials and understand which other companies are running clinical trials in the same therapeutic area.

Streamlining clinical trial protocols with AI

Every clinical investigation begins with developing a clinical trial protocol, which documents how a clinical trial will be conducted and ensures the safety of the trial subjects and the integrity of the data collected. Clinical protocols include detailed, valuable information, such as a trial’s title, purpose, intervention, number of arms, outcome measures and patient criteria.

However, creating a protocol is a complex undertaking that the pharmaceutical industry would like to streamline. This can be achieved through greater consistency across trials to reduce the number of protocol amendments for a given trial. A report from the Tufts Center for the Study of Drug Development quantified the impact that protocol amendments have on trials.

The study found that 57% of all protocols, across all phases, have at least one substantial global amendment. Phase II protocols have the highest incidence of substantial amendments (77%), averaging 2.2 amendments per protocol.

The most frequent changes stemming from amendments involve modifications to study volunteer demographics and eligibility criteria. Compared with protocols that have no amendments, protocols with even one amendment screen and enroll substantially fewer patients relative to plan.

Study durations for protocols with at least one substantial amendment take, on average, three months longer. Finally, the total median direct cost to implement a substantial amendment for Phase II and Phase III protocols is $141,000 and $535,000, respectively.

NLP can improve the process by extracting key data from each protocol, providing important metadata and a standardized protocol document. This creates a digital structured content library, which allows for advanced searching, as well as the ability to associate content and add metadata. As a result of the digital library, protocol content becomes accessible and searchable across internal teams and workflows, improving institutional knowledge and digital data flow.

There are several advantages associated with digitizing trial protocols. For example, experts can search and sort protocols by metadata specific to the protocol document, protocol content or protocol learnings, and gain a richer ability to identify protocol benchmarks and compare across protocols.
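As a sketch of what such a library makes possible, the snippet below builds a small in-memory collection of protocol records and filters it by metadata fields. The schema (indication, phase, arms, amendments) and the records are hypothetical, chosen purely for illustration rather than reflecting any real protocol-library format.

```python
from dataclasses import dataclass, field

# Hypothetical protocol-library schema; field names and records are invented.
@dataclass
class ProtocolRecord:
    trial_id: str
    title: str
    indication: str
    phase: int
    arms: int
    amendments: int
    tags: list = field(default_factory=list)

LIBRARY = [
    ProtocolRecord("T-001", "GLP-1 agonist in type 2 diabetes", "type 2 diabetes", 2, 3, 2, ["adaptive"]),
    ProtocolRecord("T-002", "Checkpoint inhibitor in NSCLC", "NSCLC", 3, 2, 0),
    ProtocolRecord("T-003", "SGLT2 inhibitor in type 2 diabetes", "type 2 diabetes", 3, 2, 1, ["event-driven"]),
]

def search(library, **criteria):
    """Return protocols whose metadata matches every keyword criterion exactly."""
    return [p for p in library
            if all(getattr(p, key) == value for key, value in criteria.items())]

# Benchmarking question: which Phase 3 type 2 diabetes protocols do we already hold?
for protocol in search(LIBRARY, indication="type 2 diabetes", phase=3):
    print(protocol.trial_id, protocol.title, f"amendments={protocol.amendments}")
```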

Three real-world examples of NLP in clinical trials

Following are three real-world examples of how top 10 pharma companies are using NLP today to improve clinical trial design, site selection and competitive intelligence. In all these use cases, NLP technology enabled the organizations to mine their own data, which was not used outside their organizations.

Trial design and optimization

Because clinical trials are so costly, speed to market is vital both to the bottom line and to improving patients’ lives. Technologies that optimize trial design are enablers of increased speed to market.

NLP platforms accelerate design by helping clinical researchers pull out precise information, numeric and otherwise, from trial reports. For example, inclusion and exclusion biomarker metrics can be standardized into precise, semantically normalized, structured data rather than left as unstructured text that is hard to search. Improving the efficiency and effectiveness of drug trial planning in this way reduces the cost of the whole trial and has the potential to minimize the time before the drug reaches the market.
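For instance, a criterion written as free text ("Glycated hemoglobin ≥ 6.5%") can be normalized into a machine-readable record. The toy normalizer below is a sketch of that idea; the hand-written synonym map and output fields are assumptions for illustration, whereas a production system would rely on curated ontologies.

```python
import re

# Toy normalizer for eligibility biomarker criteria (illustrative only).
SYNONYMS = {
    "glycated hemoglobin": "HbA1c",
    "hemoglobin a1c": "HbA1c",
    "hba1c": "HbA1c",
    "egfr": "eGFR",
}

CRITERION = re.compile(
    r"(?P<name>[A-Za-z0-9 ]+?)\s*(?P<op>>=|<=|>|<|≥|≤)\s*(?P<value>[\d.]+)\s*(?P<unit>%|mL/min/1.73m2)?"
)

def normalize(text: str) -> dict:
    """Turn one free-text criterion into a structured, semantically normalized record."""
    m = CRITERION.search(text)
    if not m:
        return {}
    name = m.group("name").strip().lower()
    return {
        "biomarker": SYNONYMS.get(name, m.group("name").strip()),
        "operator": {"≥": ">=", "≤": "<="}.get(m.group("op"), m.group("op")),
        "value": float(m.group("value")),
        "unit": m.group("unit"),
    }

print(normalize("Glycated hemoglobin ≥ 6.5 %"))
# {'biomarker': 'HbA1c', 'operator': '>=', 'value': 6.5, 'unit': '%'}
print(normalize("eGFR >= 45 mL/min/1.73m2"))
```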

Recently, one top 10 pharma company was seeking a scalable solution to extract summary statistics and outcome metrics, such as weight change and median survival time, from tabular or textual results sets from oncology and diabetes trial reports in clinical trial databases such as ClinicalTrials.gov and TrialTrove. Finding such a solution would enable the company to better understand how to design trials for oncology and diabetes drugs and where to aim future clinical trials. Ultimately, it would allow the organization to save time and money and, potentially, contribute to better patient outcomes.

The NLP platform makes it easy for researchers to access essential information about clinical endpoints and enables the company to extract efficacy endpoints as summary statistics. Without NLP, clinical statistics researchers were forced to analyze databases and clinical journals, then copy and paste relevant data into Excel. The process was intensive, repetitive and tedious for Ph.D.-level statisticians. It was also prone to error, because pasting data into the wrong cell of a growing spreadsheet is a fairly typical miscue.
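A rough sketch of the idea, with invented trial IDs, result sentences and patterns: numeric outcome metrics are pulled from free text and written straight to a table, replacing the copy-and-paste step.

```python
import csv
import io
import re

# Illustrative results text keyed by made-up trial IDs.
RESULTS = {
    "NCT00000001": "Median overall survival was 14.2 months in the treatment arm.",
    "NCT00000002": "Patients showed a mean weight change of -3.4 kg at week 24.",
}

# Simple patterns for two outcome metrics mentioned in the text.
METRICS = [
    ("median_survival_months", re.compile(r"median overall survival was\s+([\d.]+)\s+months", re.I)),
    ("weight_change_kg", re.compile(r"weight change of\s+(-?[\d.]+)\s*kg", re.I)),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["trial_id", "metric", "value"])

for trial_id, text in RESULTS.items():
    for metric, pattern in METRICS:
        match = pattern.search(text)
        if match:
            writer.writerow([trial_id, metric, match.group(1)])

print(buffer.getvalue())
```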

Since adoption, the NLP platform’s interactive information extraction tools have improved the organization’s search results by increasing recall and precision, integrating key information from different clinical trial sources, and ensuring that future data can be captured easily by running the queries periodically.

For example, the company’s competitive intelligence clinical group needed a landscape of Phase 1 and 2 clinical trials testing two or three drugs in combination for autoimmune diseases. Searching manually, they had found only seven trials and decided the effort was too intensive. With a simple NLP search, however, researchers quickly found more than 300.

NLP text mining has reduced costs, time and errors by speeding research. Clinical statisticians can now automatically access and analyze current trusted data, while health outcome researchers can rapidly answer questions from payers.

Site selection

The experimental medicine division at another top 10 pharma organization needed to locate a clinical trial site for gastric bypass trials with the ability to measure gut peptides before and after surgery. The company sought a trial site that met various needs, such as the ability to measure obesity- or diabetes-related biomarkers and to undertake specific medical procedures (e.g., bariatric surgery).

Because manual search and curation of sources such as ClinicalTrials.gov are slow and inefficient, the company used NLP to extract terms around gastric bypass, biomarkers, gut peptides, diabetes and obesity, and other key concepts.

Researchers extracted this data into a structured summary table, which enabled systematic and rapid review of the possible sites. To assess the scientific rank of the investigators, the author and/or the trial ID were extracted from MEDLINE, and the results were combined.
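In outline, that combination step looks something like the sketch below: candidate sites extracted from ClinicalTrials.gov-style records are filtered for the required procedures and biomarkers, then joined with investigator publication counts derived from MEDLINE. All site names, investigators and counts are invented for illustration.

```python
# Hypothetical site records, e.g. extracted from ClinicalTrials.gov entries.
sites = [
    {"site": "Hospital A", "investigator": "Dr. Smith",
     "procedures": {"bariatric surgery"}, "biomarkers": {"GLP-1", "PYY"}},
    {"site": "Hospital B", "investigator": "Dr. Jones",
     "procedures": {"bariatric surgery"}, "biomarkers": {"GLP-1"}},
]

# Hypothetical publication counts per investigator, e.g. derived from MEDLINE.
medline_pub_counts = {"Dr. Smith": 42, "Dr. Jones": 7}

required_procedures = {"bariatric surgery"}
required_biomarkers = {"GLP-1", "PYY"}

# Keep only sites that satisfy every requirement, and attach publication counts.
candidates = [
    {**s, "publications": medline_pub_counts.get(s["investigator"], 0)}
    for s in sites
    if required_procedures <= s["procedures"] and required_biomarkers <= s["biomarkers"]
]

# Rank the remaining sites by the investigator's publication record.
for c in sorted(candidates, key=lambda s: s["publications"], reverse=True):
    print(c["site"], c["investigator"], c["publications"])
```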

The search yielded three ideal trial sites, one of which was previously unknown to the experimental medicine division. By integrating searches across ClinicalTrials.gov and MEDLINE, NLP helped the organization reduce the time burden of clinical trial site planning and selection, decreasing overall cost.

Competitive intelligence

A third top 10 pharma company needed to effectively extract safety and efficacy data from clinical trials on drugs in development or existing drugs tested for new indications. However, much of the key information lies in unstructured text, meaning documents must be read individually – a process that can be time-consuming and subject to human error.

The organization’s information scientists leveraged NLP to gain valuable insights from large volumes of text. The company licensed curated data in XML format for over 120,000 individual drug trials in 180 disease areas, derived from various information sources. They used NLP to index and mine this data, extracting relevant facts, relationships and numeric information into structured tables, charts and network graphs. The indexed data is updated automatically on a regular basis.

Indexing was configured to recognize important sections of the documents, for example, the safety, trial design and technical results sections, enabling querying to be more targeted. In addition, ontologies such as MedDRA and custom vocabularies were used to define relevant query terms, including drug names, synonyms and phrases with equivalent meaning.
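The snippet below sketches the underlying idea of vocabulary-driven querying: each concept carries a set of synonyms, and hits are attributed to the document section in which they occur. The terms shown are illustrative stand-ins, not actual MedDRA content or any platform’s query syntax.

```python
import re

# Illustrative concept-to-synonym vocabulary (stand-in for MedDRA or a custom list).
VOCABULARY = {
    "myocardial infarction": {"myocardial infarction", "heart attack", "MI"},
    "drug_x": {"Drug X", "DX-123"},   # hypothetical drug name and code
}

# Illustrative document already split into recognized sections.
DOCUMENT = {
    "safety": "Two subjects on DX-123 experienced a heart attack during follow-up.",
    "trial_design": "A randomized, double-blind study of Drug X versus placebo.",
}

def find_hits(document: dict, vocabulary: dict) -> list:
    """Return (section, concept, matched synonym) tuples for every synonym found."""
    hits = []
    for section, text in document.items():
        for concept, synonyms in vocabulary.items():
            for synonym in synonyms:
                # Whole-word, case-insensitive matching to avoid spurious substring hits.
                if re.search(rf"\b{re.escape(synonym)}\b", text, re.IGNORECASE):
                    hits.append((section, concept, synonym))
    return hits

for hit in find_hits(DOCUMENT, VOCABULARY):
    print(hit)
```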

Researchers first identified the blinding status of trials with differing intravenous (IV) drug doses, then examined the dose durations of follow-on clinical trials. The company then investigated which trials had maintained double-blind status with differing IV drug doses and which had been unable to maintain it.

For example, one query of the unstructured data enabled the company to find studies in which a single drug was given intravenously as a comparator in at least two trial arms at two different doses, and the trial was double-blinded. Another query identified Phase I multiple ascending dose studies by querying for specific durations and study types within the title and treatment plan regions of the text.
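Once the arm-level facts have been extracted into structured records, the first of those queries reduces to a simple filter, as sketched below. The record layout and trial IDs are assumptions for illustration only.

```python
# Hypothetical structured records produced by NLP extraction from trial reports.
trials = [
    {
        "trial_id": "NCT11111111",
        "blinding": "double-blind",
        "arms": [
            {"drug": "drug A", "role": "comparator", "route": "IV", "dose_mg": 50},
            {"drug": "drug A", "role": "comparator", "route": "IV", "dose_mg": 100},
            {"drug": "drug B", "role": "investigational", "route": "IV", "dose_mg": 10},
        ],
    },
    {
        "trial_id": "NCT22222222",
        "blinding": "open-label",
        "arms": [
            {"drug": "drug A", "role": "comparator", "route": "IV", "dose_mg": 50},
            {"drug": "drug A", "role": "comparator", "route": "IV", "dose_mg": 50},
        ],
    },
]

def matches(trial: dict) -> bool:
    """True if a comparator drug was given IV at two or more doses in a double-blind trial."""
    if trial["blinding"] != "double-blind":
        return False
    # Collect the distinct IV doses used for each comparator drug.
    doses_by_drug = {}
    for arm in trial["arms"]:
        if arm["role"] == "comparator" and arm["route"] == "IV":
            doses_by_drug.setdefault(arm["drug"], set()).add(arm["dose_mg"])
    return any(len(doses) >= 2 for doses in doses_by_drug.values())

print([t["trial_id"] for t in trials if matches(t)])   # ['NCT11111111']
```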

Overall, NLP-assisted automated text mining was essential to the company’s ability to extract and synthesize high-value information found only in the unstructured text regions of the trial reports. The technology enabled information scientists to ask precise questions of the clinical trial reports, delivering insights to clinical decision-makers much faster than previously possible, surfacing information that had been difficult or impossible to find, and enabling better clinical decision support.

Clinical trial reports hold a wealth of unstructured data with the potential to deliver significant value to life sciences companies in the form of better trial design, site selection and competitive intelligence. However, turning that data into an asset suitable for analysis is no easy task. AI-assisted technologies such as natural language processing are essential tools that enable life sciences companies to harness the full power of the data in clinical trial reports.

Jane Reed

Jane Reed is director, life sciences at Linguamatics, an IQVIA company. She is responsible for developing the strategic vision for Linguamatics’ product portfolio and business development for the pharma and biotech market. Jane has extensive experience in life sciences informatics. She has worked for more than 20 years in vendor companies supplying data products, data integration and analysis, and consultancy to pharma and biotech, with roles at Instem, BioWisdom, Incyte and Hexagen. Before moving into the life science industry, Jane worked in academia, with postdoctoral positions in genetics and genomics research.