Genomics 2.0: Trusted research environments to manage 500M genomes

Introduction

The study of genetics dates to the mid-19th century and the work of Gregor Mendel, but it wasn’t until the second half of the 20th century that the field made great strides. The completion of the Human Genome Project and other major technological advances drove this progress. As a result, more than 1,800 disease genes have been discovered, and more than 2,000 genetic tests have become available to the public.

Advances in technology such as next-generation sequencing (NGS) have allowed scientists to perform experiments at a rate that was never possible before. NGS is a DNA sequencing technology that allows the whole genome of an individual to be sequenced within one day, producing a large amount of clinical health data. Unfortunately, the practice of maintaining small datasets in a centralized location for analysis (Genomics 1.0) cannot keep up with the size and complexity of today’s datasets.

Enter Genomics 2.0. In the new era of biomedical data accessibility, Genomics 2.0 uses a technology-driven, federated data approach that allows researchers to access, explore and analyze distributed datasets, and to collaborate on them, without moving the data from where it is stored.
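To make the federated idea concrete, here is a minimal sketch, assuming each site computes a summary inside its own environment and shares only aggregate counts. The class name, function names and the 0/1 carrier encoding are hypothetical and chosen for illustration; real federated platforms involve far more than this.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate counts a site is willing to share; no per-patient rows."""
    carriers: int
    participants: int


def summarize_locally(genotypes: list[int]) -> SiteSummary:
    """Runs inside a site's own environment.

    genotypes holds 1 for a variant carrier and 0 otherwise
    (a hypothetical encoding used only for this sketch).
    """
    return SiteSummary(carriers=sum(genotypes), participants=len(genotypes))


def combine(summaries: list[SiteSummary]) -> float:
    """Runs on the researcher's side and sees aggregates only."""
    total = sum(s.participants for s in summaries)
    if total == 0:
        raise ValueError("no participants in any site summary")
    return sum(s.carriers for s in summaries) / total


# Two sites analyze their own data; only the summaries travel.
site_a = summarize_locally([1, 0, 0, 1])
site_b = summarize_locally([0, 0, 1, 0, 0, 0])
carrier_rate = combine([site_a, site_b])
print(f"Pooled carrier rate: {carrier_rate:.2f}")
```

The individual-level genotype lists never leave the function that runs at each site; the pooled rate is computed from the two small summaries alone.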

Precision medicine was born out of the realization that individual genomic data was key to identifying special treatments and therapies best suited for one’s unique genetic constitution. As the scale of precision medicine has expanded for research, however, so has the volume of clinical datasets.

The problem?

It is estimated that by 2025, more than 500M human genomes will be sequenced in a clinical environment.

Traditional data-sharing methods involve downloading large amounts of clinical data onto one’s own computer for analysis. While this sounds like a simple approach, the organization has no control over what is done with the datasets once they leave its systems, and patient confidentiality and security are weakened as a result.

Collaboration becomes limited with traditional methods due to strict regulatory and data privacy rules that vary from country to country. As these regulations preclude data from leaving the countries where it was gathered, it becomes unused and siloed. By some estimates, 80-90% of essential datasets are unavailable to the research community because of these restrictions.

With NGS in ever wider use, and with genomic data growing exponentially in size and complexity, current technology for health data management is no longer enough.

As this shift in technology takes hold in the life science industry, a new approach to health data management is being applied: trusted research environments (TREs).

The safe and secure solution

TREs are becoming the architectural foundation for health data in research, especially in genomics. A TRE is a centralized computing database that holds clinical health data securely and protects patient confidentiality by never letting the data leave the organization where it is stored.

Researchers must be appropriately trained and approved before they receive credentials to access the clinical datasets within a given TRE. This requirement limits the possibility of patients being re-identified or of unauthorized users gaining access.

While user accessibility is an essential factor in TREs, so are the quality and type of data used. Before researchers and scientists access data, the clinical datasets are cleaned, transformed to a common format (or Common Data Model, e.g., OMOP), and verified. TREs have built-in auditing to ensure compliance and to verify that the information is used in ways that benefit public health. In addition, researchers can bring in their own tools to analyze the findings, making the platform user-friendly.
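The clean, transform and verify steps can be sketched as follows. This is an illustrative simplification, not the OMOP specification: the field names, the sex mapping and the plausibility range for birth years are all assumptions made for the example.

```python
# Hypothetical mapping from free-text source values to one agreed vocabulary.
RAW_SEX_TO_COMMON = {"m": "male", "male": "male", "f": "female", "female": "female"}


def to_common_format(raw: dict) -> dict:
    """Clean one source row and reshape it into the shared (illustrative) schema."""
    sex_key = str(raw.get("sex", "")).strip().lower()
    return {
        "person_id": str(raw["id"]).strip(),
        "sex": RAW_SEX_TO_COMMON.get(sex_key, "unknown"),
        "year_of_birth": int(raw["birth_year"]),
    }


def verify(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record["person_id"]:
        problems.append("missing person_id")
    # Assumed plausibility window, chosen only for this sketch.
    if not 1900 <= record["year_of_birth"] <= 2025:
        problems.append("implausible year_of_birth")
    return problems


raw_rows = [
    {"id": " P001 ", "sex": "F", "birth_year": "1984"},
    {"id": "P002", "sex": "Male", "birth_year": 1875},
]
records = [to_common_format(row) for row in raw_rows]
accepted = [r for r in records if not verify(r)]
rejected = [r for r in records if verify(r)]
print(f"{len(accepted)} accepted, {len(rejected)} rejected")
```

Here the first row is cleaned and accepted, while the second is held back because its birth year fails the check. In practice the rejected rows would go to a data steward for review rather than being silently dropped.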

TREs maintain safe settings through barriers (or “airlocks”) at which activity and transactions are tracked in both directions, so that anything entering or leaving the environment is secure and approved.
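One way to picture an airlock is as a checkpoint that decides on each request and records every decision. The sketch below covers outbound requests only and rests on stated assumptions: the Airlock class, the approved-user list and the minimum group size of 5 are hypothetical and do not describe any particular TRE product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Airlock:
    approved_users: set[str]
    min_group_size: int = 5  # hypothetical disclosure threshold
    audit_log: list[dict] = field(default_factory=list)

    def _log(self, user: str, allowed: bool, reason: str) -> None:
        """Record every decision, whether the request passed or not."""
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "allowed": allowed,
            "reason": reason,
        })

    def request_export(self, user: str, group_sizes: list[int]) -> bool:
        """Allow an aggregate result out only if the user is approved and
        no group in the result is small enough to risk re-identification."""
        if user not in self.approved_users:
            self._log(user, False, "user not approved")
            return False
        if any(n < self.min_group_size for n in group_sizes):
            self._log(user, False, "group below minimum size")
            return False
        self._log(user, True, "ok")
        return True


airlock = Airlock(approved_users={"alice"})
ok = airlock.request_export("alice", [12, 40])
blocked_small = airlock.request_export("alice", [12, 3])
blocked_user = airlock.request_export("mallory", [50])
print(ok, blocked_small, blocked_user, len(airlock.audit_log))
```

Refused requests are logged as well as approved ones, which is what makes the audit trail useful for compliance review.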

Global impact

A step toward a life-changing discovery can come from one human mind, but what if you were able to bring together ten, fifty or even one hundred great minds? Global collaboration among scientists and researchers would create a seismic shift within the pharma and biotech industries for the greater good. TREs are enabling this global collaboration.

Breaking down the barriers of siloed data allows scientists to analyze and review findings from colleagues, not just within their own organization but also at other organizations around the globe. Increased access to global clinical data can reduce time spent in the lab, speed up diagnosis and accelerate the development of new hypotheses in the research community.

Conclusion

As genomic health data continues to increase in diversity, scale and complexity, storage, management, analysis and collaboration all become more challenging. TREs provide scientists and researchers with a safe, secure and collaborative platform that enables them to make life-changing scientific discoveries. In addition, TREs will bring innovation to patient health by helping the research community manage the more than 500M genomes expected to be sequenced.

Dr. Pablo Prieto Baja is the co-founder and CTO of Lifebit.