[Photo: a lab microscope, by Chokniti Khongchum on Pexels]

In 2016, the FAIR Guiding Principles for scientific data management and stewardship were published, laying out guideposts for scholarly data producers to make their data discoverable and usable in the future. The FAIR principles seek to ensure that data is Findable, Accessible, Interoperable, and Reusable. At the time of their publication, they articulated and consolidated many ongoing discussions among data scientists, and they have since been widely referenced and applied in academia.

Of course, collaboration and data reuse are fundamental to academic research. But the ideas of FAIRness can bring value to any research organization, including life sciences companies with vast archives of proprietary data. A pharma company may expect that its data will always stay within its own walls, for example, but effective management and reuse of that data can have dramatic benefits within the enterprise. By applying FAIR principles to make more efficient use of the data assets already in their archives, organizations can empower their R&D teams and accelerate innovation.

Enabling machine learning

Much of the potential for innovation is in the realm of machine learning and the possibilities it offers for transforming R&D. But as anyone who’s had a role in digital transformation initiatives can tell you, the “transformation” required to effectively leverage machine learning is easier said than done. Enterprise-wide FAIRification can be a big part of the solution.

Industry surveys suggest that roughly 80% of a data scientist's time is spent on mundane data access and management tasks. In a machine learning effort for pharma, these tasks might include consolidating data from different locations, ensuring compliance with privacy and security regulations, harmonizing diverse data types, and standardizing and labeling the data. These tasks are not the highest and best use of scientists' time, and they're unlikely to contribute to job satisfaction, something research organizations must keep in mind given the ongoing shortage of data scientists.

To make data FAIR at scale while reducing this "data wrangling" burden, pharma companies need tools for efficient data management and curation. That begins with a data infrastructure that can do the heavy lifting of automating data ingestion and curation and housing the data itself. Automation is a critical piece of the process, not only to reduce the errors and inconsistencies that manual curation can introduce, but also to compress timelines and free scientists for the high-impact work of training their algorithms.
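As a rough illustration, the sketch below shows what file-level automated ingestion and curation might look like: every incoming file gets a content hash, a timestamp, and a catalog record, with no manual steps in between. This is a minimal sketch under assumed conventions; the directory paths and the JSONL catalog format are placeholders, not any particular platform's layout.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical locations, used for illustration only.
LANDING_DIR = Path("/data/landing")
CATALOG_PATH = Path("/data/catalog/index.jsonl")

def curate_file(path: Path) -> dict:
    """Build a minimal metadata record for one incoming file."""
    payload = path.read_bytes()
    return {
        "path": str(path),
        "size_bytes": len(payload),
        # A content hash gives each asset a stable identity, which is what
        # makes de-duplication and provenance tracking tractable at scale.
        "sha256": hashlib.sha256(payload).hexdigest(),
        "suffix": path.suffix.lower(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

def ingest(landing_dir: Path = LANDING_DIR) -> int:
    """Scan the landing area and append one catalog record per file."""
    count = 0
    with CATALOG_PATH.open("a", encoding="utf-8") as catalog:
        for path in sorted(landing_dir.rglob("*")):
            if path.is_file():
                catalog.write(json.dumps(curate_file(path)) + "\n")
                count += 1
    return count

if __name__ == "__main__":
    print(f"Cataloged {ingest()} files")
```

Hashing content rather than trusting filenames is the key design choice here: it gives every asset an identity that survives renames and moves.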

This automated curation is possible both for tabular data and for complex data types like imaging. Tools can unearth these assets from archives, then read and extract their metadata to make them more easily discoverable within the enterprise. Medical imaging offers particular value to R&D teams, with rich information embedded in each asset. When properly managed, images can also be combined with related data such as electronic health records (EHRs), radiology reports, and clinical reports, offering a wealth of information for researchers, provided they have the tools to handle the scale and complexity of this diverse data.
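For imaging in particular, much of the needed metadata already lives in the files themselves. Below is a minimal sketch, assuming a DICOM archive and using the open-source pydicom library, of how header fields could be pulled into a searchable index without loading any pixel data; the chosen fields and the /data/imaging path are illustrative assumptions, not a prescribed schema.

```python
from pathlib import Path

import pydicom  # open-source DICOM reader: pip install pydicom

def index_dicom(path: Path) -> dict:
    """Extract a few searchable header fields from one DICOM file."""
    # stop_before_pixels skips the bulky image data, so indexing a large
    # archive only ever reads headers.
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return {
        "path": str(path),
        "modality": str(ds.get("Modality", "")),
        "study_date": str(ds.get("StudyDate", "")),
        "body_part": str(ds.get("BodyPartExamined", "")),
        "study_uid": str(ds.get("StudyInstanceUID", "")),
    }

def index_archive(root: Path) -> list[dict]:
    """Walk an archive and index every DICOM file found under it."""
    return [index_dicom(p) for p in root.rglob("*.dcm")]

if __name__ == "__main__":
    for record in index_archive(Path("/data/imaging")):  # placeholder path
        print(record)
```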

A single source of truth for data

Leveraging these concepts can help enterprises establish a single source of truth for data and metadata within the organization, providing researchers with a central repository from which they can easily reference data. Compare that with today's typical practice for accessing imaging data, in which a data scientist writes a SQL script, submits it to a contract research organization (CRO) or an internal information security group, and then waits weeks to receive the requested data. In contrast, a modern, enterprise-scale platform can put rich, diverse information at scientists' fingertips while maintaining compliance and data security.
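To make the contrast concrete, here is a hedged sketch of what self-service retrieval against such a repository could look like, reusing the hypothetical JSONL catalog and imaging fields from the sketches above; the field names are illustrative, not any specific product's API.

```python
import json
from pathlib import Path

# Hypothetical central catalog from the earlier ingestion sketch.
CATALOG_PATH = Path("/data/catalog/index.jsonl")

def search(**filters: str) -> list[dict]:
    """Return every catalog record whose fields match all given filters."""
    results = []
    with CATALOG_PATH.open(encoding="utf-8") as catalog:
        for line in catalog:
            record = json.loads(line)
            if all(record.get(key) == value for key, value in filters.items()):
                results.append(record)
    return results

# Seconds of self-service instead of a weeks-long request cycle:
brain_mr_studies = search(modality="MR", body_part="BRAIN")
```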

Where can researchers go with terabytes or petabytes of data made easily accessible? More automation can be applied in analysis pipelines and data processing to accelerate clinical trials and streamline discovery. Internal “de-siloing” can foster more collaborations within the enterprise and help it reduce costs by leveraging existing assets. And external partnerships can be improved as well, with data that is more readily useful to outside organizations. 

The near future will bring unprecedented volumes of data within scientists’ reach, and in this landscape, collaboration and interoperability will be critical to success. The FAIR principles offer life sciences organizations a roadmap to succeeding in a more open environment and to unlocking greater value within their existing research assets. 

Jim Olson is CEO of Flywheel, a biomedical research informatics platform that leverages cloud-scale computing infrastructure to address the increasing complexity of modern computational science and machine learning. Jim is a "builder" at his core; his passion is developing teams and growing companies. He has over 35 years of leadership experience in technology, digital product development, business strategy, high-growth companies, and healthcare at both large and startup companies, including West Publishing (now Thomson Reuters), Iconoculture, Livio Health Group, and Stella/Blue Cross Blue Shield of Minnesota.