
As machine learning and deep learning technologies advance, driven by progress in computation, algorithms and data availability, their possibilities in medicine continue to expand. While these AI-driven approaches have real potential, such systems demand large volumes of representative data, careful privacy and security scrutiny and thoughtful long-term strategic planning. In this Q&A, Kathryn Rough, associate director of the Center for Advanced Evidence Generation at IQVIA, discusses the impact of deep learning on healthcare delivery and recommends steps to take during the design, training, evaluation and deployment phases to increase the likelihood that these models will be safe, effective and ethical when trained on real-world health data. Rough also explores the role of epidemiologists in evaluating these technologies as part of multidisciplinary teams and provides advice for healthcare researchers looking to stay current with developments in this rapidly evolving field.
Q: Can you discuss the potential impact of deep learning on healthcare delivery, including both the benefits and the challenges?
A: It can be useful to remember that machine learning is already being used in healthcare. For decades, tools like the Framingham Risk Score and the Pooled Cohort Equations have been helping clinicians understand patient risk and guide clinical decision making about preventive interventions and treatment. These risk scores are often based on regression models that can be considered part of the family of traditional machine learning methods.
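By way of illustration, here is a minimal sketch of how a regression-based risk model can be fit with standard tooling; the data and features are simulated placeholders, not the actual Framingham or Pooled Cohort models:

```python
# Minimal sketch of a regression-based risk model, in the spirit of
# traditional machine learning; data and coefficients are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # e.g., standardized age, blood pressure, cholesterol (hypothetical)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.5 * X[:, 1]))))

model = LogisticRegression().fit(X, y)
risk = model.predict_proba(X[:5])[:, 1]  # predicted risk for five patients
print(np.round(risk, 3))
```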

Dr. Kathryn Rough
Deep learning is a specific type of machine learning that is based on models called artificial neural networks. These neural networks are quite powerful; they can capture complex relationships within data that traditional machine learning methods lack the capacity to learn. This is why, in domains like natural language processing and image recognition, we’ve seen such dramatic advances in the past several years when we apply deep learning methods.
We’re already seeing deep learning-based medical devices adopted in clinical care. In 2018, the FDA approved its first AI-enabled device. Currently, nearly 700 AI-enabled devices have been approved by the FDA, and there’s substantial potential for deep learning to continue improving patient care. For example, it can be used to create tools that improve diagnostic accuracy or help identify patients who would benefit from certain treatments or preventive measures. There’s also potential to partially automate the clerical tasks that clinicians are currently responsible for. This would ease the administrative burden placed on care providers and allow them to focus more on the patients in front of them.
However, there are important challenges that deserve acknowledgement. Training neural networks typically requires much larger volumes of data than traditional methods, substantial computing power and specialized expertise. There are also privacy and data security considerations, particularly for applications involving protected patient health information. Machine learning algorithms can also perpetuate and exacerbate biases and discrimination present in the healthcare system. Practical issues can emerge when models are integrated into clinical workflows, and model performance may degrade over time due to changes in medical practices or the patient population. Finally, deployment of models that have not been properly evaluated or validated may cause patient harm.
While all these challenges are addressable, they often require collaboration across disciplines and input from technical and clinical experts.
Q: What are some key considerations for ensuring the safety and efficacy of deep learning models in healthcare, particularly when trained using real-world healthcare data?
A: For deep learning models trained on real-world health data, there are many steps we can take during the design, training, evaluation, and deployment phases to increase the likelihood they will be safe, effective, and ethical.
Design: Before we start building any tools with deep learning models, it is helpful to get input from a variety of stakeholders, ensuring the patient perspective is represented. Understanding current clinical workflows helps us build technologies that are helpful and will have a higher likelihood of being adopted once deployed.
Understanding how a tool will be used in practice also helps us make good study design decisions. We should pay special attention to the label (or outcome) that we expect the deep learning model to learn. We want to avoid labels that are likely to be inaccurate or biased by current practices; there’s an illuminating example of how important this is in an article by Obermeyer and colleagues that was published in Science.
Training: The selection or curation of the dataset used to train the model is important. We want to ensure that this data has adequate representation of the populations the tool would be serving. If it does not, we may want to explore ways to supplement the training data or consider applying fine-tuning or transfer learning methods. Examining data quality and completeness before training the model can also help us discover issues early and take appropriate actions to address them. When we train the model, we should document all data pre-processing steps, model hyperparameters, and any post-processing performed on model outputs.
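As an illustration, a minimal sketch of such pre-training checks might look like the following; the dataset, column names and hyperparameter values are hypothetical placeholders:

```python
# Minimal sketch of pre-training data checks and documentation, assuming a
# hypothetical patient-level extract in cohort.csv; column names are illustrative.
import json
import pandas as pd

cohort = pd.read_csv("cohort.csv")  # hypothetical training extract

# Completeness: fraction of missing values per column.
missingness = cohort.isna().mean().sort_values(ascending=False)
print(missingness.head(10))

# Representation: distributions of key demographic variables, to compare
# against the population the tool is intended to serve.
for col in ["age_group", "sex", "race_ethnicity"]:  # hypothetical columns
    if col in cohort.columns:
        print(cohort[col].value_counts(normalize=True))

# Document pre-processing steps and hyperparameters alongside the model
# so training can be audited and reproduced later.
training_record = {
    "preprocessing": ["dropped records with missing outcome",
                      "standardized continuous features"],
    "hyperparameters": {"learning_rate": 1e-3, "batch_size": 64,
                        "hidden_layers": [128, 64], "epochs": 20},
}
with open("training_record.json", "w") as f:
    json.dump(training_record, f, indent=2)
```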
Evaluation: Carefully consider which model performance metrics are most relevant to the specific tool being developed. For retrospective evaluations on existing data, performance should be measured in a held-out test set. Further evaluation of model performance on a dataset unrelated to the training set can clarify how the model will perform in new settings. User tests that do not impact patient care can be performed to understand how the tool is used, facilitating identification and resolution of issues that could impact efficacy before the tool is deployed. Plans should also be developed for assessment of model performance and clinical impact after deployment.
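A minimal sketch of a retrospective evaluation on a held-out test set might look like this; the labels, scores and subgroups below are simulated placeholders:

```python
# Minimal sketch of a held-out test set evaluation; all data are simulated.
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             brier_score_loss)

rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=1000)                               # placeholder labels
y_prob = np.clip(y_test * 0.6 + rng.normal(0.2, 0.2, 1000), 0, 1)    # placeholder scores

# Discrimination: how well the model separates cases from non-cases.
print("AUROC:", roc_auc_score(y_test, y_prob))
print("AUPRC:", average_precision_score(y_test, y_prob))

# Calibration: how closely predicted probabilities match observed risk.
print("Brier score:", brier_score_loss(y_test, y_prob))

# Subgroup performance, to surface potential disparities before deployment.
subgroup = rng.choice(["A", "B"], size=1000)  # placeholder subgroup labels
for g in np.unique(subgroup):
    mask = subgroup == g
    print(g, "AUROC:", roc_auc_score(y_test[mask], y_prob[mask]))
```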
Deployment: Pilot testing or staggered deployments of the tool can enable iterations and updates to address any observed post-deployment issues. To notify the deployment team if model performance starts to degrade, pipelines can be built to automatically calculate key performance metrics and send alerts. Finally, the proactive collection of feedback from clinical users and patients can facilitate further improvements to the tool.
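A minimal sketch of such an automated monitoring check might look like the following; the metric threshold and alerting function are hypothetical placeholders:

```python
# Minimal sketch of a post-deployment monitoring check; the alert
# threshold and alerting mechanism are hypothetical placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

AUROC_ALERT_THRESHOLD = 0.75  # hypothetical minimum acceptable performance


def send_alert(message: str) -> None:
    # Placeholder: in practice this might page the deployment team or
    # post to a monitoring dashboard.
    print("ALERT:", message)


def check_model_performance(y_true, y_prob) -> None:
    """Recompute a key metric on recent predictions and alert on degradation."""
    auroc = roc_auc_score(y_true, y_prob)
    if auroc < AUROC_ALERT_THRESHOLD:
        send_alert(f"Model AUROC dropped to {auroc:.3f}; review for drift.")


# Demonstration on simulated "recent" predictions (placeholder data).
rng = np.random.default_rng(0)
check_model_performance(rng.integers(0, 2, size=200), rng.random(200))
```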
Q: How can epidemiologists and other healthcare researchers balance the need for innovation with rigorous evaluation and validation of deep learning models, especially in medical applications?
A: It is my view that having good evaluation strategies for deep learning technologies will enhance our ability to continue to innovate and improve; if we know what’s not working well, we can take steps to address or mitigate the issues. A good evaluation strategy is a key part of a learning healthcare system.
Epidemiologists and other healthcare researchers often have substantial training and experience in principled ways to measure the safety and efficacy of healthcare interventions. That’s a valuable skill that we can bring to the multi-disciplinary teams responsible for creating and deploying tools that use deep learning. By measuring how well tools are working, we can be stewards of responsible innovation.
Q: Do you find any specific deep learning research or application areas particularly relevant or promising for epidemiologists to explore, and why?
A: We should not expect most epidemiologists to become experts in deep learning, but I do recommend they have a basic understanding of these methods. As mentioned above, they can play important roles in creating and evaluating these new deep learning-based tools as part of multidisciplinary teams.
Q: Finally, what advice would you give healthcare researchers interested in deep learning and AI? How can they stay current with developments and develop the skills needed for collaboration or evaluation in this field?
A: The more understanding we have, the better we can critically analyze ongoing innovation in the field and contribute to it meaningfully.
Along with my colleague Stylianos Serghiou, I recently co-authored an article, Deep Learning for Epidemiologists: An Introduction to Neural Networks, in the American Journal of Epidemiology that outlines the basics of deep learning methods in a way that is accessible for epidemiologists and healthcare researchers. For those interested in learning more about deep learning, there are a variety of freely available educational resources, including the deep learning specialization on Coursera. I also suggest actively engaging with the machine learning-related research and innovation taking place in your domain of expertise.
Q: How would you summarize your advice in a nutshell for healthcare professionals?
A: Having a basic understanding of machine learning and deep learning methods can be valuable for a variety of healthcare professionals. By equipping ourselves with knowledge and fostering interdisciplinary collaboration, we can work to ensure that machine learning technologies benefit patients as they continue to transform the delivery of healthcare.
Kathryn Rough is associate director of the Center for Advanced Evidence Generation at IQVIA.