

The COVID-19 pandemic has brought into sharp focus the racioethnic and socioeconomic disparities inherent in the U.S. healthcare system. These disparities take the form of increased adverse health outcomes and reduced quality of life for affected groups.

For example, a study of cities that reported COVID-19 deaths by race and ethnicity found that 34% of those deaths were among non-Hispanic Black people, a group that accounts for just 12% of the total U.S. population. The U.S. Centers for Disease Control and Prevention (CDC) cites “long-standing systemic health and social inequities” among the reasons for the racial and ethnic disparities in COVID-19 deaths.

This heightened awareness around inequities and disparities in healthcare has also brought some much-needed attention to similar bias-related problems in the growing sector of healthcare artificial intelligence (AI). As more healthcare organizations gain the opportunity to leverage comprehensive data and improved predictive tools to enhance patient risk identification and clinical decision-making, they also become more susceptible to the effects of implicit bias. Such bias is a natural consequence of intelligence and can, if unchecked, exacerbate the very health inequities these tools were designed to relieve.

Why implicit bias in data science matters

Implicit bias, or the unconscious tendency to form thoughts that confirm or conform to stereotypes, is a natural condition of human cognition. Famously measured by the Harvard Implicit Association Test, implicit bias reflects our tendency to use pre-existing knowledge of patterns and types to validate and characterize new information, even when those patterns are rooted in biased social or cultural perceptions.

Critically, AI is subject to the same kinds of implicit biases; after all, AI systems are ultimately advanced pattern-matchers, much as our minds attempt to derive insights through patterns.

Ziad Obermeyer and colleagues made this connection concrete in a study published in the journal Science in 2019. Obermeyer’s team studied a predictive algorithm widely used by healthcare organizations to stratify patients by risk of future health needs for potential care management enrollment. Assessing that need is complicated, so the developers used cost as a proxy for need.

The problem with this approach is that spending patterns differ substantially by race, ethnicity, and socioeconomic status. Using machine learning to train a tool on cost as the target resulted in an underestimation of the future needs of Black patients. Since access to care may depend on algorithms like this, implicit bias in their outputs can make it harder for these patients to obtain care. The researchers reported evidence of “racial bias” in the algorithm’s output that cut by more than half the number of Black patients identified for extra care.

Mitigating bias: 3 actionable steps

By their nature, predictive algorithms bring the possibility of perpetuating old biases or perhaps even introducing new ones into clinical and population health decision-making. However, by keeping the following three principles in mind, we believe data scientists can lessen bias and promote greater health equity in the predictive algorithms they develop:

Define the affected population and use rich, longitudinal data to match: Predictive algorithms can help clinicians make better, more cost-effective decisions more quickly, but they must be based on data representing the targeted patient population. Ideally, models would be trained on an extremely rich dataset with broad ethnographic coverage, including race, ethnicity, sex, geography, and socioeconomic status. However, that’s not always possible. When it isn’t, at a minimum ensure that the covered population is known and its limitations are understood.
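One way to make that coverage check concrete is to compare the demographic mix of the training cohort against the target patient population before modeling begins. The sketch below is a minimal, hypothetical illustration; the group labels, proportions, and tolerance threshold are all assumptions for demonstration, not real data.

```python
from collections import Counter

def demographic_mix(cohort):
    """Return each group's share of the training cohort."""
    counts = Counter(cohort)
    total = len(cohort)
    return {group: n / total for group, n in counts.items()}

def coverage_gaps(cohort, population_shares, tolerance=0.05):
    """Flag groups whose cohort share differs from the target
    population share by more than `tolerance` (absolute)."""
    cohort_shares = demographic_mix(cohort)
    gaps = {}
    for group, pop_share in population_shares.items():
        diff = cohort_shares.get(group, 0.0) - pop_share
        if abs(diff) > tolerance:
            gaps[group] = diff
    return gaps

# Illustrative cohort in which group "B" is underrepresented
# relative to the (made-up) target population.
cohort = ["A"] * 70 + ["B"] * 5 + ["C"] * 25
population = {"A": 0.60, "B": 0.12, "C": 0.28}

gaps = coverage_gaps(cohort, population)
# gaps now lists groups over- or under-represented beyond tolerance
```

A negative value in `gaps` signals underrepresentation, a cue to collect more data for that group or to document the limitation explicitly.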

Select model outcomes that are universally accessible and applicable or unavoidable: As Obermeyer’s analysis showed, once a model has been trained on a biased outcome, biased outputs are inevitable. For example, the total cost of care is often chosen as a proxy for adverse health utilization (and a direct signal of likely ROI). Since historical cost of care is one of the best predictors of future cost of care, historical cost can easily become a primary driver for deciding which patients get additional care. Choosing cost also potentially overlooks a broad set of individuals who don’t use the healthcare system in a “traditional” way.

In contrast, model outcomes that are broadly accessible and applicable or unavoidable reduce the likelihood of learned implicit bias. A model trained to predict unplanned or emergent inpatient events covers a much broader group of individuals than one trained on all inpatient admits (including costly elective surgeries) and has the added benefit that an intervention may better impact the events being predicted.
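The difference between the two outcome definitions can be shown on a handful of toy encounter records. The field names, values, and records below are purely illustrative assumptions; real admission data would use standardized admit-source codes.

```python
# Hypothetical encounter records (illustrative only).
encounters = [
    {"patient": "p1", "type": "inpatient",  "admit_source": "emergency"},
    {"patient": "p2", "type": "inpatient",  "admit_source": "elective"},
    {"patient": "p3", "type": "outpatient", "admit_source": None},
    {"patient": "p4", "type": "inpatient",  "admit_source": "emergency"},
]

def label_any_inpatient(enc):
    """Outcome 1: any inpatient admission, including costly electives."""
    return enc["type"] == "inpatient"

def label_unplanned_inpatient(enc):
    """Outcome 2: unplanned/emergent inpatient events only."""
    return enc["type"] == "inpatient" and enc["admit_source"] == "emergency"

any_ip = {e["patient"] for e in encounters if label_any_inpatient(e)}
unplanned_ip = {e["patient"] for e in encounters if label_unplanned_inpatient(e)}
```

Here the elective admission (p2) is labeled positive only under the first definition, so a model trained on it would partly learn who elects (and can afford) planned procedures rather than who is at risk of an avoidable emergency.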

Apply a critical eye to algorithmic outputs: You can’t undo model bias by tweaking the outputs to make them “fairer,” but you can make bias mitigation and outcome equity focus areas in your model validation process. Ethnographic parity is an easy start; if the proportions of different racial, ethnic, and other demographic groups in model outputs are wildly different from those in your patient population, it’s not unreasonable to pause and ask why. Such differences aren’t necessarily “wrong” (nor would perfect parity be “right”). But they do prompt the question of whether strong differences have a biological or operational explanation, as opposed to bias in the source data.
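A simple version of that parity check compares each group’s share among model-flagged patients to its share of the overall patient population. The sketch below is a minimal illustration; the counts, group labels, and deviation threshold are assumptions chosen for demonstration.

```python
def share(counts):
    """Convert raw counts per group into proportions."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def parity_flags(flagged_counts, population_counts, max_ratio_gap=0.2):
    """Flag groups whose share among model-flagged patients deviates
    from their population share by more than `max_ratio_gap` (relative).
    A ratio below 1.0 means the group is under-selected by the model."""
    flagged = share(flagged_counts)
    population = share(population_counts)
    flags = {}
    for group, pop_share in population.items():
        ratio = flagged.get(group, 0.0) / pop_share
        if abs(ratio - 1.0) > max_ratio_gap:
            flags[group] = ratio
    return flags

# Illustrative numbers: 1,000 patients, 100 flagged for extra care.
population_counts = {"A": 600, "B": 120, "C": 280}
flagged_counts = {"A": 66, "B": 6, "C": 28}

flags = parity_flags(flagged_counts, population_counts)
```

A flagged ratio well below 1.0, as for group “B” here, doesn’t prove the model is biased, but it is exactly the kind of disparity that warrants asking whether there is a clinical or operational explanation before the model goes into production.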


Michael Simon, Ph.D.

Design for effective, actionable, equitable predictive tools from the start

Given the recent focus on inequities and disparities in the health system, healthcare data scientists must incorporate a bias mitigation strategy into their development process. With an intentional and thoughtful perspective on what makes predictive analytics useful, effective, and fair, healthcare AI can change for the better the way patients experience care and the way providers deliver it.

Michael A. Simon, Ph.D., is the Director of Data Science at Arcadia, the leading population health management and health intelligence platform. Arcadia transforms data from disparate sources into targeted insights, putting them in the decision-making workflow to improve lives and outcomes.