In the U.S., the place where one was born, one’s social and economic background, the neighborhoods in which one spends one’s formative years, and where one grows old are factors that account for a quarter to 60% of deaths in any given year, partly because these forces play a significant role in occurrence and outcomes for heart disease, cancer, unintentional injuries, chronic lower respiratory diseases, and cerebrovascular diseases — the five leading causes of death.
While data on such “macro” factors is critical to tracking and predicting health outcomes for individuals and communities, analysts who apply machine-learning tools to health outcomes tend to rely on “micro” data constrained to purely clinical settings and driven by healthcare data and processes inside the hospital, leaving factors that could shed light on healthcare disparities in the dark.
Researchers at the NYU Tandon School of Engineering and NYU School of Global Public Health (NYU GPH), in a new perspective, “Machine learning and algorithmic fairness in public and population health,” in Nature Machine Intelligence, aim to activate the machine learning community to account for “macro” factors and their impact on health. Thinking outside the clinical “box” and beyond the strict limits of individual factors, Rumi Chunara, associate professor of computer science and engineering at NYU Tandon and of biostatistics at the NYU GPH, found a new approach to incorporating the larger web of relevant data for predictive modeling for individual and community health outcomes.
“Research of what causes and reduces equity shows that to avoid creating more disparities it is essential to consider upstream factors as well,” explained Chunara. She noted, on the one hand, the large body of work on AI and machine learning implementation in healthcare in areas like image analysis, radiography, and pathology, and on the other the strong awareness and advocacy focused on such areas as structural racism, police brutality, and healthcare disparities that came to light around the COVID-19 pandemic.
“Our goal is to take that work and the explosion of data-rich machine learning in healthcare, and create a holistic view beyond the clinical setting, incorporating data about communities and the environment.”
Chunara, along with her doctoral students Vishwali Mhasawade and Yuan Zhao, at NYU Tandon and NYU GPH, respectively, leveraged the Social Ecological Model, a framework for understanding how the health, habits and behavior of an individual are affected by factors such as public policies at the national and international level and availability of health resources within a community and neighborhood. The team shows how principles of this model can be used in algorithm development to show how algorithms can be designed and used more equitably.
The researchers organized existing work into a taxonomy of the types of tasks for which machine learning and AI are used that span prediction, interventions, identifying effects and allocations, to show examples of how a multi-level perspective can be leveraged. In the piece, the authors also show how the same framework is applicable to considerations of data privacy, governance, and best practices to move the healthcare burden from individuals, toward improving equity.
As an example of such approaches, members of the same team recently presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics and Society a new approach to using “causal multi-level fairness,” the larger web of relevant data for assessing fairness of algorithms. This work builds on the field of “algorithmic fairness,” which, to date, is limited by its exclusive focus on individual-level attributes such as gender and race.
In this work Mhasawade and Chunara formalized a novel approach to understanding fairness relationships using tools from causal inference, synthesizing a means by which an investigator could assess and account for effects of sensitive macro attributes and not merely individual factors. They developed the algorithm for their approach and provided the settings under which it is applicable. They also illustrated their method on data showing how predictions based merely on data points associated with labels like race, income and gender are of limited value if sensitive attributes are not accounted for, or are accounted for without proper context.
“As in healthcare, algorithmic fairness tends to be focused on labels — men and women, Black versus white, etc. — without considering multiple layers of influence from a causal perspective to decide what is fair and unfair in predictions,” said Chunara. “Our work presents a framework for thinking not only about equity in algorithms but also what types of data we use in them.”