QMIA: Quantifying and Mitigating Bias affecting and induced by AI in Medicine

Lead Research Organisation: University College London
Department Name: Institute of Health Informatics

Abstract

Artificial Intelligence (AI) has demonstrated exciting potential for improving healthcare. However, these technologies come with a significant caveat: they do not work effectively for minority groups. A recent study published in Science showed that a widely used AI tool in the US concludes Black patients are healthier than equally sick White patients. Using this tool, a health system would favour White people when allocating resources such as hospital beds. AI models like this would do more harm than good for health equity. Such inequality goes well beyond racial groups, affecting people of different genders, ages and socioeconomic backgrounds. AI-induced bias may come from healthcare data, which significantly under-represent minorities and embed decades of healthcare disparities among different groups of people. The COVID-19 pandemic highlighted this issue, with UK minority groups disproportionately affected by higher infection rates and worse outcomes. Bias may also arise in the design and development of AI tools, where inequalities can be built into the decisions they make, including how to characterise patients and what to predict. For example, the above-mentioned AI tool in the US uses health costs as a proxy for health needs, making its predictions reflect economic inequality as much as care requirements and further perpetuating racial disparities.

However, AI models in medicine are currently measured only by accuracy, leaving their impact on inequalities untested. Current AI audit tools are not fit for purpose because they do not detect and quantify bias based on actual health needs. Effective tools devised specifically for healthcare to evaluate and mitigate AI-induced inequalities are largely absent. This project aims to develop a set of tools for optimising health datasets and supporting AI development to ensure equity. Central to the solution is a novel measurement tool for quantifying health inequalities: the deterioration-allocation area under the curve. This framework assesses fairness by checking whether the AI allocates the same level of resources to people with the same health needs across different groups (a minimal illustrative sketch follows the list of work below). We will use three representative health datasets: (1) CVD-COVID-UK, containing person-level data on 57 million people in England; (2) SCI-Diabetes, a diabetes research cohort covering everyone with diabetes in Scotland; (3) the UCLH dataset, routine secondary care data from University College London Hospitals NHS Foundation Trust. COVID-19 and type 2 diabetes will be used as exemplar diseases for the investigations. Specifically, this project will conduct three lines of work:

1. Analyse the embedded racial bias in all three health datasets so that AI developers can make informed decisions on how to characterise patients and what to predict;
2. Systematically review and analyse risk prediction models, particularly those widely used in clinical settings, for COVID-19 and type 2 diabetes;
3. Develop a novel method, the multi-objective ensemble, that brings insights from complementary datasets (avoiding actual data transfer) to mitigate inequality caused by insufficient data on certain groups.
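
To make the measurement idea above concrete, the following is a minimal sketch, not the project's actual QMIA implementation. It assumes a binary deterioration outcome recorded as an objective proxy for health need, a model-derived resource-allocation score, and a group label; the function name deterioration_allocation_gap and the synthetic data are hypothetical, and comparing per-group ROC-AUCs is only one simple way to ask whether allocation tracks need equally across groups.

# Illustrative sketch only: compare, per group, how well a model's
# resource-allocation scores track an observed deterioration outcome
# (a proxy for objective health need); the between-group gap is a
# crude inequality signal. Names and data below are hypothetical.
import numpy as np
from sklearn.metrics import roc_auc_score

def deterioration_allocation_gap(deteriorated, allocation_score, group):
    """Per-group AUC of allocation scores against deterioration, plus the
    largest between-group difference (a simple inequality measure)."""
    deteriorated = np.asarray(deteriorated)
    allocation_score = np.asarray(allocation_score)
    group = np.asarray(group)
    per_group = {}
    for g in np.unique(group):
        mask = group == g
        if len(np.unique(deteriorated[mask])) == 2:  # AUC needs both classes
            per_group[g] = roc_auc_score(deteriorated[mask], allocation_score[mask])
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Synthetic example: an allocation score that under-estimates need for group "B"
rng = np.random.default_rng(0)
n = 1000
group = rng.choice(["A", "B"], size=n)
need = rng.normal(size=n)                                  # latent health need
deteriorated = (need + rng.normal(scale=0.5, size=n) > 1).astype(int)
score = need - 0.5 * (group == "B") + rng.normal(scale=0.3, size=n)
print(deterioration_allocation_gap(deteriorated, score, group))

In this toy example the gap is non-zero because the score systematically under-values group B's need; the deterioration-allocation area under the curve itself will be defined rigorously within the project.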

We will work closely with patients and members of the public to help focus and interpret our research, and to help publicise our findings. We will collaborate with other research teams to share learnings and methods, and with the NHS and government to ensure this research turns into practical improvements in health equity.

Technical Summary

Artificial intelligence (AI) holds great potential to solve complex problems and support decision making, and is expected to improve clinical outcomes in the near future. However, a critical and alarming caveat is that AI in medicine, particularly systems built on data-driven technologies, is subject to, or itself causes, bias and discrimination, exacerbating existing health inequities such as those among racial and ethnic groups. Health inequality extends far beyond race and ethnicity, with disparities widely reported across age, gender and socioeconomic status.

In studying AI-induced bias, current AI audit approaches mainly assume that equal accuracy leads to health equity, which is often untrue because the target variables in healthcare are themselves frequently biased. We are in dire need of frameworks that quantify bias based on actual health needs. Even more absent are solutions that ensure health equity while maintaining accuracy.

We propose four tests for assessing the effectiveness of a tool (or a framework) in mitigating AI-induced health inequalities.
T1[true fairness]. Can it detect and quantify AI and data bias based on objective health needs?
T2[easy dissemination]. Can it evaluate bias in a way that is as simple as, and conceptually similar to, widely used performance metrics such as ROC-AUC?
T3[debugging & guidance]. Can it assist AI model design by assessing risks of bias in selecting features and target variables?
T4[multiobjective]. Can it provide a mitigation approach that minimises model-induced inequality while maintaining the accuracy of AI models? (An illustrative sketch of this kind of trade-off follows this list.)
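
As a rough illustration of the T4 idea, the sketch below blends predictions from models trained separately on complementary datasets (so no raw data needs to move) and picks ensemble weights by a simple scalarised search over overall accuracy and between-group inequality. This is an assumed setup for illustration only; the helper names (group_auc_gap, pick_ensemble_weights), the grid search and the penalty weight lam are hypothetical and are not the project's multi-objective ensemble method.

# Illustrative sketch only: combine locally trained models without pooling
# data, choosing ensemble weights that trade off overall accuracy against
# between-group inequality. All names and choices here are hypothetical.
import numpy as np
from itertools import product
from sklearn.metrics import roc_auc_score

def group_auc_gap(y, scores, group):
    """Spread of per-group AUCs; smaller means more equal performance.
    Assumes every group in the validation set has both outcome classes."""
    y, scores, group = map(np.asarray, (y, scores, group))
    aucs = [roc_auc_score(y[group == g], scores[group == g]) for g in np.unique(group)]
    return max(aucs) - min(aucs)

def pick_ensemble_weights(local_preds, y, group, lam=1.0, steps=21):
    """Grid-search convex weights over per-dataset model predictions,
    minimising (1 - overall AUC) + lam * inequality gap."""
    y = np.asarray(y)
    best = None
    for w in product(np.linspace(0, 1, steps), repeat=len(local_preds)):
        if not np.isclose(sum(w), 1.0):
            continue  # keep weights on the probability simplex
        blended = sum(wi * np.asarray(p) for wi, p in zip(w, local_preds))
        loss = (1 - roc_auc_score(y, blended)) + lam * group_auc_gap(y, blended, group)
        if best is None or loss < best[0]:
            best = (loss, w)
    return best  # (loss, weights) of the best trade-off found

Varying lam traces a crude accuracy-equity trade-off; a genuinely multi-objective treatment would expose the full Pareto front rather than a single scalarised point.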

This project proposes a novel QMIA framework that aims to pass all four tests, provides it as a ready-to-use library, conducts a suite of analyses on exemplar datasets and diseases, and implements novel mitigation solutions. We will interlink communities and engage stakeholders to form synergistic forces and seek real-world impact by working with SPIRIT-AI/CONSORT-AI, QUADAS-AI/PROBAST-AI, and the MHRA and NIC
