
QMIA: Quantifying and Mitigating Bias affecting and induced by AI in Medicine

Lead Research Organisation: UNIVERSITY COLLEGE LONDON
Department Name: Institute of Health Informatics

Abstract

Artificial Intelligence (AI) has demonstrated exciting potential to improve healthcare. However, these technologies come with a serious caveat: they often do not work effectively for minority groups. A recent study published in Science showed that a widely used AI tool in the US concluded Black patients were healthier than equally sick White patients. Using this tool, a health system would favour White people when allocating resources such as hospital beds. AI models like this would do more harm than good for health equity. Such inequality goes well beyond racial groups, affecting people of different genders, ages and socioeconomic backgrounds. AI-induced bias may stem from healthcare data, which significantly under-represents minorities and embeds decades of healthcare disparities between groups. The COVID-19 pandemic highlighted this issue, with UK minority groups disproportionately affected by higher infection rates and worse outcomes. Bias may also arise in the design and development of AI tools, where inequalities can be built into the decisions they make, including how to characterise patients and what to predict. For example, the above-mentioned US AI tool uses health costs as a proxy for health needs, so its predictions reflect economic inequality as much as care requirements, further perpetuating racial disparities.

However, AI models in medicine are currently still measured only by accuracy, leaving their impact on inequalities untested. Current AI audit tools are not fit for purpose because they do not detect and quantify bias based on actual health needs. Effective tools devised specifically for healthcare to evaluate and mitigate AI-induced inequalities are largely absent. This project aims to develop a set of tools for optimising health datasets and supporting AI development to ensure equity. Central to the solution is a novel measurement tool for quantifying health inequalities: the deterioration-allocation area under curve. This framework assesses fairness by checking whether an AI system allocates the same level of resources to people with the same health needs across different groups. We will use three representative health datasets: (1) CVD-COVID-UK, containing person-level data on 57 million people in England; (2) SCI-Diabetes, a diabetes research cohort containing everyone with diabetes in Scotland; (3) the UCLH dataset, routine secondary care data from University College London Hospitals NHS Foundation Trust. COVID-19 and Type 2 diabetes will be used as exemplar diseases for investigation. Specifically, this project will conduct three lines of work:

1. Analyse the embedded racial bias in all three health datasets so AI developers can make informed decisions about how to characterise patients and what to predict;
2. Systematically review and analyse risk prediction models, particularly those widely used in clinical settings, for COVID-19 and type 2 diabetes;
3. Develop a novel method called multi-objective ensemble to bring insights from complementary datasets (avoiding actual data transfer) to mitigate inequality caused by insufficient data on certain groups.
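The multi-objective ensemble method itself is still to be developed, but the core idea — combining models trained on complementary datasets while trading accuracy against a between-group gap — can be illustrated with a rough sketch. This is a hypothetical illustration, not the project's actual method: the function names, the weighted-average ensemble, and the grid-search scheme are all assumptions.

```python
def ensemble(pred_a, pred_b, w):
    """Weighted average of risk predictions from two models
    trained on complementary datasets (no raw data is shared,
    only each model's predictions)."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

def mean_abs_error(pred, truth):
    """Simple accuracy objective: mean absolute prediction error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def group_gap(pred, groups):
    """Inequality objective: absolute difference in mean predicted
    risk between two patient groups (labelled 0 and 1)."""
    g0 = [p for p, g in zip(pred, groups) if g == 0]
    g1 = [p for p, g in zip(pred, groups) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

def select_weight(pred_a, pred_b, truth, groups, lam=1.0, steps=21):
    """Grid search over ensemble weights, minimising a combined
    objective: error + lam * between-group gap."""
    best_w, best_score = 0.0, float("inf")
    for i in range(steps):
        w = i / (steps - 1)
        p = ensemble(pred_a, pred_b, w)
        score = mean_abs_error(p, truth) + lam * group_gap(p, groups)
        if score < best_score:
            best_w, best_score = w, score
    return best_w
```

Setting `lam=0` recovers a purely accuracy-driven ensemble; increasing `lam` shifts weight towards the model whose predictions narrow the between-group gap.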

We will work closely with patients and members of the public to help focus and interpret our research, and to help publicise our findings. We will collaborate with other research teams to share learnings and methods, and with the NHS and government to ensure this research turns into practical improvements in health equity.

Technical Summary

Artificial intelligence (AI) holds great potential to solve complex problems and support decision making, and is expected to improve clinical outcomes in the near future. However, a critical and alarming caveat is that AI systems in medicine, particularly those using data-driven technologies, are subject to, or themselves cause, bias and discrimination, exacerbating existing health inequities such as those between racial and ethnic groups. Health inequality extends well beyond race and ethnicity, being particularly widely reported for age, gender and socioeconomic status.

To study AI-induced bias, current AI audit approaches mainly assume that equal accuracy leads to health equity, which is often not true because target variables in healthcare are often biased. We are in dire need of frameworks that quantify bias based on actual health needs. Even more absent are solutions that ensure health equity while maintaining accuracy.

We propose four tests for assessing the effectiveness of a tool (or a framework) in mitigating AI-induced health inequalities.
T1[true fairness]. Can it detect and quantify AI and data bias based on objective health needs?
T2[easy dissemination]. Can it evaluate bias in a simple way, conceptually similar to widely used performance metrics such as ROC-AUC?
T3[debugging & guidance]. Can it assist AI model design by assessing risks of bias in selecting features and target variables?
T4[multiobjective]. Can it provide a mitigation approach that minimises model-induced inequality while maintaining the accuracy of AI models?

This project proposes a novel QMIA framework that aims to pass all four tests, provides it as a ready-to-use library, conducts a suite of analyses on exemplar datasets and diseases, and implements novel mitigation solutions. We will interlink communities and engage stakeholders to form synergistic forces and seek real-world impact by working with SPIRIT-AI/CONSORT-AI, QUADAS-AI/PROBAST-AI, and the MHRA and NIC

Publications


Francis F (2023) Machine Learning to Classify Cardiotocography for Fetal Hypoxia Detection. in Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference

 
Description Building a database of the immunohistochemical profiles of tumours from histopathology reports at scale using large language models and machine learning
Amount £59,907 (GBP)
Funding ID PGS23 100040 
Organisation Rosetrees Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2023 
End 10/2025
 
Description Facilitating Better Urology Care With Effective And Fair Use Of Artificial Intelligence - A Partnership Between UCL And Shanghai Jiao Tong University School Of Medicine
Amount £39,968 (GBP)
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2024 
End 02/2026
 
Title The Deterioration-Allocation Index: A framework for health inequality evaluation 
Description This repository implements the DA-AUC (deterioration-allocation area under curve) metric for quantifying inequality between patient groups (a) embedded in datasets, or (b) induced by statistical/ML/AI models. It is analogous to ROC-AUC for assessing the performance of prediction models.

Methodology: We define and quantify health inequalities in a generic resource allocation scenario using a novel deterioration-allocation framework. The basic idea is to define two indices: a deterioration index and an allocation index.

The allocation index is derived from the model of interest. Conceptually, models used in real-world contexts can be abstracted and thought of as resource allocators, predicting, for example, the probability of Intensive Care Unit (ICU) admission. Note that the models do not need to be specifically designed to allocate resources; for example, risk prediction of cardiovascular disease (CVD) among people with diabetes is also a valid index for downstream resource allocation. Essentially, a resource allocator is a computational model that takes patient data as input and outputs a (normalised) score between 0 and 1. We call this score the allocation index.

The deterioration index is a score between 0 and 1 measuring a patient's deterioration status. It can be derived from an objective measurement of disease prognosis (i.e., a marker of prognosis in epidemiology terminology), such as widely used comorbidity scores or biomarker measurements like those for CVD.
Type Of Material Improvements to research infrastructure 
Year Produced 2024 
Provided To Others? Yes  
Impact AI technologies are being increasingly tested and applied in critical environments, including healthcare. Without an effective way to detect and mitigate AI-induced inequalities, AI might do more harm than good, potentially widening underlying inequalities. This paper proposes a generic allocation-deterioration framework for detecting and quantifying AI-induced inequality. Specifically, AI-induced inequalities are quantified as the area between two allocation-deterioration curves. To assess the framework's performance, experiments were conducted on ten synthetic datasets (N>33,000) generated from HiRID, a real-world Intensive Care Unit (ICU) dataset, showing its ability to accurately detect and quantify inequality proportionally to controlled inequalities. Extensive analyses were carried out to quantify health inequalities (a) embedded in two real-world ICU datasets; (b) induced by AI models trained for two resource allocation scenarios. Results showed that, compared to men, women had up to 33% poorer deterioration in markers of prognosis when admitted to HiRID ICUs. All four AI models assessed were shown to induce significant inequalities (2.45% to 43.2%) for non-White compared to White patients. The models significantly exacerbated data-embedded inequalities in 3 out of 8 assessments, one of which was >9 times worse.
URL https://github.com/knowlab/daindex
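The released library is at the URL above. As a minimal, self-contained illustration of the idea described in the methodology — not the repository's actual API; function names and the thresholding scheme here are assumptions — one can trace a deterioration-allocation curve per patient group and take the area between two groups' curves:

```python
def da_curve(alloc, deter, thresholds):
    """For each allocation threshold, the mean deterioration index
    among patients whose allocation index meets that threshold
    (a simplified reading of the deterioration-allocation curve)."""
    curve = []
    for t in thresholds:
        selected = [d for a, d in zip(alloc, deter) if a >= t]
        curve.append(sum(selected) / len(selected) if selected else 0.0)
    return curve

def da_inequality(alloc_a, deter_a, alloc_b, deter_b, n_points=101):
    """Area between the two groups' curves, approximated with the
    trapezoidal rule over thresholds in [0, 1]. A positive value means
    group A carries more deterioration than group B at equal levels
    of allocated resource."""
    thresholds = [i / (n_points - 1) for i in range(n_points)]
    curve_a = da_curve(alloc_a, deter_a, thresholds)
    curve_b = da_curve(alloc_b, deter_b, thresholds)
    diff = [x - y for x, y in zip(curve_a, curve_b)]
    step = 1.0 / (n_points - 1)
    return sum((diff[i] + diff[i + 1]) / 2 * step
               for i in range(n_points - 1))
```

Both indices are assumed already normalised to [0, 1], as in the description above; for identical groups the area is zero, mirroring how ROC-AUC gives a fixed reference value for an uninformative model.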
 
Description A partnership between UCL and Shanghai Jiao Tong University School of Medicine 
Organisation Shanghai Jiao Tong University
Department School of Medicine
Country China 
Sector Academic/University 
PI Contribution The UCL team is contributing the following:
- Datasets: we will utilise our research access to the linked health datasets covering the 58 million population of England via the CVD-COVID-UK/COVID-IMPACT consortium, of which Dr Wu is a member.
- Disease phenotype models for urology: Dr Wu's team is leading efforts to use the whole English population to derive computational phenotypes for >300 conditions. These phenotype models will serve as a transitional resource for facilitating urology diagnosis and prognostic predictions, especially for rare diseases.
- AI models: UCL will provide an AI model pre-trained on a large health-related corpus for research and teaching in this project.
- Computational resources: the GPU resources in Dr Wu's group and at UCL will be utilised for the project.
- Teaching/training material and expertise: we will draw on UCL's years of experience in developing, updating and delivering modules on machine learning in healthcare.
Collaborator Contribution SJTU is contributing the following:
- Centre size: Shanghai Sixth People's Hospital, affiliated to SJTU School of Medicine, is a tertiary comprehensive teaching hospital known as the "birthplace of ultrasound diagnosis in China". Its Department of Urology specialises in urethral repair and reconstruction, treats a huge number of surgical patients every year, and, together with the Shanghai Eastern Institute of Urologic Reconstruction, is one of the largest urethral repair and reconstruction centres in the world.
- Database: a large database including patient clinical information, biological samples, urinary flow rate data, imaging data and other data.
- Medical school: the School of Medicine of Shanghai Jiao Tong University is one of the top medical schools in China, with its clinical medicine programme ranked first among Chinese disciplines. Its Clinical Medicine School at the Sixth People's Hospital of Shanghai undertakes the teaching of numerous undergraduate students, graduate students and intern doctors, who need an understanding of first-class international health informatics alongside their clinical training.
Impact Gao, Yue, Yuepeng Chen, Minghao Wang, Jinge Wu, Yunsoo Kim, Kaiyin Zhou, Miao Li, Xien Liu, Xiangling Fu, Ji Wu and Honghan Wu. "Optimising the paradigms of human AI collaborative clinical coding." npj Digital Medicine 7, no. 1 (2024): 368. DOI: 10.1038/s41746-024-01363-7
Wu, Jinge, Hang Dong, Zexi Li, Haowei Wang, Runci Li, Arijit Patra, Chengliang Dai, Waqar Ali, Phil Scordis and Honghan Wu. "A hybrid framework with large language models for rare disease phenotyping." BMC Medical Informatics and Decision Making 24, no. 1 (2024): 289. DOI: 10.1186/s12911-024-02698-7
Start Year 2024
 
Description HDR UK Advanced Computer Science Summit: AI and Healthcare: April 24 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talked about the use of natural language processing, especially large language models compared with small models, and Knowledge Graph techniques to analyse health data.
Year(s) Of Engagement Activity 2024
URL https://hdrwales.org.uk/hdr-uk-advanced-computer-science-summit-ai-and-healthcare-april-24/
 
Description Health equity interest group at the Alan Turing Institute 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Health Equity Interest Group aims to form an inclusive, multidisciplinary workforce to ensure that applications of AI in medicine give everyone equal access to care resources and improve everyone's health. Specifically, we have the following objectives:

1. Connect researchers with public health and health and care professionals to advance health equity by a) developing new methodologies and digital tools to better understand and address existing inequalities, and b) safely applying the latest innovations in data science and AI in healthcare settings.
2. Provide a platform to share learnings, best practices and priorities, and equip health policy and practice leaders with the technical skills needed to assess the potential opportunities and pitfalls of using data science (DS) and AI tools in health for equity.
3. Promote discussion between the various stakeholders (academics, public health, health and care professionals, social scientists, regulatory agencies (e.g. MHRA, NICE), healthcare commissioners, policymakers, funders, etc.) to identify the main challenges, risks and barriers to the equitable use of statistics, machine learning and AI in biomedical research, in the clinic and at a population level, thus setting the agenda for future research in these areas.
4. Engage with public groups to ensure that public views on the development and application of DS and AI for health equity are considered by the community, and that public experience of health equity/inequity also informs the methods developed and highlights potential pitfalls.
Year(s) Of Engagement Activity 2023,2024,2025
URL https://www.turing.ac.uk/research/interest-groups/health-equity