QMIA: Quantifying and Mitigating Bias affecting and induced by AI in Medicine
Lead Research Organisation:
UNIVERSITY COLLEGE LONDON
Department Name: Institute of Health Informatics
Abstract
Artificial Intelligence (AI) has demonstrated exciting potential to improve healthcare. However, these technologies come with a serious caveat: they often do not work effectively for minority groups. A recent study published in Science showed that a widely used AI tool in the US concluded Black patients were healthier than equally sick White patients. Using this tool, a health system would favour White people when allocating resources such as hospital beds. AI models like this would do more harm than good for health equity. Such inequality goes well beyond racial groups, affecting people of different genders, ages and socioeconomic backgrounds. AI-induced bias can stem from healthcare data, which under-represents minorities and embeds decades of healthcare disparities among different groups of people. The COVID-19 pandemic highlighted this issue, with UK minority groups disproportionately affected by higher infection rates and worse outcomes. Bias may also arise in the design and development of AI tools, where inequalities can be built into the decisions they make, including how to characterise patients and what to predict. For example, the above-mentioned US AI tool uses health costs as a proxy for health needs, making its predictions reflect economic inequality as much as care requirements and further perpetuating racial disparities.
However, AI models in medicine are currently still measured only by accuracy, leaving their impact on inequalities untested. Current AI audit tools are not fit for purpose, as they do not detect and quantify bias based on actual health needs. Effective tools devised specifically for healthcare to evaluate and mitigate AI-induced inequalities are largely absent. This project aims to develop a set of tools for optimising health datasets and supporting AI development to ensure equity. Central to the solution is a novel measurement tool for quantifying health inequalities: the deterioration-allocation area under curve. This framework assesses fairness by checking whether the AI allocates the same level of resources to people with the same health needs across different groups. We will use three representative health datasets: (1) CVD-COVID-UK, containing person-level data on 57 million people in England; (2) SCI-Diabetes, a diabetes research cohort containing everyone with diabetes in Scotland; (3) the UCLH dataset, routine secondary care data from University College London Hospitals NHS Foundation Trust. COVID-19 and Type 2 diabetes will be used as exemplar diseases for investigations. Specifically, this project will conduct three lines of work:
1. Analyse the embedded racial bias in all three health datasets so AI developers can make informed decisions and selections on how to characterise patients and what to predict;
2. Systematically review and analyse risk prediction models, particularly those widely used in clinical settings, for COVID-19 and type 2 diabetes;
3. Develop a novel method called multi-objective ensemble to bring insights from complementary datasets (avoiding actual data transfer) for mitigating inequality caused by too little data for certain groups.
We will work closely with patients and members of the public to help focus and interpret our research, and to help publicise our findings. We will collaborate with other research teams to share learnings and methods, and with the NHS and government to ensure this research turns into practical improvements in health equity.
Technical Summary
Artificial intelligence (AI) holds great potential to solve complex problems and support decision making, and is expected to improve clinical outcomes in the near future. However, a critical and alarming caveat is that AI tools in medicine, particularly those built on data-driven technologies, are subject to, or themselves cause, bias and discrimination, exacerbating existing health inequity such as that among racial and ethnic groups. Health inequality extends well beyond race and ethnicity, with disparities also widely reported across age, gender and socioeconomic status.
In studying AI-induced bias, current AI audit approaches mainly assume that equal accuracy leads to health equity, which is often untrue because target variables in healthcare are themselves often biased. Frameworks that quantify bias based on actual health needs are urgently needed. Even scarcer are solutions that ensure health equity while maintaining accuracy.
We propose four tests for assessing the effectiveness of a tool (or a framework) in mitigating AI-induced health inequalities.
T1[true fairness]. Can it detect and quantify AI and data bias based on objective health needs?
T2[easy dissemination]. Can it evaluate bias in a way that is simple and conceptually similar to widely used performance metrics such as ROC-AUC?
T3[debugging & guidance]. Can it assist AI model design by assessing risks of bias in selecting features and target variables?
T4[multiobjective]. Can it provide a mitigation approach that minimises model-induced inequality while maintaining the accuracy of AI models?
This project proposes a novel QMIA framework that aims to pass all four tests, provides it as a ready-to-use library, conducts a suite of analyses on exemplar datasets and diseases, and implements novel mitigation solutions. We will interlink communities and engage stakeholders to build synergies and seek real-world impact by working with the SPIRIT-AI/CONSORT-AI and QUADAS-AI/PROBAST-AI initiatives, and with the MHRA and NIC
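T4's accuracy-versus-equity trade-off can be illustrated with a weighted-sum scalarisation, the simplest multi-objective formulation. The sketch below is a generic illustration, not the project's multi-objective ensemble method: it trains a logistic model on a loss that combines binary cross-entropy with a squared between-group gap in mean predicted risk (a crude stand-in for a needs-based inequality term), where the assumed parameter `lam` controls the trade-off.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multiobjective_loss(w, X, y, group, lam):
    """Weighted sum of two objectives: (1) accuracy, as binary
    cross-entropy; (2) an inequality penalty, here the squared gap
    in mean predicted risk between the two groups (illustrative
    stand-in for a needs-based inequality measure)."""
    p = sigmoid(X @ w)
    eps = 1e-9
    bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    gap = p[group == 0].mean() - p[group == 1].mean()
    return bce + lam * gap ** 2

def fit(X, y, group, lam=0.0, lr=0.1, steps=800):
    """Plain gradient descent with finite-difference gradients -
    deliberately simple, for illustration only."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(w)
        for j in range(len(w)):
            e = np.zeros_like(w)
            e[j] = 1e-5
            grad[j] = (multiobjective_loss(w + e, X, y, group, lam)
                       - multiobjective_loss(w - e, X, y, group, lam)) / 2e-5
        w -= lr * grad
    return w
```

Increasing `lam` shrinks the between-group gap in predicted risk at some cost in cross-entropy; sweeping `lam` traces out the accuracy-equity frontier that a multi-objective method navigates.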
Publications
Alsaleh MM (2023) Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques: A systematic review. International Journal of Medical Informatics.
Feng W (2024) Applying contrastive pre-training for depression and anxiety risk prediction in type 2 diabetes patients based on heterogeneous electronic health records: a primary healthcare case study. Journal of the American Medical Informatics Association (JAMIA).
Francis F (2023) Machine Learning to Classify Cardiotocography for Fetal Hypoxia Detection. Annual International Conference of the IEEE Engineering in Medicine and Biology Society.
Gao Y (2024) Optimising the paradigms of human AI collaborative clinical coding. npj Digital Medicine.
Greene C (2024) Antidepressant and antipsychotic prescribing in patients with type 2 diabetes in Scotland: A time-trend analysis from 2004 to 2021. British Journal of Clinical Pharmacology.
| Description | Building a database of the immunohistochemical profiles of tumours from histopathology reports at scale using large language models and machine learning |
| Amount | £59,907 (GBP) |
| Funding ID | PGS23 100040 |
| Organisation | Rosetrees Trust |
| Sector | Charity/Non Profit |
| Country | United Kingdom |
| Start | 09/2023 |
| End | 10/2025 |
| Description | Facilitating Better Urology Care With Effective And Fair Use Of Artificial Intelligence - A Partnership Between UCL And Shanghai Jiao Tong University School Of Medicine |
| Amount | £39,968 (GBP) |
| Organisation | British Council |
| Sector | Charity/Non Profit |
| Country | United Kingdom |
| Start | 03/2024 |
| End | 02/2026 |
| Title | The Deterioration-Allocation Index: A framework for health inequality evaluation |
| Description | This repository implements a DA-AUC (deterioration-allocation area under curve) metric for quantifying inequality between patient groups (a) embedded in datasets; or (b) induced by statistical / ML / AI models. This is analogous to ROC-AUC for assessing the performance of prediction models. Methodology: We define and quantify health inequalities in a generic resource allocation scenario using a novel deterioration-allocation framework. The basic idea is to define two indices: a deterioration index and an allocation index. The allocation index is derived from the model of interest. Conceptually, models used in real-world contexts can be abstracted and thought of as resource allocators, predicting for example the probability of Intensive Care Unit (ICU) admission. Note that the models do not need to be specifically designed to allocate resources; for example, risk prediction of cardiovascular disease (CVD) among people with diabetes is also a valid index for downstream resource allocation. Essentially, a resource allocator is a computational model that takes patient data as input and outputs a (normalised) score between 0 and 1. We call this score the allocation index. The deterioration index is a score between 0 and 1 that measures the deterioration status of patients. It can be derived from an objective measurement for disease prognosis (i.e., a marker of prognosis in epidemiology terminology), such as extensively used comorbidity scores or biomarker measurements like those for CVDs. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | AI technologies are being increasingly tested and applied in critical environments including healthcare. Without an effective way to detect and mitigate AI induced inequalities, AI might do more harm than good, potentially leading to the widening of underlying inequalities. This paper proposes a generic allocation-deterioration framework for detecting and quantifying AI induced inequality. Specifically, AI induced inequalities are quantified as the area between two allocation-deterioration curves. To assess the framework's performance, experiments were conducted on ten synthetic datasets (N>33,000) generated from HiRID - a real-world Intensive Care Unit (ICU) dataset, showing its ability to accurately detect and quantify inequality proportionally to controlled inequalities. Extensive analyses were carried out to quantify health inequalities (a) embedded in two real-world ICU datasets; (b) induced by AI models trained for two resource allocation scenarios. Results showed that compared to men, women had up to 33% poorer deterioration in markers of prognosis when admitted to HiRID ICUs. All four AI models assessed were shown to induce significant inequalities (2.45% to 43.2%) for non-White compared to White patients. The models exacerbated data embedded inequalities significantly in 3 out of 8 assessments, one of which was >9 times worse. |
| URL | https://github.com/knowlab/daindex |
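The deterioration-allocation idea above can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the concept, not the API of the daindex library at the URL above; it assumes the allocation index (model score) and deterioration index (e.g. a normalised comorbidity score) are pre-computed values in [0, 1].

```python
import numpy as np

def da_curve(alloc, deter, cutoffs):
    """For each allocation-index cutoff, the mean deterioration index
    among patients whose allocation index is at or above the cutoff,
    i.e. how sick the 'allocated' patients are at that threshold."""
    means = np.full(len(cutoffs), np.nan)
    for i, c in enumerate(cutoffs):
        selected = deter[alloc >= c]
        if selected.size:
            means[i] = selected.mean()
    return means

def da_inequality(alloc_a, deter_a, alloc_b, deter_b, n_cutoffs=50):
    """Area between the two groups' deterioration-allocation curves,
    via the trapezoidal rule over shared cutoffs. A positive value
    means group A is sicker than group B at the same allocation
    scores, i.e. under-allocated relative to need."""
    cutoffs = np.linspace(0.0, 1.0, n_cutoffs)
    diff = (da_curve(alloc_a, deter_a, cutoffs)
            - da_curve(alloc_b, deter_b, cutoffs))
    ok = ~np.isnan(diff)
    t, d = cutoffs[ok], diff[ok]
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(t)))
```

Read this way, the single number plays the role that ROC-AUC plays for performance: computed per pair of groups, it quantifies inequality embedded in a dataset or induced by a model.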
| Description | A partnership between UCL and Shanghai Jiao Tong University School of Medicine |
| Organisation | Shanghai Jiao Tong University |
| Department | School of Medicine |
| Country | China |
| Sector | Academic/University |
| PI Contribution | The UCL team is contributing the following. - Datasets: We will utilise our research access to the 58 million English population linked health datasets via the CVD-COVID-UK/COVID-IMPACT consortium, of which Dr Wu is a member. - Disease phenotype models for urology: Dr Wu's team is leading efforts to use the whole English population for deriving computational phenotypes for >300 conditions. Such phenotype models will serve as a transitional resource for facilitating urology diagnosis and prognostic predictions, especially for rare diseases. - AI models: UCL will provide a pre-trained AI model on a large health-related corpus for research and teaching in this project. - Computational resources: The GPU resources in Dr Wu's group and UCL will be utilised for the project. - Teaching/training material and expertise: We will take advantage of UCL's years of experience in developing, updating, and delivering modules on machine learning in healthcare. |
| Collaborator Contribution | SJTU is contributing the following: - Centre size: Shanghai Sixth People's Hospital Affiliated to SJTU School of Medicine is a tertiary comprehensive teaching hospital, known as the "birthplace of ultrasound diagnosis in China". The Department of Urology specialises in urethral repair and reconstruction, with a huge number of surgical patients every year, and together with the Shanghai Eastern Institute of Urologic Reconstruction it is one of the largest urethral repair and reconstruction centres in the world. - Database: We have a large database that includes patient clinical information, biological samples, urinary flow rate data, imaging data, and other data. - Medical school: The School of Medicine of Shanghai Jiao Tong University is one of the top medical schools in China, with its clinical medicine major ranking first among Chinese disciplines. It has a Clinical Medicine School at the Sixth People's Hospital of Shanghai, which undertakes the teaching of numerous undergraduate, graduate, and intern doctors. While these students learn clinical knowledge, they also need an understanding of international, first-class health informatics. |
| Impact | Gao, Yue, Yuepeng Chen, Minghao Wang, Jinge Wu, Yunsoo Kim, Kaiyin Zhou, Miao Li, Xien Liu, Xiangling Fu, Ji Wu & Honghan Wu. "Optimising the paradigms of human AI collaborative clinical coding." npj Digital Medicine 7, no. 1 (2024): 368. DOI:10.1038/s41746-024-01363-7 Wu, Jinge, Hang Dong, Zexi Li, Haowei Wang, Runci Li, Arijit Patra, Chengliang Dai, Waqar Ali, Phil Scordis, and Honghan Wu. "A hybrid framework with large language models for rare disease phenotyping." BMC Medical Informatics and Decision Making 24, no. 1 (2024): 289. DOI:10.1186/s12911-024-02698-7 |
| Start Year | 2024 |
| Description | HDR UK Advanced Computer Science Summit: AI and Healthcare: April 24 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Talked about the use of natural language processing, especially large language models compared with small models, and Knowledge Graph techniques to analyse health data. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://hdrwales.org.uk/hdr-uk-advanced-computer-science-summit-ai-and-healthcare-april-24/ |
| Description | Health equity interest group at the Alan Turing Institute |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | The Health Equity Interest Group aims to form an inclusive multidisciplinary working force to ensure the applications of AI in medicine give everyone equal access to care resources and improve everyone's health. Specifically, we have the following objectives: (1) connect researchers with public health and health and care professionals to advance health equity by a) developing new methodologies and digital tools to better understand and address existing inequalities, and b) safely applying the latest innovations in data science and AI in healthcare settings; (2) provide a platform to share learnings, best practices and priorities, and equip health policy and practice leaders with the necessary technical skills to assess the potential opportunities and pitfalls of the use of DS and AI tools in health for equity; (3) promote discussion between the various stakeholders (academics, public health, health and care professionals, social scientists, regulatory agencies (e.g. MHRA, NICE), healthcare commissioners, policymakers, funders etc.) to identify the main challenges, risks and barriers in the equitable use of statistics, machine learning and AI in biomedical research, in the clinic and at population level, thus setting the agenda for future research in these areas; (4) engage with public groups to ensure public views on the development and application of DS and AI for health equity are considered by the community, and that public experience of health equity/inequity also informs the methods developed and highlights potential pitfalls. |
| Year(s) Of Engagement Activity | 2023,2024,2025 |
| URL | https://www.turing.ac.uk/research/interest-groups/health-equity |
