Causal Generative Modelling for Identifying and Alleviating Biases of Neural Networks in Medical Imaging

Lead Research Organisation: University of Oxford

Abstract

Medical image analysis is undergoing a transformation with the emergence of Deep Neural Networks (DNNs), a type of machine learning model that is increasingly being trialled in clinical workflows, for example to assist pathology detection. The diversity of medical images, however, resulting from the variety of patient populations, acquisition scanners and protocols, poses a challenge for ensuring reliable performance after model deployment. Because medical imaging databases are of limited size, they commonly do not cover this heterogeneity. As a result, training data can include biases, for example images acquired using a specific type of scanner or primarily from patients of a specific age. These biases can be inherited by a model trained on this data, which may underperform if deployed to process data with different characteristics. This can have serious consequences in healthcare. The current generation of DNNs lacks dependable mechanisms for detecting and alerting users when predictions may be unreliable.

The aim of this project is to create safer and more dependable machine learning for medical imaging. Our approach is to develop Causal Generative Models (CGMs) capable of explicitly capturing relationships between factors that either influence or should not influence model predictions. By doing so, we can pinpoint the specific data characteristics responsible for biases and sub-optimal model predictions. This knowledge can then be used to alert a user to possible sub-optimal model operation, or to alleviate the influence of such data characteristics, yielding fairer and more reliable models.
To this end, the project has the following specific objectives:

a) The first objective is to develop CGMs for modelling the characteristics of a database and identifying any biases therein. Medical image datasets typically contain correlations between variables such as age, sex, disease severity and the imaging equipment used, among others, with certain combinations occurring more often than others. We will develop novel causality-based methods for modelling the relations between these variables and how they affect image appearance, and for identifying whether such relations are desirable or artifacts of spurious correlations in the data. Causal inference will then enable synthesis of artificial images, so-called counterfactuals, for characteristics that are under-represented or missing from the original database. Although common in other areas of health research such as genetics and epidemiology, causal inference has seen far less use in the context of imaging.

b) The second objective is to develop methods for identifying biases inherited by a model that has already been trained for a task of interest (e.g. detection of a pathology), potentially due to a biased training database. This is critical in order to determine how well it will perform upon deployment, for example to ensure fair and reliable operation under different patient demographics or image quality.

c) The final objective is to develop techniques for alleviating any biases identified in a pre-trained model. This complements the previous objectives and combines their output in a single framework: having identified the biases of a model (b), we will use a Causal Generative Model of the data (a) to generate synthetic but realistic data with the characteristics for which the model tends to underperform. This synthetic data will be used for adapting the model, enhancing its performance for more reliable operation.
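
As a purely illustrative aid, and not the project's actual method, the two core ideas in objectives (a) and (b) can be sketched in a few lines of Python. The sketch assumes a toy linear structural causal model in which a single image intensity depends on scanner type, patient age and a noise term; the constants SCANNER_OFFSET and AGE_SLOPE and both function names are hypothetical. A real Causal Generative Model would instead learn nonlinear mechanisms over whole images, typically with deep generative models.

```python
from collections import defaultdict

# Toy structural causal model (all values are illustrative assumptions):
#   intensity = SCANNER_OFFSET[scanner] + AGE_SLOPE * age + noise
SCANNER_OFFSET = {"A": 0.0, "B": 10.0}
AGE_SLOPE = 0.5


def counterfactual_intensity(intensity, age, scanner, new_scanner):
    """Abduction-action-prediction counterfactual for the toy model above."""
    # Abduction: recover the noise term consistent with the observation.
    noise = intensity - SCANNER_OFFSET[scanner] - AGE_SLOPE * age
    # Action: replace the scanner. Prediction: recompute the intensity
    # with the same age and the same recovered noise.
    return SCANNER_OFFSET[new_scanner] + AGE_SLOPE * age + noise


def subgroup_accuracy(records):
    """Accuracy per subgroup from (group, label, prediction) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, label, prediction in records:
        total[group] += 1
        correct[group] += int(label == prediction)
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    # "What would this scanner-A measurement have been on scanner B?"
    print(counterfactual_intensity(42.0, age=60, scanner="A", new_scanner="B"))
    # A gap between subgroups hints at a bias inherited by the model.
    print(subgroup_accuracy([("A", 1, 1), ("A", 0, 0), ("B", 1, 0), ("B", 0, 0)]))
```

In this toy setting, a large accuracy gap between scanner subgroups would indicate where counterfactual samples might be generated to adapt the model, in the spirit of objective (c).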

This project falls within the EPSRC Medical Imaging research area and the Healthcare Technologies theme.
The proposed research will generate novel techniques based on causal inference for the mitigation of bias in medical image analysis, leading to fairer and more trustworthy models, which are necessary for the adoption of such tools in real-world clinical settings.

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S02428X/1                                   01/04/2019  30/09/2027
2721977            Studentship   EP/S02428X/1  01/10/2022  30/09/2026  Yasin Ibrahim