Building robust methods for model explainability for healthcare

Lead Research Organisation: University of Oxford

Abstract

With the recent rise in medical applications of artificial intelligence (AI), decision makers are demanding more transparency from predictive models. Explainability (XAI) is a complex challenge that draws on several technical fields. Its main goal is to build explanation models that make model decisions readily interpretable. We distinguish two types of XAI approaches: (i) local model approximations, where a simple model, g(x), is fitted to predict the black-box model, f(x), in a neighborhood around the prediction point x, and (ii) additive feature attribution methods, where a model-free estimator describes how the model outcome at a point x changes when some of its features are removed and sampled from a reference distribution. To this day, there is no consensus in the field of XAI on how to choose the right method for a given use case. Further, existing methods are unreliable owing to their instability and lack of robustness. Finally, only a limited number of XAI methods are based on causal reasoning, even though clinicians often seek causal explanations. Our aim is to develop methodological improvements that bridge the gap between statisticians and clinicians who want to routinely use transparent, fair AI tools for prediction. We draw on statistical learning theory, robust statistics and knowledge of healthcare applications to build new, innovative approaches. Our first research endeavor tackled the issue of locality in local explanation models and built a more robust approach to Shapley values that can resist adversarial attacks. In parallel, we have been developing methods for understanding poly-victimization in multi-outcome causal models. Working on both model explainability and causal inference will, we hope, enable us to design causal XAI methods and, in the long term, to bridge the gap between the two subfields.
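The two families of approaches described above can be illustrated with a minimal sketch. The black-box f below and all data are hypothetical stand-ins chosen for illustration; this is not the project's actual methodology, only the generic recipe each family follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "black box" model f(x), standing in for any predictive model.
def f(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.5, -0.2])  # prediction point to explain

# (i) Local model approximation: fit a simple linear surrogate g(x) to f(x)
# on perturbations drawn in a neighborhood of x0, weighted by proximity.
Z = x0 + 0.1 * rng.standard_normal((500, 2))
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.02)   # proximity kernel
A = np.hstack([np.ones((len(Z), 1)), Z])            # intercept + features
coef, *_ = np.linalg.lstsq(A * w[:, None], f(Z) * w, rcond=None)
local_attributions = coef[1:]                       # slopes of g at x0

# (ii) Additive feature attribution by removal: replace one feature at a
# time with draws from a reference distribution and measure how the model
# outcome at x0 changes.
reference = rng.standard_normal((500, 2))           # assumed background data
removal_attributions = np.empty(2)
for j in range(2):
    X_rep = np.tile(x0, (len(reference), 1))
    X_rep[:, j] = reference[:, j]                   # "remove" feature j
    removal_attributions[j] = f(x0[None, :])[0] - f(X_rep).mean()

print(local_attributions)
print(removal_attributions)
```

The surrogate's slopes approximate the local gradient of f at x0, while the removal-based attributions measure each feature's marginal contribution relative to the reference distribution; the two need not agree, which is one reason method choice matters.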
Such technical advances in model explainability and causal inference can have high social impact, as they can improve the methodology for developing risk scores, in particular medical risk scores. Such scores are currently built from feature attributions in predictive, non-causal models. Ultimately, we aim to study various alternatives for evaluating XAI methods. This task is inherently challenging, as there is no ground truth against which to evaluate these methods. Our aim is to borrow ideas and concepts from unsupervised learning and adapt them to the purpose of model explainability.
This project falls within two EPSRC research areas: "Artificial Intelligence and robotics" and "Healthcare technologies".

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. It will be relevant research, addressing the key questions behind real world problems. The research will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications


Studentship Projects

Project Reference: EP/S023151/1, Start: 01/04/2019, End: 30/09/2027
Studentship: 2420816, Related To: EP/S023151/1, Start: 01/10/2020, End: 30/09/2024, Student Name: Lucile Ter-Minassian