Improving Statistical Machine Learning approaches for Time-to-Event Prediction Modelling

Lead Research Organisation: University of Oxford

Abstract

Background

The implications for long-term health outcomes of more than one chronic health condition occurring in the same individual are often unclear. Existing studies typically focus on the association of a single condition with a single outcome and often ignore or exclude individuals who have a background of multiple long-term conditions (MLTC). Consequently, the health needs of certain groups of individuals with MLTC may not be adequately addressed. One obstacle to studying the impact of MLTC is the number of possible combinations of health conditions: it may not be feasible to identify and recruit sufficiently large numbers of individuals with certain sets of conditions to study.

To overcome these challenges, a recent approach has been to use information held in medical record databases retrospectively. Groups of individuals with similar characteristics can be identified from the database and compared with other groups to determine why certain health outcomes manifest. Computer models, based on statistical machine learning algorithms, can then be created to predict the future risk of these health outcomes given individual characteristics. However, the use of these historical datasets to create prediction models needs careful handling. The data may have been acquired in a different context, or under a different premise, from the situation of interest, and this could lead to computer models that give biased or misleading insights. In addition, complex machine learning models can lack robustness and be prone to unstable behaviour, for example, giving very different risk probabilities for two nearly identical patients.

Aims & Objectives

This research aims to develop methodologies that will improve the robustness and validity of statistical machine learning-based prediction models that are constructed from observational data:

1) To assess the robustness and stability of existing statistical machine learning approaches for time-to-event modelling,

2) To develop methodology to improve aspects of the robustness and stability of statistical machine learning approaches for time-to-event modelling,

3) To test the novel methodology using real-world primary care data and compare it to existing approaches.

Novelty of the research methodology

Standard machine learning development focuses on the use of accuracy-related criteria to measure how well prediction models perform. However, there is increasing awareness that in real-world usage, accuracy is just one of several important criteria that determine the usefulness of a prediction model. In this research we will study the use of model training criteria that encompass considerations of (i) the four levels of model stability (as defined in Riley & Collins (2023)), (ii) consistency between model versions after updating, and (iii) sensitivity to unusual data inputs.
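As an illustration of what assessing stability can involve, the sketch below uses bootstrap resampling to measure how much an individual's predicted risk changes when a model is refitted on resampled data. It is a minimal example under stated assumptions, not the methodology of this project or of the cited reference: the data are simulated, the time-to-event model is a deliberately simple exponential model with one binary covariate, and all function names (simulate, fit_exponential_rates, predict_risk, bootstrap_instability) are hypothetical.

```python
import numpy as np


def simulate(n, seed=0):
    """Simulate toy time-to-event data with one binary covariate (assumed setup)."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)
    rate = np.where(group == 1, 0.10, 0.05)   # assumed true hazards per group
    t_event = rng.exponential(1.0 / rate)     # latent event times
    t_cens = np.full(n, 10.0)                 # administrative censoring at t = 10
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)   # 1 = event observed, 0 = censored
    return time, event, group


def fit_exponential_rates(time, event, group, groups=(0, 1)):
    """Exponential-model MLE of the hazard per group: events / person-time."""
    pooled = event.sum() / time.sum()
    rates = {}
    for g in groups:
        mask = group == g
        # Fall back to the pooled rate if a resample happens to miss a group.
        rates[g] = event[mask].sum() / time[mask].sum() if mask.any() else pooled
    return rates


def predict_risk(rates, group, horizon):
    """Predicted probability of the event occurring by time `horizon`."""
    lam = np.array([rates[int(g)] for g in group])
    return 1.0 - np.exp(-lam * horizon)


def bootstrap_instability(time, event, group, horizon, n_boot=200, seed=0):
    """Per-individual mean absolute difference between the original model's
    predicted risk and the predictions of models refitted on bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(time)
    original = predict_risk(fit_exponential_rates(time, event, group), group, horizon)
    boot_preds = np.empty((n_boot, n))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample individuals with replacement
        rates_b = fit_exponential_rates(time[idx], event[idx], group[idx])
        # Always predict for the ORIGINAL individuals so rows are comparable.
        boot_preds[b] = predict_risk(rates_b, group, horizon)
    return np.abs(boot_preds - original).mean(axis=0)


if __name__ == "__main__":
    t, e, g = simulate(300, seed=1)
    instability = bootstrap_instability(t, e, g, horizon=5.0)
    print("mean instability in predicted 5-unit risk:", instability.mean())
```

Larger values indicate predictions that depend more heavily on the particular sample used for model development. With a flexible machine learning model in place of the toy exponential model, the same loop applies unchanged: only the fitting and prediction functions would need to be swapped.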

Alignment to EPSRC's strategies and research areas

This project falls within the EPSRC Healthcare Technologies research area, in which "Optimising disease prediction, diagnosis and intervention" is one of the themes listed at https://www.epsrc.ac.uk/research/ourportfolio/themes/

It will create new methods for analysing large real-world primary care health data sets, underpin patient-specific predictive models, and support the identification of opportunities for prevention of disease or its recurrence.

Collaborations

This project will involve a collaboration with the University of Birmingham.

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly-qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S02428X/1      |              |              | 01/04/2019 | 30/09/2027 |
2722161           | Studentship  | EP/S02428X/1 | 01/10/2022 | 30/09/2026 | Sara Matijevic