Multimodal Active Adaptive Risk Stratification For Cancer

Lead Research Organisation: University of Oxford

Abstract

According to Cancer Research UK, almost half (45.5%) of all cancer cases in England in 2018 were diagnosed at stage 3 or 4. In 2016-2018, breast, prostate, lung, and bowel cancer together accounted for more than half of new cancer cases. The symptoms of bowel cancer are non-specific, making it harder for doctors to identify patients who are at risk or who have early-stage disease. The UK also has one of the poorest survival rates for colorectal cancer in Europe, thought to be partly due to late presentation and to delays in diagnosis and treatment. There are currently no adaptive, interpretable decision tools to enable earlier cancer diagnosis, although such tools could improve 5-year survival and widen the treatment options available to patients. Nor are there systems that update and refine a patient's cancer risk over time as new information becomes available. Risk scoring systems implemented in the NHS, such as the First-of-Type QCancer risk stratification system, provide only static risk estimates.

The goal of this DPhil proposal is to create a dynamic risk score and a diagnostic test recommendation system for early cancer detection. There are three themes: (i) cancer risk prediction from multimodal, multivariate time-series data in primary care; (ii) identification of patient subgroups for cancer phenotypes; (iii) a recommendation system for diagnostic tests, embedded in the clinical workflow, to provide decision support for early cancer detection.
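To make the contrast between a static and a dynamic risk estimate concrete, the following is a minimal sketch of one way a risk could be revised as findings arrive. It is not the method proposed here: it applies a textbook odds-form Bayes update, assumes the findings are conditionally independent, and the function name `update_risk`, the prior, and the likelihood ratios are all hypothetical illustration values.

```python
import math


def update_risk(prior_risk, likelihood_ratios):
    """Return the risk after each new finding, using an odds-form Bayes update.

    prior_risk: baseline probability of cancer, strictly between 0 and 1.
    likelihood_ratios: one positive likelihood ratio per finding, in the
        order the findings arrive (hypothetical values, for illustration).
    Assumes findings are conditionally independent given disease status.
    """
    if not 0.0 < prior_risk < 1.0:
        raise ValueError("prior_risk must be strictly between 0 and 1")

    # Work in log-odds so each finding contributes an additive term.
    log_odds = math.log(prior_risk / (1.0 - prior_risk))
    trajectory = []
    for ratio in likelihood_ratios:
        if ratio <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(ratio)
        # Convert the running log-odds back to a probability.
        trajectory.append(1.0 / (1.0 + math.exp(-log_odds)))
    return trajectory


# Hypothetical patient: 1% baseline risk, then three findings over time.
risk_over_time = update_risk(0.01, [2.0, 5.0, 0.5])
```

A static score would report a single number; here `risk_over_time` holds one estimate per finding, so a rise or fall after each new result is visible.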

Building on an in-house phenotyping algorithm, CAMELOT, we will focus on modelling static and dynamic variables in multimodal and multivariate settings. We will then create CAMELOT++ by incorporating patient survival analysis to guide cluster formation and improve risk prediction in a novel multi-label setting. Further work will include the development of interpretable phenotypes that allow us to identify salient biomarkers associated with different types of cancer. We will also follow how these biomarkers change over time and estimate how likely each clinical phenotype is to be associated with each cancer type.
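As a rough illustration of the last step, linking phenotypes to cancer types in a multi-label setting, the sketch below computes the observed fraction of patients with each cancer type within each cluster. It is not CAMELOT or CAMELOT++: cluster assignments are taken as given, and the function name `phenotype_cancer_rates`, the cluster labels, and the toy patients are hypothetical.

```python
from collections import defaultdict


def phenotype_cancer_rates(cluster_ids, diagnoses):
    """Return {cluster: {cancer_type: fraction of cluster members diagnosed}}.

    cluster_ids: one cluster label per patient (assumed already assigned).
    diagnoses: one set of cancer types per patient; a set may be empty or
        contain several types, which is what makes the setting multi-label.
    """
    if len(cluster_ids) != len(diagnoses):
        raise ValueError("cluster_ids and diagnoses must have the same length")

    sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for cluster, labels in zip(cluster_ids, diagnoses):
        sizes[cluster] += 1
        for cancer_type in labels:
            counts[cluster][cancer_type] += 1

    # Divide each count by the cluster size to get an empirical rate.
    return {
        cluster: {
            cancer_type: n / sizes[cluster]
            for cancer_type, n in counts[cluster].items()
        }
        for cluster in sizes
    }


# Toy example: five hypothetical patients in two hypothetical clusters.
rates = phenotype_cancer_rates(
    ["A", "A", "B", "B", "B"],
    [{"bowel"}, set(), {"lung"}, {"bowel", "lung"}, set()],
)
```

These are raw empirical rates; a real analysis would also need to account for cluster size, follow-up time, and censoring, which is where survival analysis comes in.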

Initial work will focus on developing the risk score for bowel cancer. The methods will be tested on the QResearch database (REC reference 03/4/021; 18/EM/0400), which links GP electronic health records from across England to secondary care outcomes and cancer registry data. The database contains variables such as demographics, diagnoses, treatments, and outcomes. Further work will explore the creation of a personal digital twin for test recommendation, and the approach will finally be extended to other cancer types, such as breast and lung cancer.

The phenotyping algorithm would enable us to identify cancer patient subgroups in primary care settings for the first time. The derived cluster-based patient characteristics, together with the tests and diagnoses associated with each clinical phenotype, would help inform clinicians which tests to perform and so support their decision-making. Based on these, we aim to develop cluster-based, patient-specific interpretability maps that show how particular biomarkers and diagnostic tests contribute to the outcome phenotypes. These maps would further support rapid diagnostic decisions by giving clinicians a transparent framework.

This project falls within the EPSRC healthcare technologies theme and the EPSRC artificial intelligence and robotics and digital twin areas. We hope to address challenge 2, transforming early prediction and diagnosis, by developing a dynamic risk score for cancer in a primary care setting. Our decision-support system would help derive patient phenotypes and identify cluster-based patient characteristics for different types of cancer. Lastly, by using a digital twin or personalised recommendation system for diagnostic tests, we hope to provide early warnings to patients and their clinicians.

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/S02428X/1        -              -              01/04/2019   30/09/2027   -
2722269             Studentship    EP/S02428X/1   01/10/2022   30/09/2026   Katarina Vukosavljevic