A Novel Reduction-Based Approach to Machine Learning Survival Modelling

Lead Research Organisation: University College London
Department Name: Statistical Science

Abstract

The foundation of my research rests in time-series and time-to-event (Survival) analysis. The motivation lies in bringing state-of-the-art machine learning models to Survival analysis. Several papers have been published on this subject, and machine learning models have previously been applied in time-series settings: for example, Gaussian Processes for survival analysis (van der Schaar et al., 2017) and Random Survival Forests (Ishwaran et al., 2008), to name a couple. However, there has yet to be a comprehensive framework that allows for rigorous model selection, validation and comparison in Survival analysis.
Continuing previous research, my PhD will build an architecture, both mathematically and in software (including R and Python), to allow for a comprehensive machine learning approach to Survival analysis. This will include discussing meta-strategies such as model tuning and ensemble methods. I will also discuss and attempt to solve problems that arise with time-series data, such as class imbalance and online updating. Online modelling is particularly relevant for models that take an extensive period of time to train. By utilising online updating, we will attempt to remove the need for re-training and improve model efficiency.
My research will have two primary aims, which can be roughly split into meta-analysis and model creation. In the first instance, I will conduct a comprehensive study of commonly used Survival models, such as Cox Proportional Hazards, and assess these against more modern models that make use of machine learning. This will include studying the relationships between the metrics used to evaluate differing types of Survival models. Additionally, I will look at commonly used techniques for solving problems that arise in time-series, such as censoring and imbalance. The second part of the research will build on the first part, as well as on my previous research into reduction of the machine learning Survival task. Here I will derive a comprehensive workflow for model selection, evaluation and comparison.
This research is relevant as there are many questions that remain unanswered. For example, whilst meta-strategies such as tuning are well researched and understood for more classical supervised learning models, less research has been devoted to tuning Survival models. Moreover, commonly used Survival models are often compared to each other to assess performance, and various residual statistics can be computed, but there is yet to be a well-defined indication of what a 'good' performance statistic for a Survival model would look like. For example, a binary classification model that always predicts a probability of 0.5 will achieve a Brier score of 0.25, so any model scoring below this can be considered 'good', but what meaning can you give to a single Cox model with a Deviance of 40,000? This missing framework is vital to bringing Survival modelling in machine learning up to the same standards as the more classical supervised setting. As my initial focus will be on a reduction-based approach, I will also be looking at open questions such as defining what reduction means in the context of Survival modelling and how these models can be mathematically related to supervised approaches (for example, the connection between the Cox model and generalized linear models is already well understood).
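The Brier score reference value mentioned above can be checked with a short calculation. The following is a minimal sketch, not part of the project itself: the helper `brier_score` and the simulated outcomes are assumptions made purely for illustration. It shows that an uninformed constant prediction of 0.5 on a binary outcome scores exactly 0.25, whereas predictions drawn uniformly at random from [0, 1] score closer to 1/3.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Hypothetical helper written for this illustration only.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Simulated binary outcomes (an assumption; no real data is used here).
rng = random.Random(0)
outcomes = [rng.randint(0, 1) for _ in range(10_000)]

# Uninformed baseline: always predict a probability of 0.5.
# Every squared error is 0.25, so the mean is 0.25 whatever the outcomes are.
baseline = brier_score([0.5] * len(outcomes), outcomes)

# Predictions drawn uniformly at random from [0, 1]:
# the expected squared error is 1/3, so this is worse than the baseline.
random_score = brier_score([rng.random() for _ in outcomes], outcomes)

print(f"constant 0.5 prediction: {baseline:.4f}")
print(f"uniform random prediction: {random_score:.4f}")
```

No comparably simple baseline exists for a quantity such as the Deviance of a single Cox model, which is the gap the paragraph above describes.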
Survival analysis is most important in the context of patient health-care data and predicting the future health-state of a patient (risk of illness, stroke, death, etc.). With this in mind, I will evaluate all new and existing models against both real-world health-care data and synthesised data that can test the models in a wide variety of cases. This will likely add a layer of difficulty in the form of Big Data, as cross-sectional health-care data-sets, especially those with multiple time-points, can quickly become very large; I hope to also tackle this in my research.

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/R513143/1        -              -              01/10/2018   30/09/2023   -
2064211             Studentship    EP/R513143/1   01/10/2018   31/01/2021   Raphael Sonabend