Statistical Machine Learning for Emergency Hospital Admissions

Lead Research Organisation: Durham University
Department Name: Mathematical Sciences

Abstract

This PhD will join an ongoing collaboration with NHS Scotland that seeks to predict the risk of emergency hospital admission, helping healthcare professionals to prioritise patients with complex care needs, with the ultimate aim of reducing emergency hospital attendance. This supports both better patient outcomes and potential cost savings for the health service. A population-level dataset is available, providing longitudinal diagnostic and patient-level data for every member of the Scottish population who has used an NHS hospital (~3.6 million individuals). Prior work has involved only a collection of logistic regression models, each based on an expert-elicited cohort of patients in the population, and these models were combined using a simple unlearned rule. This PhD will therefore seek to develop new statistical methodology motivated directly by this applied question.

Early work will examine how well existing unsupervised learning (clustering) methods can learn patient cohorts from the data without expert intervention. This will quickly extend to a fully principled Bayesian analysis, for which new methodology will need to be developed to enable joint inference of both the patient cohorts and the logistic regression model parameters, an approach that falls within so-called mixture-of-experts modelling. Achieving this goal will likely require extending existing computational statistics methods: although methods exist for model-based clustering and for logistic regression separately, less work has been done on full inference for a joint model incorporating both clustering and risk score prediction at the scale required here (in excess of 27 million patient-response observations). This will include exploring extensions of recent so-called "big data" Monte Carlo methods, including approaches that converge to the correct Bayesian posterior whilst only requiring access to subsamples of the data on any given iteration of the algorithm.
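As a rough illustration of the joint model described above, the following Python sketch computes admission risk under a small mixture of logistic regression "experts", one per cohort, together with a subsampled estimate of the log-likelihood. The softmax gating function, the two-cohort toy parameters, the simulated data and all function names are assumptions made for illustration only; the project does not specify them. The rescaled subsample sum is an unbiased estimate of the full-data log-likelihood, which is only one ingredient of subsampling Monte Carlo methods and does not by itself yield exact posterior sampling.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(scores):
    # Subtract the maximum before exponentiating for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def moe_risk(x, gate_weights, expert_weights):
    """Return (risk, cohort_probs) for covariate vector x.

    cohort_probs: soft cohort membership from an assumed softmax gate.
    risk: membership-weighted average of per-cohort logistic regressions.
    """
    cohort_probs = softmax([dot(g, x) for g in gate_weights])
    expert_risks = [sigmoid(dot(b, x)) for b in expert_weights]
    risk = sum(p * r for p, r in zip(cohort_probs, expert_risks))
    return risk, cohort_probs


def log_lik(data, gate_weights, expert_weights):
    # Bernoulli log-likelihood of binary outcomes under the mixture.
    total = 0.0
    for x, y in data:
        p, _ = moe_risk(x, gate_weights, expert_weights)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def subsampled_log_lik(data, n, gate_weights, expert_weights, rng):
    # Draw n observations without replacement and rescale by N / n.
    batch = rng.sample(data, n)
    return len(data) / n * log_lik(batch, gate_weights, expert_weights)


# Hypothetical toy setup: intercept plus one covariate, two cohorts.
gate_weights = [[0.5, -1.0], [-0.5, 1.0]]
expert_weights = [[-2.0, 0.3], [0.5, 1.2]]
rng = random.Random(0)
data = [([1.0, rng.gauss(0, 1)], rng.randint(0, 1)) for _ in range(200)]

risk, cohort_probs = moe_risk([1.0, 0.4], gate_weights, expert_weights)
estimate = subsampled_log_lik(data, 20, gate_weights, expert_weights, rng)
```

In this sketch each cohort keeps its own interpretable coefficient vector, while the gate plays the role of the learned cohort assignment; in a Bayesian treatment both sets of weights would carry priors and be inferred jointly.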

This initial milestone will enable both improved risk score prediction and insight into which population cohorts experience homogeneous covariate effects for emergency admission. It is of particular interest to the NHS to compare these learned cohorts with the previous expert-elicited ones. Both the clustering and the risk scoring aspects of this first stage of work remain fully interpretable, potentially providing not only accurate risk scoring but also diagnostic assistance to GPs.

A second question is how much better risk scoring can be made if full interpretability is sacrificed (so-called black box modelling). The work will move on to extend the joint inference task to more modern statistical machine learning methods such as gradient boosting machines, whilst retaining the goal of estimating patient cohorts, thereby keeping some interpretability whilst leveraging the latest advances in machine learning. The flexibility of these modern methods means that a model penalisation scheme will need to be developed carefully to ensure patient cohort discovery, for example by penalising the number of rounds in gradient boosting. This will mean that accuracy can be improved within each cohort whilst retaining cohort estimation, rather than simply turning the entire modelling exercise into a black box. In particular, there will be computational statistics challenges to overcome in the scale of the model, since fully Bayesian approaches to more complex machine learning models are still an area of very active research.
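To make the idea of penalising the number of boosting rounds concrete, here is a minimal Python sketch under stated assumptions. It fits gradient boosting with decision stumps to a single simulated covariate, then chooses the number of rounds by minimising held-out logistic loss plus a linear penalty on the round count. The penalty form, the simulated data and the function names are all hypothetical; the project's actual penalisation scheme, and how it interacts with cohort estimation, is not specified in the abstract.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(y, scores):
    # Mean logistic loss; scores are on the log-odds scale.
    total = 0.0
    for yi, s in zip(y, scores):
        margin = s if yi == 1 else -s
        total += math.log1p(math.exp(-margin)) if margin > -30 else -margin
    return total / len(y)


def fit_stump(x, residuals):
    # Least-squares stump on one feature: (threshold, left_mean, right_mean).
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def boosting_path(x, y, x_val, y_val, max_rounds=30, learning_rate=0.3):
    # Held-out loss after 0, 1, ..., max_rounds boosting rounds.
    scores = [0.0] * len(x)
    val_scores = [0.0] * len(x_val)
    losses = [log_loss(y_val, val_scores)]
    for _ in range(max_rounds):
        # Negative gradient of the logistic loss with respect to the scores.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        t, lm, rm = fit_stump(x, residuals)
        scores = [s + learning_rate * (lm if xi <= t else rm)
                  for s, xi in zip(scores, x)]
        val_scores = [s + learning_rate * (lm if xi <= t else rm)
                      for s, xi in zip(val_scores, x_val)]
        losses.append(log_loss(y_val, val_scores))
    return losses


def penalised_rounds(losses, penalty):
    # Number of rounds minimising held-out loss plus penalty * rounds.
    return min(range(len(losses)), key=lambda m: losses[m] + penalty * m)


rng = random.Random(0)


def simulate(n):
    # Hypothetical data: one covariate with a logistic effect on the outcome.
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(2 * xi) else 0 for xi in xs]
    return xs, ys


x_train, y_train = simulate(200)
x_val, y_val = simulate(200)
losses = boosting_path(x_train, y_train, x_val, y_val)
rounds_unpenalised = penalised_rounds(losses, 0.0)
rounds_penalised = penalised_rounds(losses, 0.01)
```

A larger penalty can never select more rounds than a smaller one, so the penalty acts as a single dial between a simple within-cohort model and a more flexible one.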

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513039/1                                   01/10/2018  30/09/2023
2181964            Studentship   EP/R513039/1  01/10/2019  31/03/2023  Samuel Emerson