Deep Patient Trajectory Analysis: Learning from electronic health records

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Electronic health records (EHRs) have recently matured into an enormous source of routinely collected information. Today, a single EHR database can contain comprehensive medical histories for millions of patients, collating data collected at all levels of healthcare, over decades, from multiple healthcare institutions. Such a dataset can contain terabytes of information, with billions of entries recorded in a complex underlying data structure. In the past, classical survival analysis techniques have been used extensively to make predictions about patients' futures from relatively small quantities of information. In my work, I am investigating how to leverage much more of the information available in EHRs by adapting survival analysis techniques to a deep learning framework, giving healthcare providers more accurate survival distribution predictions and thereby improving their ability to make effective healthcare decisions.

The primary aim of this research is to produce deep learning models that can process a patient's entire medical history and return a full distribution for the time to an event of interest. This will require three components. Firstly, we will develop a robust method of pre-processing multi-modal, multi-source longitudinal EHR data. Secondly, we will need a suitable deep learning architecture that can identify the potentially complicated patterns in this data to create an accurate model of a patient and their susceptibility to the event of interest. Finally, we want our models to output a suitable survival distribution for the time to the event of interest. While there has already been some research into applying deep learning to EHR data, this methodology goes further than previous work, which has only considered the probability of events occurring between discrete timepoints. I also plan to extend the work by adapting more advanced survival analysis techniques to a deep learning framework, exploring competing risks and multi-state models as well as scoring rules for distribution family selection.
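As a rough illustration of the final component only, the sketch below computes a right-censored negative log-likelihood for a continuous-time survival distribution. The choice of the Weibull family, the function names, and the idea of one (scale, shape) pair per patient are assumptions made for this example; the abstract does not state which distribution families or loss functions the project uses.

```python
import math


def weibull_log_survival(t, scale, shape):
    """log S(t) = -(t / scale) ** shape for a Weibull distribution."""
    return -((t / scale) ** shape)


def weibull_log_density(t, scale, shape):
    """log f(t) = log h(t) + log S(t), where h is the Weibull hazard."""
    log_hazard = (
        math.log(shape)
        - math.log(scale)
        + (shape - 1.0) * (math.log(t) - math.log(scale))
    )
    return log_hazard + weibull_log_survival(t, scale, shape)


def censored_nll(times, observed, scales, shapes):
    """Mean negative log-likelihood under right censoring.

    times    -- event or censoring time for each patient (must be > 0)
    observed -- True if the event was seen, False if the patient was censored
    scales, shapes -- hypothetical per-patient Weibull parameters
    """
    total = 0.0
    for t, event_seen, scale, shape in zip(times, observed, scales, shapes):
        if event_seen:
            # Observed event: contributes the density at the event time.
            total -= weibull_log_density(t, scale, shape)
        else:
            # Censored: we only know the patient survived beyond time t.
            total -= weibull_log_survival(t, scale, shape)
    return total / len(times)


# One observed and one censored patient, both at t = 2 with scale 2, shape 1.
example = censored_nll([2.0, 2.0], [True, False], [2.0, 2.0], [1.0, 1.0])
```

In a deep learning setting, the per-patient parameters would presumably be produced by a network from the patient's history and this quantity minimised during training; that wiring is omitted here.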

This research is being done in collaboration with the NIHR ARC Northwest London research group, and the methods developed will be applied to data from the Northwest London Whole System Integrated Care (WSIC) database, with the aim of giving local healthcare providers better tools to understand and improve their care. The first case studies on which we plan to test our methods include time-to-event analysis and multi-state modelling for patients with diabetes, heart conditions, or multiple long-term conditions (multimorbidity).

This project falls within the EPSRC healthcare technologies research area.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships and placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. This research will address the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open-source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S023151/1                                   01/04/2019  30/09/2027
2446166            Studentship   EP/S023151/1  03/10/2020  30/09/2024  Thomas Matcham