Use of Routinely Collected Health Data to Predict Sudden Death and Other Catastrophic Events

Lead Research Organisation: University of Glasgow
Department Name: College of Medical, Veterinary, Life Sci

Abstract

Keywords: Sudden Death, Routinely Collected Health Data, Artificial Intelligence, Machine Learning, Precision Medicine

Background: In about one third of people with cardiovascular disease, the first presentation is sudden death: about 50,000 cases per year in England, or about 2,500 per year in the West of Scotland. Many others will present first with a myocardial infarction, stroke or heart failure and then die suddenly. Sudden death has varied causes, but many cases are likely to be due to either ischaemia or arrhythmias, i.e. sudden cardiac death (SCD). Presentation with a myocardial infarction, stroke or decompensated heart failure could be considered a failed 'attempt' at SCD. Predicting such catastrophic events may help prevent them, either by applying existing medical guidelines more assiduously to those at risk or by helping to design relevant clinical trials.
However, the incidence of sudden death is poorly described because it lacks a clear and unequivocal definition. Death certification provides one source of data, but many sudden deaths are attributed to a specific disease on the basis of medical opinion rather than evidence. Using routinely collected data to create a new definition of sudden death, by studying the evolution of the health record, may provide very different perspectives on how to define sudden death and estimate its incidence. People who die at home or within hours of reaching hospital, who had no prior severe medical history indicating imminent death, and whose death was not caused by trauma or suicide, can be considered to have died suddenly.
The West of Scotland SafeHaven provides access to routinely collected NHS data from the last 10 years (potentially ~20,000 sudden deaths), including electrocardiographic, imaging, laboratory and prescribing data from both primary and secondary care. This is a rich source of data that can be interrogated to identify new patterns that predict the risk of sudden death and other catastrophic events, and its size makes it amenable to analysis with modern machine learning techniques.
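The operational definition of sudden death in the background can be expressed as a simple rule over a patient's record. The sketch below is a minimal illustration only, assuming a hypothetical `DeathRecord` structure and a hypothetical 24-hour cut-off for "within hours of reaching hospital"; neither the field names nor the threshold come from the project, and real SafeHaven data would need its own mapping. It also includes a crude incidence rate (cases per 100,000 person-years at risk), a standard way of normalising such counts.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off for "within hours of reaching hospital"; the source gives no number.
MAX_HOURS_AFTER_ARRIVAL = 24.0

# Causes of death that exclude a case under the definition.
EXCLUDED_CAUSES = {"trauma", "suicide"}


@dataclass
class DeathRecord:
    # Hypothetical fields; the SafeHaven schema is not described in the source.
    place_of_death: str                   # "home", "hospital" or "other"
    hours_after_arrival: Optional[float]  # hours from hospital arrival to death, None if not applicable
    prior_severe_illness: bool            # history indicating imminent death, e.g. a palliative-care flag
    external_cause: Optional[str]         # e.g. "trauma", "suicide", or None


def is_sudden_death(rec: DeathRecord, max_hours: float = MAX_HOURS_AFTER_ARRIVAL) -> bool:
    """Apply the abstract's operational definition of sudden death to one record."""
    # Deaths caused by trauma or suicide are excluded.
    if rec.external_cause in EXCLUDED_CAUSES:
        return False
    # People whose record already indicated imminent death are excluded.
    if rec.prior_severe_illness:
        return False
    # Died at home, or within the time window after reaching hospital.
    if rec.place_of_death == "home":
        return True
    if rec.place_of_death == "hospital" and rec.hours_after_arrival is not None:
        return rec.hours_after_arrival <= max_hours
    return False


def crude_incidence_per_100k(cases: int, person_years: float) -> float:
    """Crude incidence rate: cases per 100,000 person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100_000 * cases / person_years
```

Making the cut-off a parameter allows sensitivity analyses of how the estimated incidence changes with the chosen definition, which is exactly the kind of question the background raises.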
Probabilistic clustering models that incorporate different data types could be used to subdivide instances of sudden death and help expose potential relationships. Search and natural language processing techniques applied to free-text records can be used to extract structured data and to find similar patients (for use as modelling features). Supervised machine learning models, including deep neural networks, can be used to learn features automatically and build predictive models. Combinations of variables with high predictive ability can be extracted from these models to discover previously unknown relationships between factors. The diversity of the data across modalities (sensors, images, laboratory results, codes and text) allows rich and complementary models to be constructed. Any single modality may be subject to bias or noise (e.g. sensor or coding errors), which could be automatically corrected and de-noised through large-scale analysis across the population. For example, text extracted from clinical notes could be used to infer missing or incorrect codes or properties, such as prescription history or lifestyle factors.
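As a minimal sketch of using free text to fill a missing coded field, the example below infers smoking status from clinical notes with hand-written regular expressions. The patterns, category labels and function names are hypothetical, the notes are assumed to be in chronological order, and a real pipeline would use validated clinical NLP with proper negation handling rather than keyword matching.

```python
import re
from typing import List, Optional

# Hypothetical keyword patterns; not a validated clinical NLP model.
NEVER_SMOKER = re.compile(r"\b(non[- ]?smoker|never smoked|does not smoke)\b", re.IGNORECASE)
EX_SMOKER = re.compile(r"\b(ex[- ]?smoker|former smoker|stopped smoking|quit smoking)\b", re.IGNORECASE)
CURRENT_SMOKER = re.compile(r"\b(current smoker|smokes|smoker)\b", re.IGNORECASE)


def smoking_status_from_note(note: str) -> Optional[str]:
    """Return "never", "ex", "current", or None if the note does not mention smoking."""
    # Test the negated and past-tense phrases first, because "non-smoker" and
    # "ex-smoker" both contain the word "smoker".
    if NEVER_SMOKER.search(note):
        return "never"
    if EX_SMOKER.search(note):
        return "ex"
    if CURRENT_SMOKER.search(note):
        return "current"
    return None


def fill_missing_smoking(coded: Optional[str], notes: List[str]) -> Optional[str]:
    """Keep a coded value if present; otherwise use the most recent note that mentions smoking."""
    if coded is not None:
        return coded
    for note in reversed(notes):  # notes are assumed to be in chronological order
        status = smoking_status_from_note(note)
        if status is not None:
            return status
    return None
```

Keeping coded values where they exist and only filling gaps from text is one conservative choice; correcting codes that are present but wrong would instead require comparing the two sources rather than simply preferring one.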

Aims: To describe the incidence of sudden death and other catastrophic events and, in a case-control study, to predict their occurrence using AI, machine learning and information retrieval techniques, so as to enable prevention strategies targeted at individual patients.
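A case-control design requires selecting controls comparable to each case. The sketch below is a simplified, hypothetical illustration that matches on sex and birth year (within a tolerance) and samples controls without replacement; the matching variables, the ratio `k` and the `Person` fields are assumptions, not the project's protocol. It also ignores time-dependent eligibility (for example, requiring controls to be alive and event-free at the case's date of death), which a real study would need.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Person:
    # Hypothetical fields used only for matching in this sketch.
    pid: str
    birth_year: int
    sex: str


def match_controls(
    cases: Sequence[Person],
    pool: Sequence[Person],
    k: int = 4,
    year_tolerance: int = 2,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Sample up to k controls per case, matched on sex and birth year, without replacement."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    used = {c.pid for c in cases}  # cases are never eligible as controls
    matches: Dict[str, List[str]] = {}
    for case in cases:
        eligible = [
            p
            for p in pool
            if p.pid not in used
            and p.sex == case.sex
            and abs(p.birth_year - case.birth_year) <= year_tolerance
        ]
        chosen = rng.sample(eligible, min(k, len(eligible)))
        used.update(p.pid for p in chosen)  # each control is used at most once
        matches[case.pid] = [p.pid for p in chosen]
    return matches
```

For example, a female case born in 1950 would be matched to women in the pool born between 1948 and 1952; men and anyone outside that window are skipped.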

Training outcomes: Yola will receive basic training in clinical terminology and in the diagnosis, natural history and treatment of relevant cardiovascular diseases, to put the data and analysis strategy into context. She will also be trained in data-protection legislation, data linkage, and data analysis and cleaning, and will receive hands-on training in modern machine learning techniques (e.g. deep networks and text modelling). As such, this PhD addresses two of the skills priorities highlighted by the MRC (quantitative and interdisciplinary skills).

Studentship Projects

Project Reference: MR/N013166/1 | Start: 01/10/2016 | End: 30/09/2025
Project Reference: 2285827 | Relationship: Studentship | Related To: MR/N013166/1 | Start: 01/10/2019 | End: 31/03/2023 | Student Name: Yola Jones