Deriving an actionable patient phenome from healthcare data

Lead Research Organisation: University of Edinburgh
Department Name: Centre of Population Health Sciences

Abstract

Translating routinely collected health data into knowledge is a requirement of a "learning health system". Since joining the Biomedical Research Centre at the South London and Maudsley Hospital, King's College London, my research has focused on developing 'CogStack and SemEHR', an integrated health informatics platform that aims to unlock unstructured health records and assist clinical decision making and research. The platform does much to surface the deep data held within the NHS, for example by providing patient-centric search over semantically annotated clinical notes. This has supported studies such as the recruitment of patients for Genomics England's 100,000 Genomes project [1,2] and the prediction of adverse drug reactions [3].
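As a rough illustration of what patient-centric search over annotated notes involves, the sketch below builds a concept-to-patient index from NLP annotations and answers cohort queries (for example, for recruitment) by set intersection. It is a minimal sketch under assumed inputs, not SemEHR's implementation: the `(patient_id, doc_id, concept, negated)` tuple format, the function names and the concept labels are all hypothetical.

```python
from collections import defaultdict


def build_patient_index(annotations):
    """annotations: iterable of (patient_id, doc_id, concept, negated) tuples (hypothetical format).

    Returns concept -> set of patients with at least one affirmed (non-negated) mention."""
    index = defaultdict(set)
    for patient_id, _doc_id, concept, negated in annotations:
        if not negated:
            index[concept].add(patient_id)
    return index


def find_patients(index, required, excluded=()):
    """Patients with affirmed mentions of every required concept and none of the excluded ones."""
    if not required:
        return set()
    # set.intersection always returns a new set, so the index itself is never mutated
    hits = set.intersection(*(index.get(c, set()) for c in required))
    for c in excluded:
        hits -= index.get(c, set())
    return hits


annotations = [
    ("p1", "d1", "epilepsy", False),
    ("p1", "d2", "autism", False),
    ("p2", "d3", "epilepsy", False),
    ("p2", "d3", "autism", True),  # negated mention, e.g. "no features of autism"
    ("p3", "d4", "autism", False),
]
index = build_patient_index(annotations)
print(find_patients(index, ["epilepsy", "autism"]))  # {'p1'}
print(find_patients(index, ["epilepsy"], excluded=["autism"]))  # {'p2'}
```

Aggregating at the patient level rather than the document level is what lets a query combine evidence spread across several notes, as with patient `p1` above.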

However, there is considerable further potential to generate knowledge and action from this platform, for example by applying machine learning to its data. To realise this, the data returned by these systems need to be integrated, verified and cleaned using biomedical knowledge, enriched with accurate clinical context (to complement the current sentence-level language context) and aligned with the patient timeline to derive a comprehensive patient phenome. Clinical knowledge also needs to be formalised from clinical ontologies and integrated with relevant open data; this will drive automated inferences that lift lower-level features (e.g. numeric blood pressure readings) to higher-level clinical variables (e.g. hypertension) to support decision making.
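A minimal sketch of the kind of automated inference described above, lifting numeric blood pressure readings to a hypertension flag. The `BPReading` schema, the 140/90 mmHg thresholds and the "elevated on at least two distinct dates" rule are illustrative assumptions, not clinical criteria or the project's rules; in the proposed toolkit such rules would be formalised from clinical ontologies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass
class BPReading:
    """One numeric blood pressure observation (hypothetical schema)."""
    taken_on: date
    systolic: int   # mmHg
    diastolic: int  # mmHg


# Illustrative thresholds only (assumption), not a clinical definition.
SYSTOLIC_LIMIT = 140
DIASTOLIC_LIMIT = 90
MIN_ELEVATED_DAYS = 2


def is_elevated(reading: BPReading) -> bool:
    return reading.systolic >= SYSTOLIC_LIMIT or reading.diastolic >= DIASTOLIC_LIMIT


def infer_hypertension(readings: Iterable[BPReading]) -> bool:
    """Lift low-level readings to a higher-level variable: flag hypertension
    when elevated readings occur on at least MIN_ELEVATED_DAYS distinct dates."""
    elevated_days = {r.taken_on for r in readings if is_elevated(r)}
    return len(elevated_days) >= MIN_ELEVATED_DAYS


readings = [
    BPReading(date(2018, 1, 5), 150, 95),
    BPReading(date(2018, 2, 9), 145, 88),
    BPReading(date(2018, 3, 2), 128, 80),
]
print(infer_hypertension(readings))  # True
```

Counting distinct dates rather than raw readings is a design choice meant to avoid flagging a patient on the strength of several measurements taken during a single visit.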

A pilot study of a comprehensive phenome model, SemEHR's medical profiles [2], evaluated on publicly accessible data from the Medical Information Mart for Intensive Care (MIMIC), has shown that better contextual information can substantially improve the accuracy of clinical conclusions. For example, when patient medical history was used to subtype atrial fibrillation, we demonstrated that such phenome data ranked among the top 10 features for identifying clinically sensible patient clusters. For 'action' generation in clinical settings, we have demonstrated the feasibility of alerts through a number of simple examples using CogStack. For example, at King's College Hospital we detected abnormal pathology results in 25 patients prescribed methotrexate for rheumatoid arthritis, preventing potentially fatal renal failure.
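The methotrexate example can be pictured as a simple alerting rule: cross-reference prescriptions with pathology results and flag patients whose monitored tests fall outside an acceptable range. This is a hypothetical sketch, not the CogStack alert that was deployed; the test names, the bounds in `MONITORING_RULES` and the `LabResult` class are placeholders chosen for illustration and are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class LabResult:
    patient_id: str
    test: str
    value: float


# Hypothetical monitoring rules: test -> (lower bound, upper bound); None means unbounded.
# Placeholder values for illustration only, not clinical guidance.
MONITORING_RULES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "eGFR": (60.0, None),  # mL/min/1.73 m^2; low values suggest reduced renal function
    "ALT": (None, 80.0),   # U/L
}


def out_of_range(result: LabResult) -> bool:
    bounds = MONITORING_RULES.get(result.test)
    if bounds is None:
        return False  # test is not covered by any monitoring rule
    low, high = bounds
    return (low is not None and result.value < low) or (
        high is not None and result.value > high
    )


def methotrexate_alerts(prescribed: Set[str], results: Iterable[LabResult]) -> List[str]:
    """Sorted ids of patients on methotrexate with at least one out-of-range monitored result."""
    return sorted({r.patient_id for r in results if r.patient_id in prescribed and out_of_range(r)})


prescribed = {"p1", "p2"}  # patients with an active methotrexate prescription
results = [
    LabResult("p1", "eGFR", 42.0),
    LabResult("p2", "eGFR", 85.0),
    LabResult("p2", "ALT", 35.0),
    LabResult("p3", "eGFR", 30.0),  # abnormal, but p3 is not on methotrexate
]
print(methotrexate_alerts(prescribed, results))  # ['p1']
```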

The proposed research will devise a semantic electronic health record toolkit that is able to derive a consistent and comprehensive patient phenome from unstructured and structured electronic health records and provide semantic computation upon it to support decision making for tailored care, trial recruitment and research.

References:
1. Wu H, et al. SemEHR: surfacing semantic data from clinical notes in electronic health records for tailored care, trial recruitment, and clinical research. Lancet. 2017;390: S97.
2. Wu H, et al. A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment and clinical research. J Am Med Inform Assoc. 2017. doi:10.1101/235622.
3. Bean DM, Wu H, et al. Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records. Sci Rep. 2017;7: 16416.

Technical Summary

For objective 1, at the data layer, my research will focus on a semantic phenome model that can detect and correct erroneous and inconsistent phenotypes, associate accurate contextual and temporal information with each phenotype mention, and support rule-based reasoning to complete missing data. For objective 2, I will devise and apply artificial intelligence models to derive unknown clinical knowledge from large-scale, longitudinal and interlinked phenome data. Potential use cases include predicting outcomes of septic shock treatments in intensive care units, predicting unknown adverse drug reactions in depression patients with comorbidities, and subtyping atrial fibrillation to deliver tailored care. For objective 3, my research will provide actionable suggestions in clinical settings, with applications in clinical trial recruitment and automated alerting to ensure patient safety. Key challenges to be tackled here include making action suggestions explainable and reliable.
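To make the data-layer goals of objective 1 more concrete, the sketch below attaches simple context (negation, experiencer) and a document date to each phenotype mention, builds a per-patient phenotype timeline from affirmed mentions only, and flags same-day contradictions as candidate inconsistencies. The `Mention` fields and the conflict rule are assumptions for illustration; the actual semantic phenome model is what the proposed research will develop.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    """One NLP-identified phenotype mention with its context (hypothetical schema)."""
    concept: str      # phenotype label, e.g. "atrial fibrillation"
    doc_date: date    # date of the source document, used for the timeline
    negated: bool     # True for e.g. "no evidence of AF"
    experiencer: str  # "patient", or "other" for e.g. family history


def patient_timeline(mentions):
    """Map each phenotype to the sorted dates on which it is affirmed for the patient."""
    timeline = defaultdict(set)
    for m in mentions:
        if not m.negated and m.experiencer == "patient":
            timeline[m.concept].add(m.doc_date)
    return {concept: sorted(dates) for concept, dates in timeline.items()}


def conflicts(mentions):
    """Return (concept, date) pairs both affirmed and negated for the patient on the same date."""
    polarity = defaultdict(set)
    for m in mentions:
        if m.experiencer == "patient":
            polarity[(m.concept, m.doc_date)].add(m.negated)
    return sorted(key for key, seen in polarity.items() if len(seen) == 2)


d1, d2 = date(2018, 5, 1), date(2018, 6, 1)
mentions = [
    Mention("atrial fibrillation", d1, False, "patient"),
    Mention("atrial fibrillation", d1, True, "patient"),
    Mention("diabetes", d2, False, "other"),  # family history: excluded from the timeline
    Mention("stroke", d2, False, "patient"),
]
print(patient_timeline(mentions))
print(conflicts(mentions))
```

Here the conflicting atrial fibrillation mentions are kept in the timeline but reported separately, leaving the decision on which to trust to a later verification step.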

This project aims to deliver enabling technologies for the University of Edinburgh's HDR UK focus areas, including deriving and applying health-related phenotypes at scale, and computational tools for genetic and environmental risk prediction and causal inference. It will develop national leadership, partnerships, and interdisciplinary skills and capacity by building semantic computation infrastructure on top of deep and accurate patient phenome data which, if successful, can be disseminated to a wide range of healthcare service providers nationally and internationally and achieve high impact in research and patient care.

Publications

Description Use natural language processing for surfacing stroke phenotypes from Scottish radiology reports: a comparison of different methodologies 
Organisation University of Edinburgh
Department School of Informatics Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Investigated NLP model adaptation by reusing models trained on EHRs from London NHS trusts on Scottish radiology reports.
Collaborator Contribution Collaborators from the Centre for Clinical Brain Sciences, University of Edinburgh, provided ESS Stroke study data and Tayside radiology reports, and manually labelled the data. Collaborators from the School of Informatics provided computational resources for accessing the data, as well as their results on the same task using rule-based NLP and a neural network method.
Impact Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches. Philip John Gorinski, Honghan Wu, Claire Grover, Richard Tobin, Conn Talbot, Heather Whalley, Cathie Sudlow, William Whiteley, Beatrice Alex. Accepted by HealTAC 2019. This is a multidisciplinary study involving neurology and computing science.
Start Year 2018
 
Title nlp2phenome: using AI models to infer patient phenotypes from identified named entities (instances of biomedical concepts) 
Description Using natural language processing (NLP) to identify mentions of biomedical concepts in free-text medical records is just the first step. There is often a gap between NLP results and what a clinical study is after. For example, a radiology report may not contain the term "ischemic stroke"; instead, it reports that the patient had blocked arteries and a stroke. To infer the "unspoken" ischemic stroke, a mechanism is needed to draw such inferences from NLP-identifiable mentions of blocked arteries and stroke. nlp2phenome is designed to perform this extra step from NLP results to the patient phenome.
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact nlp2phenome was developed for a stroke subtyping study at the University of Edinburgh that applied NLP to radiology reports. Building on SemEHR results, it identified 2,922 mentions of 32 phenotype types in 266 radiology reports, achieving average scores of F1 0.929, precision 0.925 and recall 0.939.
URL https://github.com/CogStack/nlp2phenome
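The ischemic stroke example in the nlp2phenome description can be illustrated with a hand-written rule: infer a phenotype when all of its evidence concepts are mentioned, un-negated, in the same report. Note that nlp2phenome itself is described as using AI models for this step; the rule table and function below are a simplified, hypothetical stand-in to show the input and output of the inference, not the tool's method or API.

```python
# Hypothetical rule table: phenotype -> evidence concepts that must all be
# mentioned (and not negated) in the same report for the phenotype to be inferred.
INFERENCE_RULES = {
    "ischemic stroke": {"stroke", "blocked artery"},
}


def infer_phenotypes(mentions):
    """mentions: iterable of (concept, negated) pairs from one report.

    Returns the affirmed concepts plus any phenotypes inferred from them."""
    present = {concept for concept, negated in mentions if not negated}
    inferred = {
        phenotype
        for phenotype, evidence in INFERENCE_RULES.items()
        if evidence <= present  # subset test: all required evidence is present
    }
    return present | inferred


report = [("stroke", False), ("blocked artery", False), ("haemorrhage", True)]
print(sorted(infer_phenotypes(report)))  # ['blocked artery', 'ischemic stroke', 'stroke']
```

A learned model would replace the fixed subset test with a classifier over the same mention features, which is what allows it to cope with evidence combinations that no hand-written rule anticipates.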