Deriving an actionable patient phenome from healthcare data
Lead Research Organisation:
University College London
Department Name: Institute of Health Informatics
Abstract
Translating routinely collected health data into knowledge is a requirement of a "learning health system". Since joining the Biomedical Research Centre at the South London and Maudsley Hospital and King's College London, my research has focused on developing CogStack and SemEHR, an integrated health informatics platform that aims to unlock unstructured health records and assist clinical decision making and research. The platform helps surface the deep data held within the NHS, for example by providing a patient-centric search over semantically annotated clinical notes. This search has supported studies such as the recruitment of patients for Genomics England's 100,000 Genomes project [1,2] and the prediction of adverse drug reactions [3].
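The patient-centric search described above can be pictured as an inverted index from annotated concepts to patients. The following is a minimal Python sketch under stated assumptions: the `Annotation` record, its fields and the functions `build_patient_index` and `search` are hypothetical and are not the CogStack or SemEHR API. It only illustrates aggregating note-level annotations to patient level while ignoring negated mentions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One concept mention found in a clinical note (hypothetical schema)."""
    patient_id: str
    doc_id: str
    concept: str   # normalised concept label, e.g. an ontology term
    negated: bool  # True for mentions such as "no evidence of ..."


def build_patient_index(annotations):
    """Map each concept to the set of patients with at least one affirmed mention."""
    index = defaultdict(set)
    for ann in annotations:
        if not ann.negated:  # negated mentions are not evidence of the concept
            index[ann.concept].add(ann.patient_id)
    return index


def search(index, concepts):
    """Return patients with affirmed mentions of *all* requested concepts."""
    matches = [index.get(c, set()) for c in concepts]
    return set.intersection(*matches) if matches else set()


notes = [
    Annotation("p1", "d1", "atrial fibrillation", False),
    Annotation("p1", "d2", "heart failure", False),
    Annotation("p2", "d3", "atrial fibrillation", True),
    Annotation("p3", "d4", "atrial fibrillation", False),
]
idx = build_patient_index(notes)
print(sorted(search(idx, ["atrial fibrillation"])))                   # ['p1', 'p3']
print(sorted(search(idx, ["atrial fibrillation", "heart failure"])))  # ['p1']
```

Working at patient level rather than document level is what makes the search "patient-centric": a query asks which people match, not which notes.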
However, there is considerable further potential for generating knowledge and action, for example by applying machine learning to the data from this platform. Realising this potential requires further processing: the data returned by these systems need to be integrated, verified and cleaned using biomedical knowledge, enriched with accurate clinical context (to complement the current sentence-level language context) and aligned with the patient timeline to derive a comprehensive patient phenome. Clinical knowledge also needs to be formalised from clinical ontologies and integrated with relevant open data. This knowledge will drive automated inferences that lift lower-level features (e.g. numeric blood pressure readings) to higher-level clinical variables (e.g. hypertension) to support decision making.
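To make the idea of lifting a lower-level feature to a higher-level clinical variable concrete, here is a minimal Python sketch. The thresholds (140/90 mmHg), the requirement of elevated readings on at least two distinct days, and all names are assumptions chosen for illustration; they are not the project's rules and not clinical guidance. In the proposed approach such rules would be formalised from clinical ontologies rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class BPReading:
    """A single blood pressure measurement (hypothetical schema)."""
    taken_on: date
    systolic: int   # mmHg
    diastolic: int  # mmHg


# Illustrative parameters only (assumptions, not a clinical definition).
SYSTOLIC_THRESHOLD = 140
DIASTOLIC_THRESHOLD = 90
MIN_ELEVATED_DAYS = 2


def is_elevated(reading: BPReading) -> bool:
    """A reading counts as elevated if either component meets its threshold."""
    return (reading.systolic >= SYSTOLIC_THRESHOLD
            or reading.diastolic >= DIASTOLIC_THRESHOLD)


def infer_hypertension(readings: Iterable[BPReading]) -> bool:
    """Lift numeric readings to a binary 'hypertension' variable.

    Assumed rule: elevated readings on at least MIN_ELEVATED_DAYS distinct days,
    so that repeated measurements taken on one day count only once.
    """
    elevated_days = {r.taken_on for r in readings if is_elevated(r)}
    return len(elevated_days) >= MIN_ELEVATED_DAYS


readings = [
    BPReading(date(2020, 1, 5), 150, 85),
    BPReading(date(2020, 2, 9), 138, 92),
    BPReading(date(2020, 3, 1), 120, 80),
]
print(infer_hypertension(readings))      # True
print(infer_hypertension(readings[2:]))  # False
```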
A pilot study of the comprehensive phenome model, SemEHR's medical profiles [2], evaluated on publicly accessible data from the Medical Information Mart for Intensive Care (MIMIC), showed that better contextual information can lead to substantially more accurate clinical conclusions. For example, when subtyping atrial fibrillation, phenome data derived from patients' medical histories were among the top 10 features for identifying clinically sensible patient clusters. For 'action' generation in clinical settings, we have demonstrated the feasibility of alerts through a number of simple examples using CogStack. For example, at King's College Hospital, we detected abnormal pathology results in 25 patients prescribed methotrexate for rheumatoid arthritis, preventing potentially fatal renal failure.
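An alert of the kind described above can be sketched as a rule that joins medication data with the latest pathology results. The text does not say which pathology results triggered the methotrexate alerts, so this minimal Python sketch assumes a single renal marker (eGFR) with an arbitrary cut-off of 30 mL/min/1.73 m². The `Patient` record, the constants and `methotrexate_alerts` are hypothetical and do not describe how CogStack implements alerting.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical alert parameters, chosen only for illustration.
WATCHED_DRUG = "methotrexate"
EGFR_ALERT_BELOW = 30.0  # mL/min/1.73 m^2 (assumed cut-off)


@dataclass(frozen=True)
class Patient:
    """Minimal patient snapshot (hypothetical schema)."""
    patient_id: str
    active_drugs: frozenset
    latest_egfr: Optional[float] = None  # None when no result is available


def methotrexate_alerts(patients: Iterable[Patient]) -> List[str]:
    """Return IDs of patients on the watched drug whose latest eGFR is abnormal."""
    alerts = []
    for p in patients:
        on_drug = WATCHED_DRUG in p.active_drugs
        abnormal = p.latest_egfr is not None and p.latest_egfr < EGFR_ALERT_BELOW
        if on_drug and abnormal:
            alerts.append(p.patient_id)
    return alerts


cohort = [
    Patient("a", frozenset({"methotrexate"}), 25.0),
    Patient("b", frozenset({"methotrexate"}), 75.0),
    Patient("c", frozenset({"ibuprofen"}), 20.0),
    Patient("d", frozenset({"methotrexate"})),  # no result yet: no alert
]
print(methotrexate_alerts(cohort))  # ['a']
```

A patient with no result is not alerted here; a real deployment might instead flag missing monitoring, which is a separate design decision.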
The proposed research will devise a semantic electronic health record toolkit that can derive a consistent and comprehensive patient phenome from structured and unstructured electronic health records and perform semantic computation on it to support decision making for tailored care, trial recruitment and research.
References:
1. Wu H, et al. SemEHR: surfacing semantic data from clinical notes in electronic health records for tailored care, trial recruitment, and clinical research. Lancet. 2017;390: S97.
2. Wu H, et al. A General-purpose Semantic Search System to Surface Semantic Data from Clinical Notes for Tailored Care, Trial Recruitment and Clinical Research. Journal of the American Medical Informatics Association. 2017; doi: https://doi.org/10.1101/235622.
3. Bean DM, Wu H, et al. Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records. Sci Rep. 2017;7: 16416.
Technical Summary
For objective 1, at the data layer, my research will focus on a semantic phenome model that can detect and correct erroneous and inconsistent phenotypes, associate accurate contextual and temporal information with each phenotype mention, and support rule-based reasoning to complete missing data. For objective 2, I will devise and apply artificial intelligence models to derive previously unknown clinical knowledge from large-scale, longitudinal and interlinked phenome data. Potential use cases include predicting the outcomes of septic shock treatments in intensive care units, predicting unknown adverse drug reactions in patients with depression and comorbidities, and subtyping atrial fibrillation to deliver tailored care. For objective 3, my research will provide actionable suggestions in clinical settings, with applications in clinical trial recruitment and automated alerting to ensure patient safety. Key challenges to be tackled here include how to make action suggestions explainable and reliable.
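Part of objective 1, attaching context to each phenotype mention and detecting inconsistent phenotypes, can be illustrated with a minimal Python sketch. The `Mention` schema, the "most recent mention decides" policy and the function `summarise_phenome` are assumptions for illustration only; the summary describes a richer model that also uses biomedical knowledge and rule-based reasoning. Conflicts are flagged rather than silently resolved so that they can be reviewed.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Mention:
    """A phenotype mention with its context (hypothetical schema)."""
    concept: str
    mentioned_on: date
    negated: bool = False
    experiencer: str = "patient"  # e.g. "family" for family-history mentions


def summarise_phenome(mentions: Iterable[Mention]) -> Dict[str, Tuple[bool, bool]]:
    """Return {concept: (present, conflicting)} for one patient.

    present: status taken from the most recent mention (assumed policy;
        on equal dates the first mention in input order is used).
    conflicting: True if both affirmed and negated mentions occur.
    """
    by_concept = defaultdict(list)
    for m in mentions:
        if m.experiencer == "patient":  # ignore mentions about other people
            by_concept[m.concept].append(m)

    summary = {}
    for concept, ms in by_concept.items():
        latest = max(ms, key=lambda m: m.mentioned_on)
        polarities = {m.negated for m in ms}
        summary[concept] = (not latest.negated, len(polarities) > 1)
    return summary


mentions = [
    Mention("diabetes", date(2019, 1, 1)),
    Mention("diabetes", date(2020, 6, 1), negated=True),
    Mention("asthma", date(2018, 3, 3)),
    Mention("breast cancer", date(2020, 1, 1), experiencer="family"),
]
print(summarise_phenome(mentions))
# {'diabetes': (False, True), 'asthma': (True, False)}
```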
This project aims to deliver enabling technologies for the University of Edinburgh's HDR UK focus areas, including deriving and applying health-related phenotypes at scale, and computational tools for genetic and environmental risk prediction and causal inference. It will develop national leadership, partnerships, and interdisciplinary skills and capacity by building semantic computation infrastructure on top of deep and accurate patient phenome data. If successful, this infrastructure can be disseminated to a wide range of healthcare service providers nationally and internationally and achieve high impact in research and patient care.
Publications
Carr E
(2021)
Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study.
in BMC medicine
Casey A
(2021)
A systematic review of natural language processing applied to radiology reports.
in BMC medical informatics and decision making
Chen Q
(2022)
Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations.
in Database : the journal of biological databases and curation
Cheung JPY
(2022)
Learning-based fully automated prediction of lumbar disc degeneration progression with specified clinical parameters and preliminary validation.
in European spine journal : official publication of the European Spine Society, the European Spinal Deformity Society, and the European Section of the Cervical Spine Research Society
Davidson EM
(2021)
The reporting quality of natural language processing studies: systematic review of studies of radiology reports.
in BMC medical imaging
Dong H
(2021)
Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision.
in Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
Dong H
(2022)
Automated clinical coding: what, why, and where we are?
in npj Digital Medicine
Dong H
(2021)
Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation.
in Journal of biomedical informatics
Description | Invited talk at 1st International Symposium on Evidence-based Artificial Intelligence and Medicine (ISEAIM) |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | My talk was titled "Derive insights from health data using knowledge graph technologies". I began with a brief introduction to what a knowledge graph is. I then used real-world examples to show how knowledge graph technologies can help clinical natural language processing. I concluded with my own thoughts on the challenges and future directions of knowledge graphs for health care. |
Description | Artificial Intelligence and Multimorbidity: Clustering in Individuals, Space and Clinical Context (AIM-CISC) |
Amount | £3,919,510 (GBP) |
Funding ID | NIHR202639 |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 07/2021 |
End | 08/2024 |
Description | Building a database of the immunohistochemical profiles of tumours from histopathology reports at scale using large language models and machine learning |
Amount | £59,907 (GBP) |
Funding ID | PGS23 100040 |
Organisation | Rosetrees Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2023 |
End | 10/2025 |
Description | Facilitating Better Urology Care With Effective And Fair Use Of Artificial Intelligence - A Partnership Between UCL And Shanghai Jiao Tong University School Of Medicine |
Amount | £39,968 (GBP) |
Organisation | British Council |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2024 |
End | 02/2026 |
Description | Improving the quality and value of care for people with poor prognosis cancers - a national, mixed methods study across Scotland |
Amount | £399,224 (GBP) |
Organisation | Health Foundation |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2020 |
End | 08/2023 |
Description | Iris.AI - The AI Chemist |
Amount | £39,000 (GBP) |
Organisation | Research Council of Norway |
Sector | Public |
Country | Norway |
Start | 07/2021 |
End | 01/2022 |
Description | QMIA: Quantifying and Mitigating Bias affecting and induced by AI in Medicine |
Amount | £649,218 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2023 |
End | 03/2026 |
Description | The Advanced Care Research Centre Programme |
Amount | £20,000,000 (GBP) |
Organisation | Legal and General Group |
Sector | Private |
Country | United Kingdom |
Start | 03/2020 |
End | 04/2026 |
Description | Towards an AI-driven Health Informatics Platform for supporting clinical decision making in Scotland - a pilot study in NHS Lothian |
Amount | £29,200 (GBP) |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 01/2020 |
End | 02/2021 |
Description | UCL-NMU-SEU International Collaboration On Artificial Intelligence In Medicine: Tackling Challenges Of Low Generalisability And Health Inequality |
Amount | £29,400 (GBP) |
Organisation | British Council |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 02/2022 |
End | 02/2024 |
Description | Using rare disease phenotype models to identify people at risk of COVID-19 adverse outcomes |
Amount | £38,065 (GBP) |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 01/2023 |
End | 03/2023 |
Title | Additional file 1 of Increased COVID-19 mortality rate in rare disease patients: a retrospective cohort study in participants of the Genomics England 100,000 Genomes project |
Description | Additional file 1: Table S2. Lists of ICD-10 codes for comorbidities associated to COVID-19 |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Increased_COVID-19_mortali... |
Title | Additional file 2 of Increased COVID-19 mortality rate in rare disease patients: a retrospective cohort study in participants of the Genomics England 100,000 Genomes project |
Description | Additional file 2: Table S1. Univariable and multivariable ORs for association between rare disease groups/specific diseases and COVID-19 |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Increased_COVID-19_mortali... |