Using Knowledge Graph Learning to Predict and Explain Patient Outcomes in Electronic Health Records

Lead Research Organisation: King's College London
Department Name: Biostatistics

Abstract

The aim of this project is to develop a system that can automatically predict and explain patient outcomes. The purpose of the research is to improve patient care by analysing anonymised electronic medical records at very large scale. For example, the methods developed could predict that a new drug will have a rare but serious side effect, or that there is a potentially preventable cause of a negative treatment outcome in a specific group of patients. This is possible because we can represent information as a network. Networks are a general way to represent the connections between things, such as friendships between people, links between websites, or molecular reactions in a cell. Networks contain "nodes" (the things) and "edges" (connections between the things). In the friendship network, people would be the nodes and there would be an edge between each pair of people who are friends. Graph Theory is a set of mathematical principles we can use to analyse any type of network to try to understand how the structure of the connections relates to the overall function.
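As an illustration of the node and edge idea described above, the following minimal sketch stores a small friendship network as a mapping from each person to the set of their friends. The names and the helper build_graph are hypothetical and are not part of the project's software.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a dict mapping each node to its set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        # Friendship is mutual, so the edge is recorded in both directions.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Hypothetical friendships: each pair is one edge between two nodes.
friendships = [("Ana", "Ben"), ("Ben", "Cara"), ("Ana", "Cara"), ("Cara", "Dev")]
graph = build_graph(friendships)

# The degree of a node is the number of edges attached to it.
degree = {person: len(friends) for person, friends in graph.items()}
```

A simple structural measure such as the degree already hints at how structure relates to function: in this toy network Cara is the only link between Dev and everyone else.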

In this fellowship, a large network will be created that combines publicly available data on medications, diseases and cell biology with anonymised data extracted from electronic medical records. One of the most powerful aspects of this network approach is that it allows these different types of information to be directly connected, and represents exactly how they relate to each other. This allows a computer to reason about patient outcomes with the extra context of existing medical knowledge. Algorithms can analyse this network to make predictions based on the known connections between things (for example, paracetamol is known to work as a painkiller, so other drugs similar to paracetamol might also be effective painkillers). Whilst the meaning of these relationships is often intuitive to a person, it is challenging to develop algorithms that can apply this type of reasoning. The purpose of this fellowship is to develop such methods and apply them to make clinically useful predictions.
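The "similar drugs may have similar effects" reasoning above can be sketched with a simple neighbour-overlap score. This is only an illustration, assuming Jaccard similarity over shared graph connections; it is not the method developed in the fellowship. Every entry in the toy graph other than "paracetamol treats pain" (drug_X, protein_P1, class_C1 and so on) is a placeholder, not a real fact.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of the overlap divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy knowledge graph as (subject, relation, object) triples.
# Only "paracetamol treats pain" comes from the text; the rest are placeholders.
triples = [
    ("paracetamol", "treats", "pain"),
    ("paracetamol", "targets", "protein_P1"),
    ("paracetamol", "in_class", "class_C1"),
    ("drug_X", "targets", "protein_P1"),
    ("drug_X", "in_class", "class_C1"),
    ("drug_Y", "targets", "protein_P2"),
    ("drug_Y", "in_class", "class_C1"),
    ("drug_Z", "targets", "protein_P3"),
]


def features(drug, exclude_relation="treats"):
    """Connections of a drug, leaving out the relation we are trying to predict."""
    return {(r, o) for s, r, o in triples if s == drug and r != exclude_relation}


def rank_candidates(known_drug, candidates):
    """Rank candidate drugs by how much of their neighbourhood they share with known_drug."""
    reference = features(known_drug)
    scored = [(jaccard(reference, features(c)), c) for c in candidates]
    return sorted(scored, reverse=True)


ranking = rank_candidates("paracetamol", ["drug_X", "drug_Y", "drug_Z"])
```

In this toy graph drug_X shares both of paracetamol's connections and ranks first, drug_Y shares one, and drug_Z shares none.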

The first part of the work is to combine the publicly available data and create the large network of known facts that could be useful to explain patient outcomes. This network will then be used to develop and optimise the algorithms that will make the predictions, by training them to predict known associations such as drug side effects or disease risk factors. With the network and the algorithms ready, the work then proceeds in two directions. Firstly, we can look at the graph and predict "missing information", meaning that given everything we know about the drugs that can cause a serious side effect (e.g. Stevens-Johnson syndrome), it is very likely that drugs A, B and C could also cause it. These predictions are then validated by analysing anonymised electronic medical records. The second side to the project is to explain outcomes that are observed in medical records. The first step there is to identify a trend, such as a population of patients who respond poorly to treatment or have an unusually high rate of a negative outcome. We can use the graph to predict why this pattern exists, given all of the medical information available to the predictive algorithm. These patterns, along with their predicted explanations, will be subject to medical review and used to inform policy and best practice decisions to improve patient care.

Technical Summary

This fellowship will develop predictive analytics algorithms for knowledge graphs, and apply those methods to a biomedical graph to predict patient outcomes in electronic health records.

The first steps are to develop the knowledge graph and optimise the predictive algorithms, building on my recent work [Bean et al. 2017]. The expanded graph will additionally include proteins, genes, ontologies, diseases and guidelines. Two predictive algorithms will be developed and optimised. A) Improving the performance of my published KG edge prediction method, focused on efficiency (gradient descent) and accuracy through the use of both additional features (public knowledge and network features) and improved weighting of those features (confidence, provenance). B) New methods to predict the mechanistic basis for an edge in the graph. For example, the basis for a drug indication would involve the protein target of the drug, the pathways those proteins are involved in, and how those pathways relate to the indicated condition. Performance will be evaluated using the known mechanism of action for drugs.
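To make the idea of weighting features by gradient descent concrete, here is a minimal sketch that learns one weight per binary feature with plain logistic regression. It is a generic stand-in under stated assumptions, not the published edge prediction method: the three feature names, the toy training data and the functions fit_weights and predict are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weights(X, y, lr=0.5, epochs=2000):
    """Fit feature weights and a bias by batch gradient descent on the logistic loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Score a candidate edge as a probability between 0 and 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features for each drug/side-effect pair:
# [shares_target, shares_pathway, shares_indication] with a drug known to cause the effect.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = known edge in the graph, 0 = no known edge

w, b = fit_weights(X, y)
candidate_score = predict(w, b, [1, 0, 0])  # score for an unseen candidate edge
```

The learned weights indicate how much each type of evidence contributes to a prediction; per-feature confidence or provenance could in principle be folded in as additional inputs or as example weights.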

Both of these predictive algorithms will be validated using anonymised EHR data. Firstly, new predictions for outcomes will be generated from the graph and validated using anonymised EHR data extracted using the CogStack (data surfacing and harmonisation) and SemEHR (semantic annotation and search) projects at KCH.

Secondly, one or more cohorts of patients with differential outcomes will be identified - for example treatment response or an unexpected drug-drug interaction. For these patients, classifiers will be trained to predict outcome from routine EHR data. The predictive features are essentially a set of clinical variables that are risk factors for the outcome. The explanation prediction algorithm will use these relationships to propose a mechanistic basis from the knowledge graph. Proposed mechanisms will be manually reviewed for plausibility.
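One simple way to picture how a mechanistic basis might be proposed from the graph is to search for a chain of connections from a risk factor to the outcome. The sketch below uses breadth-first search for a shortest path; this is an assumption made for illustration, not the explanation algorithm developed in the fellowship, and all node names are placeholders.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Placeholder directed graph: risk factor -> protein -> pathway -> outcome.
toy_graph = {
    "risk_factor_A": ["protein_P1"],
    "protein_P1": ["pathway_W1"],
    "pathway_W1": ["outcome_O"],
    "risk_factor_B": ["protein_P2"],
    "protein_P2": [],
}

explanation_a = shortest_path(toy_graph, "risk_factor_A", "outcome_O")
explanation_b = shortest_path(toy_graph, "risk_factor_B", "outcome_O")
```

Here risk_factor_A reaches the outcome through a protein and a pathway, giving a candidate mechanism for manual review, while risk_factor_B has no connecting path and so yields no proposal.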

Publications

 
Description 3x Reports to SAGE from HDRUK
Geographic Reach National 
Policy Influence Type Citation in other policy documents
URL https://www.hdruk.ac.uk/covid-19/our-work-to-help-sage/
 
Description All party parliamentary group speech
Geographic Reach National 
Policy Influence Type Citation in other policy documents
URL https://www.gov.uk/government/speeches/adding-years-to-life-and-life-to-years-our-plan-to-increase-h...
 
Description Citation in 7 systematic reviews
Geographic Reach Multiple continents/international 
Policy Influence Type Citation in systematic reviews
 
Description NHSx tech plan
Geographic Reach National 
Policy Influence Type Citation in other policy documents
URL https://jointheconversation.scwcsu.nhs.uk/3217/widgets/10570/documents/4079
 
Description UKRI AI review
Geographic Reach National 
Policy Influence Type Citation in other policy documents
URL https://www.ukri.org/about-us/what-we-do/ai-review-transforming-our-world-with-ai/
 
Description British heart foundation centre of excellence - pump prime
Amount £66,988 (GBP)
Funding ID Daniel Bean-BHF-RE/18/2/34213 
Organisation British Heart Foundation (BHF) 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2020 
End 04/2021
 
Description Consensus treatment recommendation in haematology
Amount £4,752 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 03/2022 
End 03/2022
 
Description AML treatment consensus 
Organisation National Cancer Research Institute (NCRI)
Country United Kingdom 
Sector Academic/University 
PI Contribution I am developing the software to build and run the clinical models, and the user interface.
Collaborator Contribution The partner sites develop and validate clinical models
Impact Publication: https://doi.org/10.1111/bjh.18013 Results available here https://amlconsensus.rosalind.kcl.ac.uk/
Start Year 2021
 
Description AML treatment consensus 
Organisation Royal Devon and Exeter NHS Foundation Trust
Country United Kingdom 
Sector Public 
PI Contribution I am developing the software to build and run the clinical models, and the user interface.
Collaborator Contribution The partner sites develop and validate clinical models
Impact Publication: https://doi.org/10.1111/bjh.18013 Results available here https://amlconsensus.rosalind.kcl.ac.uk/
Start Year 2021
 
Description Automating MDS diagnostic process 
Organisation Royal Devon and Exeter NHS Foundation Trust
Country United Kingdom 
Sector Public 
PI Contribution We developed the software/informatics tools allowing manual design of diagnostic algorithms and their evaluation.
Collaborator Contribution They designed the diagnostic process and performed validation.
Impact This is a multi-disciplinary collaboration between haematology and data science/informatics. We have accepted conference abstracts and an accepted paper in eJHaem. Further related work is ongoing but has not produced outputs yet.
Start Year 2018
 
Description Belgium - retrospective multicenter cohort study to analyze the association between ACEi/ARB and/or statin use with clinical outcome of COVID-19 
Organisation Ghent University Hospital
Country Belgium 
Sector Hospitals 
PI Contribution We provided the code from our earlier analysis in UK data and contributed to analysis/interpretation.
Collaborator Contribution They handled the data and performed the analysis.
Impact Publication in JAMDA https://doi.org/10.1016/j.jamda.2020.06.018
Start Year 2020
 
Description Belgium - retrospective multicenter cohort study to analyze the association between ACEi/ARB and/or statin use with clinical outcome of COVID-19 
Organisation University of Ghent
Country Belgium 
Sector Academic/University 
PI Contribution We provided the code from our earlier analysis in UK data and contributed to analysis/interpretation.
Collaborator Contribution They handled the data and performed the analysis.
Impact Publication in JAMDA https://doi.org/10.1016/j.jamda.2020.06.018
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation Guy's and St Thomas' NHS Foundation Trust
Country United Kingdom 
Sector Public 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation King's College Hospital
Country United Kingdom 
Sector Hospitals 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation Oslo University Hospital
Country Norway 
Sector Hospitals 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation Tongji University Hospital
Country China 
Sector Hospitals 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation University College Hospital
Country United Kingdom 
Sector Hospitals 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation University Hospital Southampton NHS Foundation Trust
Country United Kingdom 
Sector Hospitals 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation University Hospitals Birmingham NHS Foundation Trust
Country United Kingdom 
Sector Public 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation University Hospitals Bristol NHS Foundation Trust
Country United Kingdom 
Sector Hospitals 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation Weston Area Health NHS Trust
Country United Kingdom 
Sector Public 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Description NEWS2 - COVID19 international evaluation and improvement 
Organisation Wuhan Sixth Hospital
Country China 
Sector Hospitals 
PI Contribution We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites.
Collaborator Contribution All partners extracted and analysed their own data based on the model we provided.
Impact Publication in BMC Medicine https://doi.org/10.1186/s12916-020-01893-3 This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians.
Start Year 2020
 
Title AML Consensus Guideline 
Description The tool provides the results of our survey of UK experts to create a consensus guideline for AML treatment. It also uses other software developed in this award (esyn decision tree tool) to estimate treatment eligibility for research use. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact This tool is part of a broader collaboration with the NCRI. We are now reviewing this tool against the DTAC criteria. 
URL https://amlconsensus.rosalind.kcl.ac.uk/
 
Title COVID19-NEWS2 analysis 
Description This repository provides pre-trained models to validate our analysis of NEWS2 predictive performance for outcomes in COVID19. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact External validation in five UK NHS Trusts (Guy's and St Thomas' Hospitals, University Hospitals Southampton, University Hospitals Bristol and Weston NHS Foundation Trust, University College London Hospitals, University Hospitals Birmingham), one hospital in Norway (Oslo University Hospital), and two hospitals in Wuhan, China (Wuhan Sixth Hospital and Taikang Tongji Hospital) 
URL https://github.com/ewancarr/NEWS2-COVID-19
 
Title DGLinker - Knowledge graph prediction of disease-gene links 
Description DGLinker is a webserver for the prediction of novel candidate genes for human diseases given a set of known disease genes. DGLinker has a user-friendly interface that allows non-expert users to exploit biomedical information from a wide range of biological and phenotypic databases, and/or to upload their own data, to generate a knowledge-graph and use machine learning to predict new disease-associated genes. The webserver includes tools to explore and interpret the results and generates publication-ready figures. The webserver is free and open to all users without the need for registration. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Impact This tool is currently under review for the 2021 NAR webserver issue. It is the same algorithm as in our 2020 paper predicting new ALS-linked genes, several of which have since been externally validated. During development of the webserver we performed a temporal external validation which is available at https://github.com/KHP-Informatics/DGLinker-validation 
URL https://dglinker.rosalind.kcl.ac.uk/
 
Title EdgePrediction python library 
Description This library implements the knowledge graph machine learning method developed in https://www.nature.com/articles/s41598-017-16674-x with faster optimisation and updated to python3. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This library is used in the DGLinker webserver https://dglinker.rosalind.kcl.ac.uk/ 
URL https://github.com/KHP-Informatics/ADR-graph
 
Title EsyN Decision Tree tool 
Description This tool allows clinical users to design a decision tree for a range of purposes including process management, subtyping or diagnosis. Models can be tested online and used to generate a simple user interface. Models can be cited and shared openly with the community. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact It has been used to develop a diagnostic tree for myelodysplastic syndrome which was validated in 62 patients from 5 UK hospitals (publication accepted at eJHaem). Following review of algorithm output, the original diagnoses for 3/62 patients were recommended to be revised. 
URL http://www.esyn.org/builder.php?type=DecisionTree
 
Title Oral anticoagulant NLP 
Description Extraction of Oral Anticoagulant prescribing from discharge summaries, including negation and various modifiers (e.g. stopped, discontinued) 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Validated in real data from King's College Hospital https://doi.org/10.1371/journal.pone.0225625 Used to urgently analyse association between ACE inhibitors / ARBs and COVID19 outcomes https://doi.org/10.1002/ejhf.1924 and code https://github.com/dbeanm/ACEi-covid-analysis Used in multiple COVID19 research projects at King's College Hospital. 
URL https://github.com/CogStack/OAC-NLP
 
Title Risk score builder 
Description Python library to generate machine-readable definitions of clinical risk scores for use with NLP tools. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Validated in real data from King's College Hospital https://doi.org/10.1371/journal.pone.0225625 Further validation is ongoing. 
URL https://github.com/CogStack/risk-score-builder
 
Description HDRUK public and patient story - interview 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact This was an interview for a case study by HDRUK intended for a general public audience. We were able to emphasise the importance of data access and how carefully we handle sensitive data.
Year(s) Of Engagement Activity 2020
URL https://www.hdruk.org/news/does-your-ethnic-background-put-you-more-at-risk-from-covid-19/