Using Knowledge Graph Learning to Predict and Explain Patient Outcomes in Electronic Health Records
Lead Research Organisation:
King's College London
Department Name: Biostatistics
Abstract
The aim of this project is to develop a system that can automatically predict and explain patient outcomes. The purpose of the research is to improve patient care by analysing anonymised electronic medical records at very large scale. For example, the methods developed could predict that a new drug will have a rare but serious side effect, or identify a potentially preventable cause of a negative treatment outcome in a specific group of patients. This is possible because we can represent information as a network. Networks are a general way to represent the connections between things, such as friendships between people, links between websites, or molecular reactions in a cell. Networks contain "nodes" (the things) and "edges" (the connections between the things). In a friendship network, people would be the nodes and there would be an edge between every pair of people who are friends. Graph theory is a set of mathematical principles that can be used to analyse any type of network, to understand how the structure of the connections relates to the overall function.
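To make the node/edge idea concrete, the sketch below stores a small, made-up friendship network as an adjacency mapping. Each node maps to the set of nodes it shares an edge with. The sketch then computes each node's degree, one of the simplest structural measures from graph theory. The names and edges are illustrative only.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as a mapping node -> set of neighbouring nodes."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)  # an undirected edge connects both endpoints
        graph[b].add(a)
    return dict(graph)

# Hypothetical friendship network: people are nodes, friendships are edges.
friendships = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Alice"), ("Carol", "Dave")]
graph = build_graph(friendships)

# Degree: the number of connections each node has.
degree = {person: len(friends) for person, friends in graph.items()}
print(degree)  # {'Alice': 2, 'Bob': 2, 'Carol': 3, 'Dave': 1}
```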
In this fellowship, a large network will be created that combines publicly available data on medications, diseases and cell biology with anonymised data extracted from electronic medical records. One of the most powerful aspects of this network approach is that it allows these different types of information to be directly connected, and represents explicitly how they relate to each other. This allows a computer to reason about patient outcomes with the extra context of existing medical knowledge. Algorithms can analyse this network to make predictions based on the known connections between things (for example, if paracetamol is known to work as a painkiller, other drugs similar to paracetamol might also be effective painkillers). Whilst the meaning of these relationships is often intuitive to a person, it is challenging to develop algorithms that can apply this type of reasoning. The purpose of this fellowship is to develop such methods and apply them to make clinically useful predictions.
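The paracetamol example can be sketched as a similarity calculation. Each drug is described by the set of nodes it connects to in the graph. Drugs that share many neighbours with a known painkiller are then ranked as candidate painkillers. This minimal sketch assumes Jaccard similarity of neighbour sets as the similarity measure. It is an illustration, not the project's method, and all drug and protein names are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical knowledge-graph fragment: drug -> set of connected nodes
# (e.g. protein targets). All names are placeholders, not real pharmacology.
drug_links = {
    "known_painkiller": {"protein_1", "protein_2", "protein_3"},
    "candidate_A": {"protein_1", "protein_2"},
    "candidate_B": {"protein_4"},
}

reference = drug_links["known_painkiller"]
# Rank every other drug by how similar its neighbourhood is to the reference drug.
ranking = sorted(
    ((jaccard(reference, links), drug)
     for drug, links in drug_links.items() if drug != "known_painkiller"),
    reverse=True,
)
print(ranking)  # [(0.6666666666666666, 'candidate_A'), (0.0, 'candidate_B')]
```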
The first part of the work is to combine the publicly available data and create the large network of known facts that could be useful to explain patient outcomes. This network will then be used to develop and optimise the algorithms that will make the predictions, by training them to predict known associations such as drug side effects or disease risk factors. With the network and the algorithms ready, the work then proceeds in two directions. Firstly, we can look at the graph and predict "missing information". For example, given everything we know about the drugs that cause a serious side effect (e.g. Stevens-Johnson syndrome), the graph may indicate that drugs A, B and C are likely to cause it too. These predictions are then validated by analysing anonymised electronic medical records. The second direction of the project is to explain outcomes that are observed in medical records. The first step there is to identify a trend, such as a population of patients who respond poorly to treatment or have an unusually high rate of a negative outcome. We can then use the graph to predict why this pattern exists, given all of the medical information available to the predictive algorithm. These patterns, along with their predicted explanations, will be subject to medical review and used to inform policy and best practice decisions to improve patient care.
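Predicting "missing information" can be illustrated with a simple neighbourhood-scoring approach. Each graph feature is weighted by how common it is among drugs already known to cause the side effect. Other drugs are then scored by the features they share. This is a hypothetical sketch for illustration, not the published edge prediction algorithm. The feature weighting scheme and all names are assumptions.

```python
from collections import Counter

def feature_weights(known_positives, features):
    """Weight each feature by the fraction of known positive nodes that have it."""
    counts = Counter(f for node in known_positives for f in features[node])
    return {f: c / len(known_positives) for f, c in counts.items()}

def score_candidates(known_positives, features):
    """Score every other node by the summed weights of the features it has."""
    weights = feature_weights(known_positives, features)
    scores = {
        node: sum(weights.get(f, 0.0) for f in feats)
        for node, feats in features.items()
        if node not in known_positives
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder data: drugs and their graph neighbours (targets, drug classes, etc.).
features = {
    "drug_1": {"target_X", "class_P"},
    "drug_2": {"target_X", "target_Y"},
    "drug_A": {"target_X", "class_P"},
    "drug_B": {"target_Y"},
    "drug_C": {"target_Z"},
}
known = {"drug_1", "drug_2"}  # drugs already known to cause the side effect
print(score_candidates(known, features))
# [('drug_A', 1.5), ('drug_B', 0.5), ('drug_C', 0.0)]
```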
Technical Summary
This fellowship will develop predictive analytics algorithms for knowledge graphs, and apply those methods to a biomedical graph to predict patient outcomes in electronic health records.
The first steps are to develop the knowledge graph (KG) and optimise the predictive algorithms, building on my recent work [Bean et al. 2017]. The expanded graph will additionally include proteins, genes, ontologies, diseases and guidelines. Two predictive algorithms will be developed and optimised. A) Improving my published KG edge prediction method, focusing on efficiency (gradient descent optimisation) and on accuracy. Accuracy will be improved through additional features (public knowledge and network features) and improved weighting of those features (confidence, provenance). B) New methods to predict the mechanistic basis for an edge in the graph. For example, the basis for a drug indication would involve the protein target of the drug, the pathways those proteins are involved in, and how those pathways relate to the indicated condition. Performance will be evaluated using the known mechanism of action for drugs.
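Part A mentions gradient descent and feature weighting by confidence or provenance. As a hedged illustration only, the sketch below assumes a logistic model over binary graph features for each candidate edge, fitted by batch gradient descent. A per-example weight stands in for label confidence. The model form, learning rate and toy data are assumptions and do not reproduce the published method.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, sample_weights, lr=0.5, epochs=500):
    """Fit logistic-regression weights by batch gradient descent.

    X: binary feature vectors for candidate edges.
    y: 1 if the edge is known to exist, else 0.
    sample_weights: assumed per-example confidence in [0, 1] (e.g. from provenance).
    """
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    total = sum(sample_weights)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi, si in zip(X, y, sample_weights):
            # Prediction error, scaled by how much we trust this label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += si * err * xi[j]
            grad_b += si * err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b

def predict(w, b, x):
    """Predicted probability that the candidate edge exists."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: feature 0 is informative, feature 1 is not.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
conf = [1.0, 1.0, 1.0, 0.5]  # last label assumed to come from a lower-confidence source
w, b = train(X, y, conf)
print(predict(w, b, [1, 0]) > 0.5, predict(w, b, [0, 1]) < 0.5)  # True True
```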
Both predictive algorithms will be validated using anonymised EHR data. Firstly, new outcome predictions will be generated from the graph and validated using anonymised EHR data extracted with the CogStack (data surfacing and harmonisation) and SemEHR (semantic annotation and search) projects at King's College Hospital (KCH).
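One common way to check a predicted association (for example, between a drug and an adverse outcome) against EHR data is an odds ratio. It compares the odds of the outcome in exposed and unexposed patients. The sketch below assumes the four counts of a 2x2 table have already been extracted (e.g. via CogStack/SemEHR searches) and uses a Wald 95% confidence interval. The counts are invented, and the choice of statistic is an assumption rather than the project's stated validation protocol.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All counts must be non-zero for this simple estimator.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts for one predicted drug -> adverse-event edge.
or_, lo, hi = odds_ratio(a=30, b=970, c=100, d=8900)
print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```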
Secondly, one or more cohorts of patients with differential outcomes will be identified, for example differences in treatment response or an unexpected drug-drug interaction. For these patients, classifiers will be trained to predict outcome from routine EHR data. The predictive features are essentially a set of clinical variables that act as risk factors for the outcome. The explanation prediction algorithm will use these relationships to propose a mechanistic basis from the knowledge graph. Proposed mechanisms will be manually reviewed for plausibility.
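A mechanistic explanation can be represented as a path of typed edges through the knowledge graph, such as drug -> protein target -> pathway -> outcome, as in the drug-indication example in part B. This hypothetical sketch enumerates all simple paths, up to a maximum length, between a risk factor and an outcome. These paths are candidate mechanisms for manual review. The graph, relation names and hop limit are assumptions, and no path ranking is shown.

```python
from collections import deque

def find_paths(graph, source, target, max_hops=3):
    """Enumerate simple paths from source to target with at most max_hops edges.

    graph: dict mapping node -> list of (relation, neighbour) pairs.
    Each path is returned as a list of (subject, relation, object) triples.
    """
    paths = []
    queue = deque([(source, [], {source})])
    while queue:
        node, path, seen = queue.popleft()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        for relation, nxt in graph.get(node, []):
            if nxt not in seen:  # keep paths simple (no repeated nodes)
                queue.append((nxt, path + [(node, relation, nxt)], seen | {nxt}))
    return paths

def format_path(path):
    """Render a path as 'A -[rel]-> B -[rel]-> C'."""
    text = path[0][0]
    for _, relation, obj in path:
        text += f" -[{relation}]-> {obj}"
    return text

# Placeholder graph linking a drug (risk factor) to an outcome.
kg = {
    "drug_D": [("targets", "protein_P"), ("interacts_with", "drug_E")],
    "protein_P": [("participates_in", "pathway_Q")],
    "pathway_Q": [("associated_with", "outcome_O")],
    "drug_E": [("targets", "protein_R")],
}
for p in find_paths(kg, "drug_D", "outcome_O"):
    print(format_path(p))
# drug_D -[targets]-> protein_P -[participates_in]-> pathway_Q -[associated_with]-> outcome_O
```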
Organisations
- King's College London (Fellow, Lead Research Organisation)
- UNIVERSITY HOSPITAL SOUTHAMPTON NHS FOUNDATION TRUST (Collaboration)
- King's College Hospital (Collaboration)
- Royal Devon and Exeter NHS Foundation Trust (Collaboration)
- UNIVERSITY HOSPITALS BRISTOL NHS FOUNDATION TRUST (Collaboration)
- National Cancer Research Institute (NCRI) (Collaboration)
- UNIVERSITY HOSPITALS BIRMINGHAM NHS FOUNDATION TRUST (Collaboration)
- Weston Area Health NHS Trust (Collaboration)
- University College Hospital (Collaboration)
- Tongji University Hospital (Collaboration)
- Wuhan Sixth Hospital (Collaboration)
- University of Ghent (Collaboration)
- Guy's and St Thomas' NHS Foundation Trust (Collaboration)
- Ghent University Hospital (Collaboration)
- Oslo University Hospital (Collaboration)
People |
Daniel Bean (Principal Investigator / Fellow) |
Publications
Bean DM
(2019)
Semantic computational analysis of anticoagulation use in atrial fibrillation from real world data.
in PloS one
Bean DM
(2019)
A patient flow simulator for healthcare management education.
in BMJ simulation & technology enhanced learning
Searle T.
(2019)
MedCATTrainer: A biomedical free text annotation interface with active learning and research use case specific customisation
in EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Proceedings of System Demonstrations
De Spiegeleer A
(2020)
The Effects of ARBs, ACEis, and Statins on Clinical Outcomes of COVID-19 Infection Among Nursing Home Residents.
in Journal of the American Medical Directors Association
Bean DM
(2020)
Angiotensin-converting enzyme inhibitors and angiotensin II receptor blockers are not associated with severe COVID-19 infection in a multi-site UK acute hospital trust.
in European journal of heart failure
Description | 3x Reports to SAGE from HDRUK |
Geographic Reach | National |
Policy Influence Type | Citation in other policy documents |
URL | https://www.hdruk.ac.uk/covid-19/our-work-to-help-sage/ |
Description | All party parliamentary group speech |
Geographic Reach | National |
Policy Influence Type | Citation in other policy documents |
URL | https://www.gov.uk/government/speeches/adding-years-to-life-and-life-to-years-our-plan-to-increase-h... |
Description | Citation in 7 systematic reviews |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Citation in systematic reviews |
Description | NHSx tech plan |
Geographic Reach | National |
Policy Influence Type | Citation in other policy documents |
URL | https://jointheconversation.scwcsu.nhs.uk/3217/widgets/10570/documents/4079 |
Description | UKRI AI review |
Geographic Reach | National |
Policy Influence Type | Citation in other policy documents |
URL | https://www.ukri.org/about-us/what-we-do/ai-review-transforming-our-world-with-ai/ |
Description | British Heart Foundation Centre of Excellence - pump-priming |
Amount | £66,988 (GBP) |
Funding ID | Daniel Bean-BHF-RE/18/2/34213 |
Organisation | British Heart Foundation (BHF) |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 02/2020 |
End | 04/2021 |
Description | Consensus treatment recommendation in haematology |
Amount | £4,752 (GBP) |
Organisation | King's College London |
Sector | Academic/University |
Country | United Kingdom |
Start | 03/2022 |
End | 03/2022 |
Description | AML treatment consensus |
Organisation | National Cancer Research Institute (NCRI) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I am developing the software to build and run the clinical models, and the user interface. |
Collaborator Contribution | The partner sites develop and validate clinical models. |
Impact | Publication: https://doi.org/10.1111/bjh.18013 Results are available at https://amlconsensus.rosalind.kcl.ac.uk/ |
Start Year | 2021 |
Description | AML treatment consensus |
Organisation | Royal Devon and Exeter NHS Foundation Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | I am developing the software to build and run the clinical models, and the user interface. |
Collaborator Contribution | The partner sites develop and validate clinical models. |
Impact | Publication: https://doi.org/10.1111/bjh.18013 Results are available at https://amlconsensus.rosalind.kcl.ac.uk/ |
Start Year | 2021 |
Description | Automating MDS diagnostic process |
Organisation | Royal Devon and Exeter NHS Foundation Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | We developed the software/informatics tools allowing manual design of diagnostic algorithms and their evaluation. |
Collaborator Contribution | They designed the diagnostic process and performed validation. |
Impact | This is a multidisciplinary collaboration between haematology and data science/informatics. Outputs so far include accepted conference abstracts and a paper accepted in eJHaem. Further related work is ongoing but has not yet produced outputs. |
Start Year | 2018 |
Description | Belgium - retrospective multicenter cohort study to analyze the association between ACEi/ARB and/or statin use with clinical outcome of COVID-19 |
Organisation | Ghent University Hospital |
Country | Belgium |
Sector | Hospitals |
PI Contribution | We provided the code from our earlier analysis in UK data and contributed to analysis/interpretation. |
Collaborator Contribution | They handled the data and performed the analysis. |
Impact | Publication in JAMDA https://doi.org/10.1016/j.jamda.2020.06.018 |
Start Year | 2020 |
Description | Belgium - retrospective multicenter cohort study to analyze the association between ACEi/ARB and/or statin use with clinical outcome of COVID-19 |
Organisation | University of Ghent |
Country | Belgium |
Sector | Academic/University |
PI Contribution | We provided the code from our earlier analysis in UK data and contributed to analysis/interpretation. |
Collaborator Contribution | They handled the data and performed the analysis. |
Impact | Publication in JAMDA https://doi.org/10.1016/j.jamda.2020.06.018 |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | Guy's and St Thomas' NHS Foundation Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | King's College Hospital |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | Oslo University Hospital |
Country | Norway |
Sector | Hospitals |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | Tongji University Hospital |
Country | China |
Sector | Hospitals |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | University College Hospital |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | University Hospital Southampton NHS Foundation Trust |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | University Hospitals Birmingham NHS Foundation Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | University Hospitals Bristol NHS Foundation Trust |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | Weston Area Health NHS Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Description | NEWS2 - COVID19 international evaluation and improvement |
Organisation | Wuhan Sixth Hospital |
Country | China |
Sector | Hospitals |
PI Contribution | We developed the evaluation protocol and proposed improvements to NEWS2. We performed the training and validation in King's College Hospital data and integrated the results from external sites. |
Collaborator Contribution | All partners extracted and analysed their own data based on the model we provided. |
Impact | Publication in BMC Medicine (https://doi.org/10.1186/s12916-020-01893-3). This is an interdisciplinary collaboration including clinicians from multiple specialties and informaticians/statisticians. |
Start Year | 2020 |
Title | AML Consensus Guideline |
Description | The tool presents the results of our survey of UK experts, which was used to create a consensus guideline for AML treatment. It also uses other software developed in this award (the EsyN decision tree tool) to estimate treatment eligibility for research use. |
Type Of Technology | Webtool/Application |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This tool is part of a broader collaboration with the NCRI. We are now reviewing this tool against the Digital Technology Assessment Criteria (DTAC). |
URL | https://amlconsensus.rosalind.kcl.ac.uk/ |
Title | COVID19-NEWS2 analysis |
Description | This repository provides pre-trained models to validate our analysis of NEWS2 predictive performance for outcomes in COVID19. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | External validation in five UK NHS Trusts (Guy's and St Thomas' Hospitals, University Hospitals Southampton, University Hospitals Bristol and Weston NHS Foundation Trust, University College London Hospitals, University Hospitals Birmingham), one hospital in Norway (Oslo University Hospital), and two hospitals in Wuhan, China (Wuhan Sixth Hospital and Taikang Tongji Hospital) |
URL | https://github.com/ewancarr/NEWS2-COVID-19 |
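The predictive performance of an early warning score such as NEWS2 is often summarised by the area under the ROC curve (AUROC). The sketch below is a minimal, independent example, not code from the linked repository. It computes AUROC directly as the probability that a randomly chosen patient with the outcome has a higher score than one without, counting ties as half. The scores and outcomes are invented.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Over all (positive, negative) pairs, count how often the positive case
    has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical NEWS2 scores and outcomes (1 = e.g. ICU admission or death).
news2 = [7, 5, 3, 6, 1, 2, 5, 0]
outcome = [1, 1, 0, 1, 0, 0, 0, 0]
print(auroc(news2, outcome))  # 0.9666666666666667 (14.5 of 15 pairs ranked correctly)
```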
Title | DGLinker - Knowledge graph prediction of disease-gene links |
Description | DGLinker is a webserver for the prediction of novel candidate genes for human diseases given a set of known disease genes. DGLinker has a user-friendly interface that allows non-expert users to exploit biomedical information from a wide range of biological and phenotypic databases, and/or to upload their own data, to generate a knowledge-graph and use machine learning to predict new disease-associated genes. The webserver includes tools to explore and interpret the results and generates publication-ready figures. The webserver is free and open to all users without the need for registration. |
Type Of Technology | Webtool/Application |
Year Produced | 2021 |
Impact | This tool is currently under review for the 2021 NAR webserver issue. It is the same algorithm as in our 2020 paper predicting new ALS-linked genes, several of which have since been externally validated. During development of the webserver we performed a temporal external validation which is available at https://github.com/KHP-Informatics/DGLinker-validation |
URL | https://dglinker.rosalind.kcl.ac.uk/ |
Title | EdgePrediction python library |
Description | This library implements the knowledge graph machine learning method developed in https://www.nature.com/articles/s41598-017-16674-x with faster optimisation, updated to Python 3. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This library is used in the DGLinker webserver https://dglinker.rosalind.kcl.ac.uk/ |
URL | https://github.com/KHP-Informatics/ADR-graph |
Title | EsyN Decision Tree tool |
Description | This tool allows clinical users to design a decision tree for a range of purposes including process management, subtyping or diagnosis. Models can be tested online and used to generate a simple user interface. Models can be cited and shared openly with the community. |
Type Of Technology | Webtool/Application |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | It has been used to develop a diagnostic tree for myelodysplastic syndrome, which was validated in 62 patients from 5 UK hospitals (publication accepted at eJHaem). Following review of the algorithm output, revision of the original diagnosis was recommended for 3 of the 62 patients. |
URL | http://www.esyn.org/builder.php?type=DecisionTree |
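A clinician-designed decision tree of the kind described above can be represented as nested nodes. Each internal node tests one variable against a threshold, and each leaf carries a label. This sketch uses a hypothetical format, not EsyN's own model format, with placeholder variables and thresholds that are not clinical guidance. It records the path taken so that each output can be reviewed, in the spirit of the MDS validation.

```python
# Hypothetical tree format: internal nodes test one variable against a threshold;
# leaves carry a label. Variables and thresholds are placeholders, not clinical rules.
tree = {
    "variable": "marker_1",
    "threshold": 10.0,
    "below": {"label": "category_A"},
    "at_or_above": {
        "variable": "marker_2",
        "threshold": 0.5,
        "below": {"label": "category_B"},
        "at_or_above": {"label": "category_C"},
    },
}

def classify(node, record):
    """Walk from the root to a leaf; return (label, list of decisions taken)."""
    path = []
    while "label" not in node:
        value = record[node["variable"]]
        branch = "below" if value < node["threshold"] else "at_or_above"
        path.append(f"{node['variable']}={value} -> {branch}")
        node = node[branch]
    return node["label"], path

label, trace = classify(tree, {"marker_1": 12.0, "marker_2": 0.3})
print(label, trace)
# category_B ['marker_1=12.0 -> at_or_above', 'marker_2=0.3 -> below']
```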
Title | Oral anticoagulant NLP |
Description | Extraction of oral anticoagulant prescribing from discharge summaries, including negation and various modifiers (e.g. stopped, discontinued). |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | Validated in real data from King's College Hospital (https://doi.org/10.1371/journal.pone.0225625). Used to urgently analyse the association between ACE inhibitors/ARBs and COVID-19 outcomes (paper: https://doi.org/10.1002/ejhf.1924; code: https://github.com/dbeanm/ACEi-covid-analysis). Used in multiple COVID-19 research projects at King's College Hospital. |
URL | https://github.com/CogStack/OAC-NLP |
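The kind of rule-based extraction described above can be sketched with regular expressions. The sketch finds oral anticoagulant mentions, then checks the same sentence for negation or stop/discontinue modifiers. It is a simplified, hypothetical example, not the OAC-NLP implementation. The term lists and sentence-level scope are assumptions, and real clinical text needs far more robust handling.

```python
import re

# Assumed vocabulary for illustration only; the OAC-NLP repository defines its own.
OAC_TERMS = ["warfarin", "apixaban", "rivaroxaban", "edoxaban", "dabigatran"]
DRUG = re.compile(r"\b(" + "|".join(OAC_TERMS) + r")\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|not|never|denies)\b", re.IGNORECASE)
STOPPED = re.compile(r"\b(stopped|discontinued|withheld)\b", re.IGNORECASE)

def extract_oac(text):
    """Return (drug, status) for each OAC mention, using its own sentence as context."""
    results = []
    for sentence in re.split(r"(?<=[.;])\s+", text):
        for match in DRUG.finditer(sentence):
            if STOPPED.search(sentence):
                status = "stopped"
            elif NEGATION.search(sentence):
                status = "negated"
            else:
                status = "current"
            results.append((match.group(1).lower(), status))
    return results

note = "Started on apixaban 5mg BD. Warfarin discontinued. Not on rivaroxaban."
print(extract_oac(note))
# [('apixaban', 'current'), ('warfarin', 'stopped'), ('rivaroxaban', 'negated')]
```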
Title | Risk score builder |
Description | Python library to generate machine-readable definitions of clinical risk scores for use with NLP tools. |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | Validated in real data from King's College Hospital (https://doi.org/10.1371/journal.pone.0225625). Further validation is ongoing. |
URL | https://github.com/CogStack/risk-score-builder |
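A machine-readable risk score definition can be as simple as a list of items with point values. The definition can then be applied to concepts extracted by NLP tools. The sketch below uses the published CHA2DS2-VASc items (relevant to anticoagulation in atrial fibrillation). The format is hypothetical and is not the risk-score-builder schema. The patient record is invented.

```python
# Hypothetical definition format (not the risk-score-builder schema);
# the items and points are the published CHA2DS2-VASc components.
CHA2DS2_VASC = [
    {"item": "congestive_heart_failure", "points": 1},
    {"item": "hypertension", "points": 1},
    {"item": "age_75_or_over", "points": 2},
    {"item": "diabetes", "points": 1},
    {"item": "stroke_tia_or_thromboembolism", "points": 2},
    {"item": "vascular_disease", "points": 1},
    {"item": "age_65_to_74", "points": 1},
    {"item": "female", "points": 1},
]

def derive_age_items(age):
    """Map an age in years to the two mutually exclusive age items."""
    return {"age_75_or_over": age >= 75, "age_65_to_74": 65 <= age <= 74}

def score(definition, findings):
    """Sum points for every item marked present in findings (e.g. NLP-extracted concepts)."""
    return sum(c["points"] for c in definition if findings.get(c["item"], False))

patient = {"hypertension": True, "diabetes": True, "female": True, **derive_age_items(78)}
print(score(CHA2DS2_VASC, patient))  # 5
```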
Description | HDRUK public and patient story - interview |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | This was an interview for a case study by HDRUK intended for a general public audience. We were able to emphasise the importance of data access and how carefully we handle sensitive data. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.hdruk.org/news/does-your-ethnic-background-put-you-more-at-risk-from-covid-19/ |