ISCF HDRUK DIH Sprint Exemplar: Graph-Based Data Federation for Healthcare Data Science
Lead Research Organisation:
University of Edinburgh
Department Name: UNLISTED
Abstract
We know that many in-depth healthcare questions can only be answered if we can look across data for the whole UK. However, data is managed locally and described in different ways to suit local communities. To get a global view from local data we need a map that tells us precisely where to look for data and how to interpret it when we find it. With such a map we can link data between localities in a way that makes access more predictable and rapid, while allowing the people who manage different data sets to retain control of how the data in their charge is shared. We can also treat the map itself as data that can be shared, both to give insights into potential uses of data linkage and to encourage as wide a variety of innovators as possible to build tools that work across the data landscape: enriching the data, revealing new knowledge and extending the map.
Technical Summary
The Digital Innovation Hub Programme must establish national data coverage across the UK for a wide variety of data sets (primary, secondary, social care, etc.) and across many dimensions (genotypic, phenotypic, etc.), where those data sets are curated locally, in many data formats, at many sites. This requires a single framework and a common approach to interoperability. Our aim is to provide a convincing demonstration that these data sets can be linked flexibly through graph data, and that this linkage can support practical, adaptive data maintenance and the inference of knowledge beyond that available to HDR-UK by other means. We will do this by deploying well-understood techniques from ontology definition (based on graph data languages) to provide a formal, extensible “map” of the data assets, telling us precisely which queries could practically be made within and across data sets. Our framework will be based on generic and open data standards, enabling HDR-UK to provide opportunities for industry methods consistent with the framework to be used to acquire, manage and analyse linked data while preserving governance oversight and privacy.
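As a rough illustration of what such a "map" might look like in practice, the sketch below records, for each locally curated data set, how its own field names correspond to shared concepts, and then asks which queries could be made within one data set or across a linked pair. It is a minimal sketch using plain Python dictionaries rather than a graph data language; every data set name, field name and concept in it is hypothetical and does not come from the project.

```python
from itertools import combinations

# Hypothetical catalogue (the "map"): each data set lists its local field
# names and the shared concept each field corresponds to. All names are
# illustrative assumptions, not taken from the project.
DATASETS = {
    "gp_records": {
        "chi": "patient_id",
        "dx_code": "diagnosis",
        "rx": "prescription",
    },
    "hospital_episodes": {
        "patient_no": "patient_id",
        "icd10": "diagnosis",
        "adm_date": "admission_date",
    },
    "genomics_lab": {
        "sample_ref": "sample_id",
        "variant": "genotype",
    },
}


def concepts(dataset):
    """Shared concepts that one data set can supply."""
    return set(DATASETS[dataset].values())


def answerable_within(required):
    """Data sets whose own fields cover every required concept."""
    return sorted(name for name in DATASETS if required <= concepts(name))


def answerable_across(required, link_concept="patient_id"):
    """Pairs of data sets that share the linking concept and jointly cover the query."""
    pairs = []
    for a, b in combinations(sorted(DATASETS), 2):
        ca, cb = concepts(a), concepts(b)
        if link_concept in ca and link_concept in cb and required <= (ca | cb):
            pairs.append((a, b))
    return pairs


def local_fields(dataset, required):
    """Translate shared concepts back into the local field names at one site."""
    return {
        concept: field
        for field, concept in DATASETS[dataset].items()
        if concept in required
    }
```

In this toy catalogue, a query needing both `prescription` and `admission_date` cannot be answered by any single data set, but `answerable_across` reports that the two sets sharing `patient_id` could answer it once linked. Each site keeps its own field names, and only the mapping is shared.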
Publications

Carr E
(2021)
Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study.
in BMC Medicine


Casey A
(2021)
A systematic review of natural language processing applied to radiology reports
in BMC Medical Informatics and Decision Making


Davidson EM
(2021)
The reporting quality of natural language processing studies: systematic review of studies of radiology reports.
in BMC Medical Imaging

Dong H
(2021)
Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation.
in Journal of Biomedical Informatics

Dong H
(2023)
Ontology-driven and weakly supervised rare disease identification from clinical notes.
in BMC Medical Informatics and Decision Making

Dong H
(2022)
Automated clinical coding: what, why, and where we are?
in npj Digital Medicine

Dong H
(2021)
Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision.
in Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Title | Additional file 5 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 5: Figure S1. Calibration (logistic and LOESS curves) of supplemented NEWS2 model for 3-day ICU/death model at validation sites. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
URL | https://springernature.figshare.com/articles/figure/Additional_file_5_of_Evaluation_and_improvement_... |
Title | Additional file 8 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 8: Figure S2. Net benefit of supplemented NEWS2 model for 3-day ICU/death compared to default strategies ('treat all' and 'treat none') at training and validation sites. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
URL | https://springernature.figshare.com/articles/figure/Additional_file_8_of_Evaluation_and_improvement_... |
Description | Artificial Intelligence and Multimorbidity: Clustering in Individuals, Space and Clinical Context (AIM-CISC) |
Amount | £3,919,510 (GBP) |
Funding ID | NIHR202639 |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 07/2021 |
End | 08/2024 |
Description | Building a database of the immunohistochemical profiles of tumours from histopathology reports at scale using large language models and machine learning |
Amount | £59,907 (GBP) |
Funding ID | PGS23 100040 |
Organisation | Rosetrees Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2023 |
End | 10/2025 |
Description | Facilitating Better Urology Care With Effective And Fair Use Of Artificial Intelligence - A Partnership Between UCL And Shanghai Jiao Tong University School Of Medicine |
Amount | £39,968 (GBP) |
Organisation | British Council |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2024 |
End | 02/2026 |
Description | Improving the quality and value of care for people with poor prognosis cancers - a national, mixed methods study across Scotland |
Amount | £399,224 (GBP) |
Organisation | Health Foundation |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2020 |
End | 08/2023 |
Description | Iris.AI - The AI Chemist |
Amount | £39,000 (GBP) |
Organisation | Research Council of Norway |
Sector | Public |
Country | Norway |
Start | 07/2021 |
End | 01/2022 |
Description | QMIA: Quantifying and Mitigating Bias affecting and induced by AI in Medicine |
Amount | £649,218 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2023 |
End | 03/2026 |
Description | The Advanced Care Research Centre Programme |
Amount | £20,000,000 (GBP) |
Organisation | Legal and General Group |
Sector | Private |
Country | United Kingdom |
Start | 03/2020 |
End | 04/2026 |
Description | Towards an AI-driven Health Informatics Platform for supporting clinical decision making in Scotland - a pilot study in NHS Lothian |
Amount | £29,200 (GBP) |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 01/2020 |
End | 02/2021 |
Description | UCL-NMU-SEU International Collaboration On Artificial Intelligence In Medicine: Tackling Challenges Of Low Generalisability And Health Inequality |
Amount | £29,400 (GBP) |
Organisation | British Council |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 02/2022 |
End | 02/2024 |
Description | Using rare disease phenotype models to identify people at risk of COVID-19 adverse outcomes |
Amount | £38,065 (GBP) |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 01/2023 |
End | 03/2023 |
Title | Additional file 1 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 1: Table S1. SNOMED terms. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Evaluation_and_improvement... |
Title | Additional file 2 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 2: Table S2. F1, precision and recall for NLP comorbidity detection. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Evaluation_and_improvement... |
Title | Additional file 3 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 3: Table S3. Logistic regression models for each blood and physiological measure tested separately in the KCH training cohort, for 14- and 3-day ICU/death. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_3_of_Evaluation_and_improvement... |
Title | Additional file 4 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 4: Table S4. Internally validated discrimination for KCH training sample based on nested repeated cross-validation. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_4_of_Evaluation_and_improvement... |
Title | Additional file 6 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 6: Table S5. Univariate logistic regression models for sensitivity analyses showing odds ratios of ICU/death at 3- and 14-days for subsets of the training cohort. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_6_of_Evaluation_and_improvement... |
Title | Additional file 7 of Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study |
Description | Additional file 7: Table S6. Discrimination for all models in training and validation cohorts, including alternative baseline model of 'NEWS2 only'. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_7_of_Evaluation_and_improvement... |
Title | Ensemble Learning for COVID-19 Risk Prediction |
Description | Implemented 7 prognosis risk prediction models for COVID-19 (details in DOI:10.1093/jamia/ocaa295); introduced a competence quantification framework for assessing the competence/confidence of a model in predicting a given data entry (i.e. a digital representation of a COVID-19 patient); ensembled the 7 prediction models using fusion strategies based on their competences; evaluated the single models and the ensemble model on two large COVID-19 cohorts from Wuhan, China (N=2,384) and King's College Hospital (N=1,475) |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | The ensemble model performed best on all aspects evaluated (PPV, sensitivity, calibration, discrimination); findings from this study informed SAGE during the COVID-19 pandemic |
URL | https://github.com/Honghan/EnsemblePrediction |
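The record above describes fusing several models' predictions according to each model's competence on a given data entry. One simple fusion strategy of that kind is a competence-weighted average, sketched below. This is an assumption made for illustration only: the function name, its inputs and the weighting rule are hypothetical and are not taken from the linked repository, which may use different fusion strategies.

```python
def fuse(predictions, competences):
    """Competence-weighted average of per-model risk predictions.

    predictions: one predicted risk (0..1) per model for a single patient.
    competences: one non-negative competence score per model for that patient.
    Both the inputs and the weighting rule are illustrative assumptions.
    """
    if len(predictions) != len(competences):
        raise ValueError("need one competence score per prediction")
    if not predictions:
        raise ValueError("need at least one prediction")
    total = sum(competences)
    if total == 0:
        # No model is judged competent: fall back to a plain average.
        return sum(predictions) / len(predictions)
    weighted = sum(p * c for p, c in zip(predictions, competences))
    return weighted / total
```

For example, with two models predicting risks of 0.2 and 0.8 and competences of 1.0 and 3.0, the fused risk is pulled toward the more competent model's prediction.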
Description | Input into HDR UK Data Standards Paper |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Input into HDR UK Data Standards Paper |
Year(s) Of Engagement Activity | 2020 |
Description | Input into HDR UK Trusted Research Environments Green Paper |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Input into HDR UK Trusted Research Environments Green Paper. |
Year(s) Of Engagement Activity | 2020 |
URL | https://ukhealthdata.org/wp-content/uploads/2020/07/200723-Alliance-Board_Paper-E_TRE-Green-Paper.pd... |
Description | Invited External Advisory Board Member |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | European Consortium for Research on Patient-derived xenografts, EurOPDX (www.europdx.eu). July 2019 - Present |
Year(s) Of Engagement Activity | 2017,2019,2020 |
Description | Invited Member of MRC Population Health Sciences Group (PHSG) |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Policymakers/politicians |
Results and Impact | Oversee population health sciences investment across MRC Boards and panels. Advise MRC Strategy Board, boards and panels on development and implementation of strategies and policies. Advise on strategic funding initiatives and partnership activities. Carry out gap analyses and horizon scanning. |
Year(s) Of Engagement Activity | 2020 |
Description | Invited Speaker: HDR UK Phenotype Portal |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | HDR UK Conference |
Year(s) Of Engagement Activity | 2020 |
Description | Invited to present to researchers at Queen's University Belfast |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | Promote the outputs of the PICTURES programme and HDR UK infrastructure to support research using routinely collected data. Talk was "Data on a Mission". |
Year(s) Of Engagement Activity | 2020 |