ISCF HDRUK DIH Sprint Exemplar: Graph-Based Data Federation for Healthcare Data Science

Lead Research Organisation: University of Edinburgh

Abstract

We know that answers to many in-depth healthcare questions can only be explored if we can look across data for the whole UK. However, we manage data locally and describe it in different ways to suit local communities. To get a global view from local data we need a map that tells us precisely where to look for data and how to interpret it when we find it. With such a map we can link data between localities in a way that makes access more predictable and rapid, while still allowing the people who manage different data sets to retain control of how the data in their charge is shared. We can also treat the map itself as data that can be shared, both to give insights into potential uses of data linkage and to encourage as wide a variety of innovators as possible to build tools that work across the data landscape, thereby enriching the data, revealing new knowledge and extending the map.

Technical Summary

The Digital Innovation Hub Programme must establish national data coverage across the UK for a wide variety of data sets (primary, secondary, social care, etc.) across many dimensions (genotypic, phenotypic, etc.), where the data sets are curated locally, in many data formats, at many sites. This requires a single framework and a common approach to interoperability. Our aim is to provide a convincing demonstration that these data sets can be linked flexibly through graph data, and that this linkage can support practical, adaptive data maintenance and the inference of knowledge beyond that available to HDR-UK by other means. We will do this by deploying well-understood techniques from ontology definition (based on graph data languages) to provide a formal, extensible "map" of the data assets, telling us precisely which queries could practically be made within and across data sets. Our framework will be based on generic and open data standards, enabling HDR-UK to open opportunities for industry to apply methods consistent with the framework to acquire, manage and analyse linked data while preserving governance oversight and privacy.
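
The idea of a "map" that tells us which queries can practically be made can be illustrated with a minimal sketch. It assumes that each local data set annotates its own field names with shared ontology concepts, and that data sets can only be linked if they expose a common patient-identifier concept. All data set names, field names and "concept:" labels are hypothetical, and a plain Python adjacency list stands in for the graph data languages (such as RDF/OWL) that a real deployment would use.

```python
from collections import defaultdict

# Hypothetical local descriptions: each data set keeps its own field names,
# annotated with shared ontology concepts (all names are illustrative only).
DATASET_FIELDS = {
    "site_a_primary_care": {
        "pat_id": "concept:PatientIdentifier",
        "hba1c_mmol": "concept:HbA1c",
    },
    "site_b_hospital": {
        "chi_number": "concept:PatientIdentifier",
        "admit_dx": "concept:AdmissionDiagnosis",
        "hba1c": "concept:HbA1c",
    },
    # No linkage identifier: its data can be found but not linked to the others.
    "site_c_registry": {
        "hba1c_result": "concept:HbA1c",
    },
}


def build_concept_graph(dataset_fields):
    """Adjacency list: concept -> [(data set, local field name), ...]."""
    graph = defaultdict(list)
    for dataset, fields in dataset_fields.items():
        for field, concept in fields.items():
            graph[concept].append((dataset, field))
    return graph


def plan_query(graph, concepts, link_concept):
    """For each requested concept, list where it can be read from,
    keeping only data sets that also expose the linkage concept."""
    linkable = {dataset for dataset, _ in graph.get(link_concept, [])}
    return {
        concept: [(ds, field) for ds, field in graph.get(concept, []) if ds in linkable]
        for concept in concepts
    }


if __name__ == "__main__":
    graph = build_concept_graph(DATASET_FIELDS)
    plan = plan_query(
        graph,
        ["concept:HbA1c", "concept:AdmissionDiagnosis", "concept:Genotype"],
        link_concept="concept:PatientIdentifier",
    )
    for concept, sources in plan.items():
        print(concept, sources or "not available in any linkable data set")
```

An empty list flags a query that cannot practically be made with the current map. Extending the map, by adding data sets or annotations, changes the query plan while the local data stays where it is.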

Publications

Gao C (2022). A National Network of Safe Havens: Scottish Perspective. Journal of Medical Internet Research.

Ibrahim ZM (2020). On classifying sepsis heterogeneity in the ICU: insight using machine learning. Journal of the American Medical Informatics Association: JAMIA.

Kuang X (2020). MRI-SegFlow: a novel unsupervised deep learning pipeline enabling accurate vertebral segmentation of MRI images. Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

Mirza L (2021). Investigating the association between physical health comorbidities and disability in individuals with severe mental illness. European Psychiatry: The Journal of the Association of European Psychiatrists.

Rannikmäe K (2021). Developing automated methods for disease subtyping in UK Biobank: an exemplar study on stroke. BMC Medical Informatics and Decision Making.

Wu H (2021). Ensemble learning for poor prognosis predictions: A case study on SARS-CoV-2. Journal of the American Medical Informatics Association: JAMIA.

Wu H (2020). Knowledge Driven Phenotyping. Studies in Health Technology and Informatics.

 
Description Artificial Intelligence and Multimorbidity: Clustering in Individuals, Space and Clinical Context (AIM-CISC)
Amount £3,919,510 (GBP)
Funding ID NIHR202639 
Organisation National Institute for Health Research 
Sector Public
Country United Kingdom
Start 07/2021 
End 08/2024
 
Description Improving the quality and value of care for people with poor prognosis cancers - a national, mixed methods study across Scotland
Amount £399,224 (GBP)
Organisation Health Foundation 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2020 
End 08/2023
 
Description Iris.AI - The AI Chemist
Amount £39,000 (GBP)
Organisation Research Council of Norway 
Sector Public
Country Norway
Start 07/2021 
End 01/2022
 
Description The Advanced Care Research Centre Programme
Amount £20,000,000 (GBP)
Organisation Legal and General Group 
Sector Private
Country United Kingdom
Start 03/2020 
End 04/2026
 
Description Towards an AI-driven Health Informatics Platform for supporting clinical decision making in Scotland - a pilot study in NHS Lothian
Amount £29,200 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2020 
End 02/2021
 
Description UCL-NMU-SEU International Collaboration On Artificial Intelligence In Medicine: Tackling Challenges Of Low Generalisability And Health Inequality
Amount £29,400 (GBP)
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2022 
End 02/2024
 
Title Ensemble Learning for COVID-19 Risk Prediction 
Description
- Implemented 7 prognostic risk prediction models for COVID-19; details are in DOI:10.1093/jamia/ocaa295.
- Introduced a competence quantification framework for assessing the competence (confidence) of a model in predicting a given data entry, i.e. a digital representation of a COVID-19 patient.
- Ensembled the 7 prediction models using fusion strategies based on their competences.
- Evaluated the single models and the ensemble model on two large COVID-19 cohorts: Wuhan, China (N=2,384) and King's College Hospital (N=1,475).
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact
- The ensemble model performed best on all aspects evaluated (PPV, sensitivity, calibration and discrimination).
- Findings from this study informed SAGE during the COVID-19 pandemic.
URL https://github.com/Honghan/EnsemblePrediction
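
The competence-based fusion step in this record can be illustrated with a minimal sketch. It is not the method implemented in the cited paper or repository: it assumes that each model outputs a risk probability for a patient together with a per-patient competence score in [0, 1], already produced by some competence quantification step, and it shows two simple fusion strategies. The function names are hypothetical.

```python
import math
from typing import Sequence


def _check(probs: Sequence[float], competences: Sequence[float]) -> None:
    # Each model prediction must come with exactly one competence score.
    if not probs or len(probs) != len(competences):
        raise ValueError("need one competence score per model prediction")


def fuse_weighted(probs: Sequence[float], competences: Sequence[float]) -> float:
    """Competence-weighted average of the models' risk probabilities."""
    _check(probs, competences)
    total = math.fsum(competences)
    if total <= 0:
        # No model claims any competence for this patient: fall back to a plain mean.
        return math.fsum(probs) / len(probs)
    return math.fsum(p * c for p, c in zip(probs, competences)) / total


def fuse_most_competent(probs: Sequence[float], competences: Sequence[float]) -> float:
    """Use only the prediction of the model most competent for this patient."""
    _check(probs, competences)
    best = max(range(len(probs)), key=lambda i: competences[i])
    return probs[best]


if __name__ == "__main__":
    # Hypothetical outputs of three models for one patient.
    probs = [0.2, 0.8, 0.5]
    competences = [0.1, 0.6, 0.3]
    print(fuse_weighted(probs, competences))
    print(fuse_most_competent(probs, competences))
```

Weighted averaging keeps every model's vote while down-weighting models that are unlikely to be reliable for a given patient; selecting the single most competent model is the simplest alternative. The fusion strategies actually evaluated in the study are described in the linked paper and repository.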
 
Description Input into HDR UK Data Standards Paper 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Input into HDR UK Data Standards Paper
Year(s) Of Engagement Activity 2020
 
Description Input into HDR UK Trusted Research Environments Green Paper 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Input into HDR UK Trusted Research Environments Green Paper.
Year(s) Of Engagement Activity 2020
URL https://ukhealthdata.org/wp-content/uploads/2020/07/200723-Alliance-Board_Paper-E_TRE-Green-Paper.pd...
 
Description Invited External Advisory Board Member 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact European Consortium for Research on Patient-derived xenografts, EurOPDX (www.europdx.eu). July 2019 - Present
Year(s) Of Engagement Activity 2017, 2019, 2020
 
Description Invited Member of MRC Population Health Sciences Group (PHSG) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Oversee population health sciences investment across MRC Boards and panels. Advise MRC Strategy Board, boards and panels on development and implementation of strategies and policies. Advise on strategic funding initiatives and partnership activities. Carry out gap analyses and horizon scanning.
Year(s) Of Engagement Activity 2020
 
Description Invited Speaker: HDR UK Phenotype Portal 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact HDR UK Conference
Year(s) Of Engagement Activity 2020
 
Description Invited to present to researchers at Queens University Belfast 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact Promote the outputs of the PICTURES programme and HDR UK infrastructure to support research using routinely collected data. Talk was "Data on a Mission".
Year(s) Of Engagement Activity 2020