ISCF HDRUK DIH Sprint Exemplar: Graph-Based Data Federation for Healthcare Data Science

Lead Research Organisation: University of Edinburgh


Many in-depth healthcare questions can only be answered by looking across data for the whole UK. However, we manage data locally and describe it in different ways to suit local communities. To get a global view from local data we need a map that tells us precisely where to look for data and how to interpret it when we find it. With such a map we can link data between localities in a way that makes access more predictable and rapid, while the people managing each data set retain control of how the data in their charge is shared. We can also treat the map itself as data. Sharing it gives insight into potential uses of data linkage and encourages as wide a variety of innovators as possible to build tools that work across the data landscape, enriching the data, revealing new knowledge and extending the map.

Technical Summary

The Digital Innovation Hub Programme must establish national data coverage across the UK for a wide variety of data sets (primary, secondary and social care, etc.) and across many dimensions (genotypic, phenotypic, etc.). These data sets are curated locally, in many data formats and at many sites, so the programme requires a single framework and a common approach to interoperability. Our aim is to provide a convincing demonstration that these data sets can be linked flexibly through graph data, and that this linkage can support practical, adaptive data maintenance and the inference of knowledge beyond that available to HDR-UK by other means. We will do this by deploying well-understood techniques from ontology definition (based on graph data languages) to provide a formal, extensible "map" of the data assets, one that tells us precisely which queries could practically be made within and across data sets. Our framework will be based on generic and open data standards. This will allow HDR-UK to offer industry the opportunity to apply methods consistent with the framework to acquire, manage and analyse linked data while preserving governance oversight and privacy.
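As a minimal sketch of the "map" idea, and not the project's actual framework, a catalogue of data assets can be held as a graph in which an edge joins two data sets that share an identifier field. A path through that graph then indicates which cross-data-set queries are practical. The data set names, identifier fields and the functions `build_link_graph` and `linkage_path` below are all hypothetical, chosen only for illustration, and the sketch uses plain Python rather than a graph data language.

```python
from collections import deque
from itertools import combinations

# Hypothetical catalogue (not from the project): data set -> identifier fields.
CATALOGUE = {
    "gp_records": {"patient_id", "practice_code"},
    "hospital_episodes": {"patient_id", "episode_id"},
    "genomic_samples": {"sample_id", "episode_id"},
    "social_care": {"care_ref"},
}


def build_link_graph(catalogue):
    """Return an adjacency map: data set -> {neighbour: shared identifier fields}."""
    graph = {name: {} for name in catalogue}
    for a, b in combinations(catalogue, 2):
        shared = catalogue[a] & catalogue[b]
        if shared:
            # An edge exists only where two data sets share an identifier.
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


def linkage_path(graph, start, goal):
    """Breadth-first search for the shortest chain of linkable data sets, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_link_graph(CATALOGUE)
print(linkage_path(graph, "gp_records", "genomic_samples"))
print(linkage_path(graph, "gp_records", "social_care"))
```

In this toy catalogue the first query finds a route through an intermediate data set, while the second finds none because the social care data set shares no identifier with the others. A real deployment would presumably express the catalogue in an ontology language and attach governance rules to each link, which this sketch omits.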


Publications

Ibrahim Z (2020) On classifying sepsis heterogeneity in the ICU: insight using machine learning. in Journal of the American Medical Informatics Association

Kuang X (2020) MRI-SegFlow: a novel unsupervised deep learning pipeline enabling accurate vertebral segmentation of MRI images. in Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Wu H (2020) Knowledge Driven Phenotyping. in Studies in health technology and informatics

Wu H (2020) Ensemble learning for poor prognosis predictions: a case study on SARS-CoV2. in Journal of the American Medical Informatics Association