ISCF HDRUK DIH Sprint Exemplar: Graph-Based Data Federation for Healthcare Data Science

Lead Research Organisation: University of Edinburgh


We know that many in-depth healthcare questions can only be answered if we can look across data for the whole UK. However, we manage data locally and describe it in different ways to suit local communities. To get a global view from local data, we need a map that tells us precisely where to look for data and how to interpret it when we find it. With such a map, we can link data between localities in a way that makes access more predictable and rapid, while allowing the people who manage each data set to retain control over how the data in their charge is shared. We can also treat the map itself as data that can be shared, both to give insight into potential uses of data linkage and to encourage as wide a variety of innovators as possible to build tools that work across the data landscape: enriching the data, revealing new knowledge and extending the map.

Technical Summary

The Digital Innovation Hub Programme must establish national data coverage across the UK for a wide variety of data sets (primary, secondary, social care, etc.) spanning many dimensions (genotypic, phenotypic, etc.), even though these data sets are curated locally, in many data formats, at many sites. This requires a single framework and a common approach to interoperability. Our aim is to provide a convincing demonstration that these data sets can be linked flexibly through graph data, and that this linkage can support practical, adaptive data maintenance and the inference of knowledge beyond that available to HDR-UK by other means. We will do this by deploying well-understood techniques from ontology definition (based on graph data languages) to provide a formal, extensible “map” of the data assets, telling us precisely which queries could practically be made within and across data sets. Our framework will be based on generic and open data standards, so that HDR-UK can offer industry the opportunity to use methods consistent with the framework to acquire, manage and analyse linked data while preserving governance oversight and privacy.
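To make the idea of a “map” that tells us which queries can be made across data sets more concrete, here is a minimal sketch in Python. It is not the project's framework or ontology: the data set names, the identifier fields, and the helper functions `build_map` and `linkage_path` are all hypothetical, chosen only for illustration. Each data set is a node in a graph, two data sets are joined by an edge when they share at least one identifier field, and a cross-data-set query is treated as feasible when a path connects the two data sets.

```python
from collections import deque

# Hypothetical catalogue (illustrative only): each locally curated data set
# is described by the identifier fields it exposes for linkage.
CATALOGUE = {
    "gp_records_site_a": {"nhs_number", "postcode"},
    "hospital_episodes_site_b": {"nhs_number", "episode_id"},
    "genomics_site_c": {"episode_id", "sample_id"},
    "social_care_site_d": {"council_ref"},
}


def build_map(catalogue):
    """Build the graph: data set -> {neighbouring data set: shared fields}."""
    graph = {name: {} for name in catalogue}
    names = sorted(catalogue)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = catalogue[a] & catalogue[b]
            if shared:  # an edge exists only if the two sets share a field
                graph[a][b] = shared
                graph[b][a] = shared
    return graph


def linkage_path(graph, source, target):
    """Breadth-first search for the shortest chain of data sets that links
    source to target. Returns a list of names, or None if no chain exists."""
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


DATA_MAP = build_map(CATALOGUE)
```

Under these assumptions, `linkage_path(DATA_MAP, "gp_records_site_a", "genomics_site_c")` finds a route through the hospital data set, while the social care data set, which shares no field with the others, is reported as unlinkable. A real deployment would presumably express the same information in a graph data language with governance rules attached to each edge, which this sketch does not attempt to model.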


Ibrahim ZM (2020) On classifying sepsis heterogeneity in the ICU: insight using machine learning. Journal of the American Medical Informatics Association (JAMIA).