The UK has incredible information resources, ranging from NHS data collected as part of patient care to custom data collected for research projects such as the genetic information on human disease in the UK Biobank. Protecting patient information has always been one of the highest priorities for these resources, making it difficult for researchers to combine the information held in different places. Unfortunately, this means we are missing opportunities to answer questions about human health, disease and care.

This work will bring together experts from across the UK to develop new ways to use the information held in different places, while still keeping individual patient identities private. The outputs will directly support HDR UK’s ambitious projects, while setting up foundations that will enable the wider research community to make discoveries that improve people’s lives.

Technical Summary

This work is funded by the UKRI Medical Research Council, UKRI Engineering and Physical Sciences Research Council, UKRI Economic and Social Research Council, Department of Health and Social Care, National Institute for Health Research (England), Chief Scientist Office (Scottish Government), Health and Care Research Wales, Public Health Agency HSC (Northern Ireland), British Heart Foundation and Cancer Research UK.

HDR UK's vision is to enable FAIR access to population scale data at depth and breadth, enabling linkage of data from many custodians, and federated analyses across Trusted Research Environments (TREs) for many health data researchers across the UK and globally.

This pillar will bring together a UK-wide team of leading technologists, and data scientists from across academia, TRE providers, industry and the NHS all committed to the assembly of an ecosystem of services. Embedding a collaborative, federated delivery model will enable greater patient and public benefit than any single organisation can achieve in isolation, whilst still maintaining autonomy of all involved.

HDR UK’s independence, convening power and deep technical skills allow the Institute to play a distinctive role in the technology ecosystem. HDR UK will enable streamlined data access through the Gateway and facilitate the assembly of tools, technologies, standards and approaches for federated data analysis across multiple TREs.

The pillar will build on the technical foundations established by HDR UK in the last five years, together with services provided by national and international partners to:
1. Enable streamlined access to data through the Gateway, using an approach that meets the needs of research data users, data custodians and TRE providers.
2. Deliver a portfolio of interoperable and integrated services across TREs, to enable new federated discovery and analytics capabilities, prioritised according to user needs.
3. Support the development of an open, collaborative, trustworthy and secure approach that is adaptable to the changing landscape.

Impact and legacy
The Gateway and other core services will provide rapid data discovery, and faster and wider data access for users to seamlessly discover and access a vast range of UK health and related datasets. The Gateway, the TRE ecosystem and federated analytics will together provide an open development community, increasing the number of FAIR datasets and tools, enhancing the overall ecosystem. Pillar 1 will also enhance the technical capacity and capability across the HDR UK network, driving an open standards-based development community in the UK and globally.

The federation of analysis across TREs provides impact by increasing connectivity across TREs, delivering new data linkage models, services, the development of analysis methods, as well as growing analytical capacity.
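The federated pattern described above can be illustrated with a minimal sketch. It assumes each TRE reduces its individual-level data to an aggregate (a count and a sum) and that only those aggregates are sent to the analysis client. The names `SiteSummary`, `summarise_locally` and `pooled_mean` are hypothetical and do not correspond to any HDR UK service, and the values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate returned by one TRE; no row-level data leaves the site."""
    count: int
    total: float


def summarise_locally(values: list[float]) -> SiteSummary:
    """Runs inside a TRE: reduce individual-level values to an aggregate."""
    return SiteSummary(count=len(values), total=sum(values))


def pooled_mean(summaries: list[SiteSummary]) -> float:
    """Runs at the analysis client: combine the per-site aggregates."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / n


# Made-up measurements standing in for data held at two separate TREs.
site_a = summarise_locally([120.0, 130.0, 125.0])
site_b = summarise_locally([110.0, 140.0])
print(pooled_mean([site_a, site_b]))  # prints 125.0
```

The client obtains the same mean it would get from pooled data, while each site only ever discloses two numbers.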

The Gateway already hosts 720 datasets, of which 432 include high-quality technical metadata. The work of the Technology Services Ecosystem pillar will extend this legacy by adding new metadata (individual-level data remains with TREs) and providing new programmatic models of access, and will inform the future approach to delivery of services and cross-TRE integration and analytics.

For patients, the work undertaken in this pillar will enable analyses to be more efficient, scalable and applied to larger volumes of data. In turn, this will enable more detailed, faster analyses to answer the research questions which will make a difference to people’s lives.
Title Five Safes RO-Crate 
Description Five Safes RO-Crates enable the exchange of query requests and results between analysis clients and TREs while ensuring that the access is safe and the process transparent. Included within its specification are eight steps that ensure that the RO-Crate's metadata for safe data, safe people, safe projects, safe settings and safe outputs are reviewed according to Five Safes principles. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact It is to be used in the HDR UK QQ2 Federated Analytics workstream and the EOSC-ENTRUST EU Horizon Europe project to create a European network of trusted research environments for sensitive data and to drive European interoperability by joint development of a common blueprint for federated data access and analysis. 
Title 850 dataset descriptions (metadata) discoverable via the Health Data Research Gateway 
Description As of Feb 2024, 850 descriptions of health datasets from across the UK and internationally are available to discover via the Gateway platform. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact As of Feb 2024, 850 descriptions of health datasets from across the UK and internationally are available to discover via the Gateway platform. (current version of Gateway) 
Description ELIXIR Workflow Execution Service 
Organisation Barcelona Supercomputing Center
Country Spain 
Sector Public 
PI Contribution The ELIXIR WfExS, developed by the Barcelona Supercomputing Center, is used by the TRE-FX project to execute workflows in TREs.
Collaborator Contribution They contributed the WfExS and made revisions
Impact The EU project EOSC-ENTRUST, starting in 2024. The WfExS partners in Barcelona and the TRE-FX project partners will work together on federated analytics development using workflows and the Five Safes RO-Crate.
Start Year 2023
Description HDR UK joins ELIXIR-UK 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution By joining ELIXIR-UK, HDR UK brings its expertise in health data science to the network. Since being founded in 2018, the Institute has transformed the use of large-scale health data for research, enabled by the application of cutting-edge data science approaches - including the use of Trusted Research Environments - to address some of the most pressing public health challenges in society. Our strategy over the next five years is to increase the scale, quality, speed and impact of health data research. Being a member of ELIXIR-UK both complements and enhances HDR UK's continued efforts to accelerate the trustworthy use of health and related data for research.
Collaborator Contribution ELIXIR-UK offers numerous benefits and opportunities to support, learn from and add value to this agenda through its extensive scientific programmes and community building activities.
Impact Emily Jefferson, CTO of HDR UK, has joined the ELIXIR-UK steering committee as the HDR UK representative.
Start Year 2023
Title Cohort Discovery Service 
Description Cohort Discovery is a new Gateway feature that will allow users to carry out a more specific search and assessment of datasets listed in the Gateway, improving dataset discovery. Using the tool, users can search across multiple datasets to find cohorts (groups) of patients with specific, defined characteristics (e.g. patients aged 18-30 who don't smoke and who live in England). We hope the tool will save researchers time in finding the datasets they need for their research, and also save data custodians time by minimising enquiries to them about the content of the datasets they hold. Data custodians apply statistical disclosure control policies to the query results by default, so low numbers of patients will be excluded to minimise any potential risk of identification.
Type Of Technology Webtool/Application 
Year Produced 2021 
Impact Nine data custodians have been onboarded into the Cohort Discovery Service, which covers a population of 51K subjects. Total subjects discoverable via Cohort Discovery: 41.8K from 8 datasets, across Conditions (184), Observations (24), Measurements (264), Drugs/Medication (4) and Demographics (18).
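A cohort count with small-number suppression, as described for Cohort Discovery above, might look like the following sketch. It is not the service's implementation. The threshold of 5, the record fields and the function name `cohort_count` are all assumptions made for illustration, and real custodians set their own disclosure control policies.

```python
from typing import Callable, Optional

# Hypothetical threshold: counts below this value are withheld.
SUPPRESSION_THRESHOLD = 5


def cohort_count(
    records: list[dict],
    matches: Callable[[dict], bool],
    threshold: int = SUPPRESSION_THRESHOLD,
) -> Optional[int]:
    """Count records matching the criteria; return None if the count is too low."""
    n = sum(1 for record in records if matches(record))
    return n if n >= threshold else None


# Synthetic records, invented for this example only.
people = (
    [{"age": 25, "smoker": False, "country": "England"}] * 6
    + [{"age": 40, "smoker": True, "country": "Wales"}] * 2
)

young_non_smokers_england = cohort_count(
    people,
    lambda p: not p["smoker"] and 18 <= p["age"] <= 30 and p["country"] == "England",
)
wales = cohort_count(people, lambda p: p["country"] == "Wales")

print(young_non_smokers_england)  # prints 6
print(wales)                      # prints None (suppressed)
```

Returning `None` rather than the true small count means a caller cannot distinguish between, say, one and four matching patients.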
Title Courses on the Innovation Gateway 
Description Courses and qualifications related to health data research 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Allows users to search for relevant courses and qualifications 
Title Data Use Register & Data Use Widget 
Description A data use register (also known as a data release register or list of approved projects) offers the public a clear record of how their data is being used, by whom and, most importantly, for what purpose. The Gateway has implemented a Data Use Register to improve the transparency and visibility of research projects undertaken across our Alliance data custodians. A widget for the data use register is also available and aims to provide further transparency in the use of health data for research by making data uses more accessible. Once embedded in a custodian's website, a clickable button takes visitors from a custodian's webpage to the Gateway data use register - prefiltered to display their data uses only - in a single step.
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Improved transparency of research; improved discoverability of research; better coordination and standardisation of data use reporting.
Title Gateway Data Access Request Service 
Description Our ambition is for the Gateway to support a streamlined, proportionate approach to access requests based on the five safes model for research and innovation uses with a clear public benefit, in line with the Principles for Participation. We aim to make life easier for both requestors and decision makers through a combination of automation, built-in validation, transparency of progress and the capability to host virtual data access request panels. The intention is to build on existing cross-sector best practice both nationally across the UK and internationally.
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact For data custodians with existing 'in house' web-based workflow solutions, the module provides validated inputs to their approvals processes, and they receive time-stamped progress updates for key process points. For data custodians with 'MS Word' based documentation and offline workflow, the module helps them harmonise and streamline their approach and provide a scalable, web-based workflow. For new data custodians, the module provides the opportunity to move straight to a 'best of breed' web-based access management request solution. The Gateway allows researchers to communicate with data custodians directly and confirm data sharing requirements before requesting access, which saves time.
Title Gateway Federated Metadata Onboarding Service 
Description Federated metadata onboarding allows data providers to synchronise existing metadata catalogues with the Gateway 
Type Of Technology Webtool/Application 
Year Produced 2023 
Open Source License? Yes  
Impact Synchronising metadata catalogues means researchers can view the most up to date descriptions of health datasets and data providers only have to maintain a single source of truth 
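The synchronisation idea can be sketched as a comparison between the custodian's catalogue (the single source of truth) and the copy held by the Gateway. This is an illustration only: the function `plan_sync`, the use of a version string per dataset and the dataset identifiers are assumptions, not the Gateway's actual interface.

```python
def plan_sync(source: dict[str, str], gateway: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {dataset_id: version} maps and list the changes needed.

    `source` is the custodian's own catalogue; `gateway` is the current copy.
    """
    shared = source.keys() & gateway.keys()
    return {
        "create": sorted(source.keys() - gateway.keys()),
        "update": sorted(k for k in shared if source[k] != gateway[k]),
        "delete": sorted(gateway.keys() - source.keys()),
    }


# Invented identifiers and versions, for illustration only.
custodian_catalogue = {"ds-001": "2.0", "ds-002": "1.0", "ds-004": "1.0"}
gateway_copy = {"ds-001": "1.0", "ds-002": "1.0", "ds-003": "1.0"}

plan = plan_sync(custodian_catalogue, gateway_copy)
print(plan)
# prints {'create': ['ds-004'], 'update': ['ds-001'], 'delete': ['ds-003']}
```

Because the plan is derived entirely from the custodian's catalogue, the custodian never has to edit the Gateway copy by hand.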
Title Gateway Metadata Onboarding Service 
Description The metadata onboarding form enables data custodians to make their datasets findable through the Gateway by providing rich metadata descriptions of their datasets. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Provides an opportunity for datasets to be pulled into the Gateway from external metadata catalogues 
Title Gateway Search Service 
Description The Health Data Research Innovation Gateway is a common entry point for researchers and innovators to search for and find datasets of interest and to request access to them. It is possible to search for health datasets and health data assets via a keyword search and filtering options.
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Gateway search is based on the HDR UK dataset specification, which is an Alliance-approved standard for describing health datasets, reviewed and recreated with researcher discovery in mind. It includes summary information about the dataset and descriptions of the technical detail of the tables and columns held within the dataset. The standard descriptions allow potential data applicants to understand and assess the usability of data without enquiring directly, saving time for everyone. 
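To show how standardised descriptions make keyword search and filtering possible, here is a minimal sketch. The record fields (`title`, `summary`, `columns`, `publisher`) and the `search` function are hypothetical and are not taken from the HDR UK dataset specification, and the catalogue entries are invented.

```python
def search(datasets: list[dict], keyword: str, **filters: str) -> list[str]:
    """Return titles of datasets whose metadata mentions the keyword
    and whose fields equal every supplied filter value."""
    needle = keyword.casefold()
    hits = []
    for ds in datasets:
        # Search summary-level text and column-level technical metadata together.
        haystack = " ".join([ds["title"], ds["summary"], *ds["columns"]]).casefold()
        if needle in haystack and all(ds.get(f) == v for f, v in filters.items()):
            hits.append(ds["title"])
    return hits


# Invented metadata records, for illustration only.
catalogue = [
    {
        "title": "Asthma admissions",
        "summary": "Hospital admissions for asthma",
        "columns": ["admission_date", "age_band"],
        "publisher": "Custodian A",
    },
    {
        "title": "Primary care prescribing",
        "summary": "Prescriptions issued in primary care",
        "columns": ["drug_code", "age_band"],
        "publisher": "Custodian B",
    },
]

print(search(catalogue, "ASTHMA"))
# prints ['Asthma admissions']
print(search(catalogue, "age_band", publisher="Custodian B"))
# prints ['Primary care prescribing']
```

Including column names in the searchable text lets a researcher check whether a dataset holds a variable of interest without contacting the custodian.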
Title HDR Cohort Discovery / TRE-FX / RQuest integration 
Description It allows compatibility between the work of TRE-FX, the HDR Programme (Cohort Discovery) and BC Platforms software 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Now being assessed for use in the NHS England SDE Programme 
Title HDRUK CARROT_Mapper v2 
Description Software which enables federated cohort discovery across cohorts of datasets. Developed with the CO-CONNECT project and the HDR UK Alleviate Data Hub.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Title HUTCH Software v2 
Description Software developed with the DARE UK TRE-FX project that can be deployed within Trusted Research Environments to enable federated analysis 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact DARE UK TRE FX report: 
Title Health Data Research Gateway 
Description Health datasets in the UK are held by thousands of different organisations. It can therefore be difficult for researchers, innovators, patients and members of the public to discover what datasets exist. The Gateway was established in 2020 as a common entry point for researchers and innovators to discover and request access to UK health-related datasets. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact The Gateway lists information about each dataset (such as description, size of the population contained within that dataset, and the legal basis for access) that can help researchers and innovators decide whether a dataset could be useful to their research and help them to make further health discoveries. The Gateway was created with input from patients, the public, researchers and innovators working in health and care in the UK. Health Data Research UK is committed to its continued partnership with these groups as the Gateway develops. There are now over 3000 resources, such as tools, publications and datasets, available for researchers to discover, with over 2000 registered users. 
Title Health Data Research Innovation Gateway 
Description The Health Data Research Innovation Gateway API service source code 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Integrations with external services from industry - MetadataWorks, BC Platforms, FutureNHS 
Title Health Data Research Innovation Gateway 
Description The Innovation Gateway provides a common entry point to discover and enquire about access to UK health datasets for research and innovation. It provides detailed information about the datasets, which are held by members of the UK Health Data Research Alliance, such as a description, size of the population, and the legal basis for access. The Gateway includes the ability to search for research projects, publications and health data tools, such as those related to COVID-19. New interactive features provide a community forum for researchers to collaborate and connect and the ability to add research projects. The Innovation Gateway does not hold or store any datasets or patient or health data but rather acts as a portal to allow discovery of datasets and to request access to them for health research. To access the data, users need to sign in and then follow the access request process. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Both data custodians and the research community are increasingly using the Innovation Gateway to drive use of health data in research. The Gateway provides a common front door for researchers to search, discover and understand data before they request access to it. At the other end, the Gateway is enabling a streamlined and harmonised access process for data custodians listing their data in the portal. By addressing one of the main issues in data-driven research, access to data, the Gateway is enabling and accelerating discoveries based on safe and trustworthy use of health datasets. 
Title Health Data Research UK GitHub Repository 
Description Open-source code for the Innovation Gateway 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Open-source code for the Innovation Gateway 
Title Innovation Gateway Standards 
Description Innovation Gateway Standards 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Adoption of the standards by the community - SAIL, NHS Digital, Research Data Scotland, Office for National Statistics 
Title Phenotype Library 
Description The HDR UK Phenotype Library is a comprehensive, open access resource providing the research community with information, tools and phenotyping algorithms for UK electronic health records. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact When patients interact with physicians, or are admitted into hospital, information is collected electronically on their symptoms, diagnoses, laboratory test results, and prescriptions. This information is stored securely in Electronic Health Records (EHR) and is a valuable resource for researchers and clinicians for improving health and healthcare. EHRs are however of variable detail and quality and contain many inconsistencies. As a result, researchers and data providers spend considerable time creating complex computer programs to fix and statistically analyse the information in EHR and identify which patients have which disease. Currently, there is no means to share these tools across institutions in the UK resulting in duplication of effort. Reproducibility of research is also hampered as others do not have access to the precise methods and definitions used in a particular study. This project addresses these issues by creating an open resource for EHR users (researchers, clinicians, the NHS and data providers) to share their methods. 
Title Publications on the Innovation Gateway 
Description Preprints, papers and articles which cite the use of health datasets for research (uploaded by Gateway users) 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Ability for users to search for publications alongside health datasets 
Title Tools on the Innovation Gateway 
Description Software, scripts and useful resources (uploaded by Gateway users) 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Allows users to search for relevant tools, scripts and software to support them in their research. 
Title Workflow for RQuest integration 
Description It is the workflow to process an HDR Cohort Discovery tool query 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Forms part of the work with NHS SDE Programme and HDR Programme 
