ISCF HDRUK DIH Sprint Exemplar: Cloud-based integration of phenotype and genotype data for rare disease research

Lead Research Organisation: Cambridge University Hospitals NHS Foundation Trust
Department Name: UNLISTED

Abstract

One in 17 people have a rare disease. Rare diseases can be extremely difficult to diagnose, but they often have an unidentified genetic cause. Recent advances in clinical imaging, pathology, and genomic technologies have led to remarkable progress in understanding disease - particularly rare diseases. However, the power of these technologies cannot be fully realised until the immense volume of data generated can be integrated with NHS data, then analysed by researchers in a secure environment that protects the privacy of individuals.
Working across the NHS, academia and industry, we will use existing tools to transfer data from NHS Trusts to a secure environment that interfaces with the NHS network and shares data with Public Health England. NHS information will then be combined with research data in a cloud-based platform. Initially, we will involve patients with rare diseases recruited to the NIHR BioResource, a national resource of volunteers who have already consented to information retrieved from their health records being used for medical research. This will create a rich research resource with the potential to transform our understanding of rare genetic disorders, drive improvements in diagnosis and management, and provide proof of principle for use in other diseases.

Technical Summary

In-depth phenotyping, including pathology and imaging technologies, and innovations in genomics, including whole genome sequencing, have led to remarkable progress in understanding rare diseases. These are commonly genetic and collectively affect around 7% of the population. Maximising the benefits of these advanced technologies requires integration of multi-dimensional data in a secure environment for analysis at scale. Using tools developed by Public Health England for national disease registration, together with FHIR, we will create a clinical, phenotypic and genomic dataset integrated in Microsoft Azure. Using data from patients with rare diseases recruited to the NIHR BioResource, we will provide proof of principle that can be extended to other datasets. These patients have consented to be contacted about academic- and industry-led research studies according to their genotype and phenotype, and for their data to be used in medical research. Two factors currently limit the power of this resource: NHS data are not routinely transferred to the research database, and phenotypic and genomic data are held in separate databases. This proposal seeks to address these limitations by integrating data in a secure environment where they can be both analysed anonymously and re-identified to allow contact and recall of individuals.

 
Description Biorepository for Translational Medicine
Amount $800,000 (USD)
Organisation Chan Zuckerberg Initiative 
Sector Private
Country United States
Start 02/2019 
End 01/2022
 
Description Cloud-based integration of genotype and phenotype data for rare diseases research
Amount £400,000 (GBP)
Funding ID MC_PC_18030 
Organisation Health Data Research UK 
Sector Private
Country United Kingdom
Start 02/2019 
End 11/2019
 
Description HEALTH DATA RESEARCH HUB
Amount £4,795,568 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 10/2019 
End 09/2022
 
Title Cloud-based integration of research data
Description Development of technical architecture for integration of health and genomic data that can be accessed in an Azure cloud. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Use cases are in progress. 
 
Title IBD BioResource - Gut Reaction 
Description The NIHR IBD BioResource comprises ~30k participants with Inflammatory Bowel Disease (IBD), including the members of the HDR UK IBD Hub. For the Gut Reaction programme, 10 NHS Trusts have been asked to provide detailed data on the participants in their Trust. Categories of data requested include: test results; prescribing; imaging; digital pathology; data from disease-specific databases and registries; and discharge summaries. While the formats and contents will vary, the hope is that this will be a much richer source of data than nationally collated datasets, such as those held by NHS Digital.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The dataset allows researchers to check feasibility of proposed research, and inform applications for research studies using the NIHR IBD BioResource. 
URL https://web.www.healthdatagateway.org/dataset/ae1898da-7fe5-4fe0-b10d-9f06cdec1735