ISCF HDRUK DIH Sprint Exemplar: Cloud-based integration of phenotype and genotype data for rare disease research
Lead Research Organisation:
Addenbrooke's Hospital NHS Trust
Abstract
One in 17 people have a rare disease. Rare diseases can be extremely difficult to diagnose, but they often have an unidentified genetic cause. Recent advances in clinical imaging, pathology, and genomic technologies have led to remarkable progress in understanding disease - particularly rare diseases. However, the power of these technologies cannot be fully realised until the immense volume of data generated can be integrated with NHS data, then analysed by researchers in a secure environment that protects the privacy of individuals.
Working across the NHS, academia and industry, we will use existing tools to transfer data from NHS Trusts to a secure environment that interfaces with the NHS network and shares data with Public Health England. NHS information will then be combined with research data in a cloud-based platform. Initially, we will involve patients with rare diseases recruited to the NIHR BioResource, a national resource of volunteers who have already consented to information retrieved from their health records being used for medical research. This will create a rich research resource with the potential to transform our understanding of rare genetic disorders, drive improvements in diagnosis and management, and provide proof of principle for use in other diseases.
Technical Summary
In-depth phenotyping, including pathology and imaging technologies, and innovations in genomics, including whole genome sequencing, have led to remarkable progress in understanding rare diseases. These are commonly genetic and collectively affect around 7% of the population. Maximising the benefits of these advanced technologies requires integration of multi-dimensional data in a secure environment for analysis at scale. Using tools developed by Public Health England for national disease registration, together with the FHIR (Fast Healthcare Interoperability Resources) standard, we will create a clinical, phenotypic and genomic dataset integrated in Microsoft Azure. We will provide proof of principle that can be extended to other datasets by using data from patients with rare diseases recruited to the NIHR BioResource. These patients have consented to be contacted about academic and industry-led research studies according to their genotype and phenotype, and for their data to be used in medical research. Two factors limit the power of this resource: NHS data are not currently transferred to the research database as a matter of routine, and phenotypic and genomic data are held in separate databases. This proposal seeks to address these limitations by integrating data in a secure environment where they can be both analysed anonymously and re-identified to allow contact and recall of individuals.
Organisations
- Addenbrooke's Hospital NHS Trust, United Kingdom (Lead Research Organisation)
- National Institute for Health Research, United Kingdom (Collaboration)
- University of Cambridge (Collaboration)
- University College Hospital Ibadan (Collaboration)
- Oxford University Hospitals NHS Foundation Trust (Collaboration)
- Imperial College Healthcare NHS Trust (Collaboration)
Publications

100,000 Genomes Project Pilot Investigators
(2021)
100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report.
in The New England journal of medicine

Turro E
(2020)
Whole-genome sequencing of patients with rare diseases in a national health system.
in Nature
Description | Biorepository for Translational Medicine |
Amount | $800,000 (USD) |
Organisation | Chan Zuckerberg Initiative |
Sector | Private |
Country | United States |
Start | 02/2019 |
End | 01/2022 |
Description | Cloud-based integration of genotype and phenotype data for rare diseases research |
Amount | £400,000 (GBP) |
Funding ID | MC_PC_18030 |
Organisation | Health Data Research UK |
Sector | Private |
Country | United Kingdom |
Start | 02/2019 |
End | 11/2019 |
Description | HEALTH DATA RESEARCH HUB |
Amount | £4,795,568 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2019 |
End | 09/2022 |
Title | Cloud based integration of research data |
Description | Development of technical architecture for integration of health and genomic data that can be accessed in an Azure cloud. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Use cases are in progress. |
Title | IBD BioResource - Gut Reaction |
Description | The NIHR IBD BioResource comprises ~30k participants with Inflammatory Bowel Disease (IBD), including the members of the HDR UK IBD Hub. For the Gut Reaction programme, 10 NHS Trusts have been asked to provide detailed data on the participants in their Trust. Categories of data requested include: test results; prescribing; imaging; digital pathology; data from disease-specific databases and registries; and discharge summaries. While the formats and contents will vary, the hope is that this will be a much richer source of data than nationally collated datasets, such as those held by NHS Digital. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | The dataset allows researchers to check feasibility of proposed research, and inform applications for research studies using the NIHR IBD BioResource. |
URL | https://web.www.healthdatagateway.org/dataset/ae1898da-7fe5-4fe0-b10d-9f06cdec1735 |
Description | NIHR BioResource |
Organisation | Imperial College Healthcare NHS Trust |
Department | NIHR Comprehensive Biomedical Research Centre |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Cambridge BRC is lead for the collaboration involving 7 BRC/Us. |
Collaborator Contribution | Each BRC/U is establishing a local BioResource, and collectively these will provide a geographically spread cohort of volunteers recallable for research by genotype or phenotype. |
Impact | Support for investigator and industry led research studies. |
Start Year | 2012 |
Description | NIHR BioResource |
Organisation | National Institute for Health Research |
Department | Biomedical Research Centre for Mental Health and Dementia Unit (BRC/U) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Cambridge BRC is lead for the collaboration involving 7 BRC/Us. |
Collaborator Contribution | Each BRC/U is establishing a local BioResource, and collectively these will provide a geographically spread cohort of volunteers recallable for research by genotype or phenotype. |
Impact | Support for investigator and industry led research studies. |
Start Year | 2012 |
Description | NIHR BioResource |
Organisation | National Institute for Health Research |
Department | NIHR Biomedical Research Unit, University Hospitals of Leicester NHS Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | Cambridge BRC is lead for the collaboration involving 7 BRC/Us. |
Collaborator Contribution | Each BRC/U is establishing a local BioResource, and collectively these will provide a geographically spread cohort of volunteers recallable for research by genotype or phenotype. |
Impact | Support for investigator and industry led research studies. |
Start Year | 2012 |
Description | NIHR BioResource |
Organisation | National Institute for Health Research |
Department | NIHR Comprehensive Biomedical Research Centre, Guy's and St Thomas |
Country | United Kingdom |
Sector | Public |
PI Contribution | Cambridge BRC is lead for the collaboration involving 7 BRC/Us. |
Collaborator Contribution | Each BRC/U is establishing a local BioResource, and collectively these will provide a geographically spread cohort of volunteers recallable for research by genotype or phenotype. |
Impact | Support for investigator and industry led research studies. |
Start Year | 2012 |
Description | NIHR BioResource |
Organisation | Oxford University Hospitals NHS Foundation Trust |
Department | NIHR Oxford Biomedical Research Centre |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Cambridge BRC is lead for the collaboration involving 7 BRC/Us. |
Collaborator Contribution | Each BRC/U is establishing a local BioResource, and collectively these will provide a geographically spread cohort of volunteers recallable for research by genotype or phenotype. |
Impact | Support for investigator and industry led research studies. |
Start Year | 2012 |
Description | NIHR BioResource |
Organisation | University College Hospital |
Department | NIHR Comprehensive Biomedical Research Centre |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Cambridge BRC is lead for the collaboration involving 7 BRC/Us. |
Collaborator Contribution | Each BRC/U is establishing a local BioResource, and collectively these will provide a geographically spread cohort of volunteers recallable for research by genotype or phenotype. |
Impact | Support for investigator and industry led research studies. |
Start Year | 2012 |
Description | NIHR Cambridge Biomedical Research Centre |
Organisation | National Institute for Health Research |
Country | United Kingdom |
Sector | Public |
PI Contribution | Director |
Collaborator Contribution | Partnership with University of Cambridge involves access to research infrastructure for biomedical research
Impact | Research outputs and impacts across 14 themes - see annual reports
Start Year | 2007 |
Description | NIHR Cambridge Biomedical Research Centre |
Organisation | University of Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Director |
Collaborator Contribution | Partnership with University of Cambridge involves access to research infrastructure for biomedical research
Impact | Research outputs and impacts across 14 themes - see annual reports
Start Year | 2007 |