ISCF HDRUK DIH Sprint Exemplar: Cloud-based integration of phenotype and genotype data for rare disease research
Lead Research Organisation: Cambridge University Hospitals NHS Foundation Trust
Department Name: UNLISTED
Abstract
One in 17 people have a rare disease. Rare diseases can be extremely difficult to diagnose, and many have a genetic cause that has not yet been identified. Recent advances in clinical imaging, pathology and genomic technologies have led to remarkable progress in understanding disease, particularly rare diseases. However, the power of these technologies cannot be fully realised until the immense volume of data they generate can be integrated with NHS data and then analysed by researchers in a secure environment that protects the privacy of individuals.
Working across the NHS, academia and industry, we will use existing tools to transfer data from NHS Trusts to a secure environment that interfaces with the NHS network and shares data with Public Health England. NHS information will then be combined with research data in a cloud-based platform. Initially, we will involve patients with rare diseases recruited to the NIHR BioResource, a national resource of volunteers who have already consented to information from their health records being used for medical research. This will create a rich research resource with the potential to transform our understanding of rare genetic disorders, drive improvements in diagnosis and management, and provide proof of principle for use in other diseases.
Technical Summary
In-depth phenotyping, including pathology and imaging technologies, and innovations in genomics, including whole-genome sequencing, have led to remarkable progress in understanding rare diseases. These diseases are commonly genetic and collectively affect around 7% of the population. Maximising the benefits of these advanced technologies requires multi-dimensional data to be integrated in a secure environment where they can be analysed at scale. Using tools developed by Public Health England for national disease registration, together with the HL7 FHIR standard, we will create an integrated clinical, phenotypic and genomic dataset in Microsoft Azure. Using data from patients with rare diseases recruited to the NIHR BioResource, we will provide a proof of principle that can be extended to other datasets. These patients have consented to be contacted about academic- and industry-led research studies according to their genotype and phenotype, and for their data to be used in medical research. Two factors limit the power of this resource. First, NHS data are not currently transferred routinely to the research database; second, phenotypic and genomic data are held in separate databases. This proposal seeks to address these limitations by integrating the data in a secure environment where they can be analysed anonymously, while remaining de-identified in a way that still allows individuals to be contacted and recalled.
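To make the idea of "analysed anonymously but still recallable" concrete, the sketch below links phenotype and genotype records held in separate sources through a keyed pseudonym. This is a minimal illustration assuming HMAC-SHA256 pseudonymisation as the linkage technique; the proposal does not say how linkage is implemented, and the key, the field names (`participant_id`, `hpo_term`, `variant`) and the functions `pseudonymise`, `link_records` and `build_recall_index` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would be held only by a trusted
# data custodian inside the secure environment, never by analysts.
LINKAGE_KEY = b"example-secret-key"


def pseudonymise(identifier: str, key: bytes = LINKAGE_KEY) -> str:
    """Derive a stable pseudonym from a direct identifier (HMAC-SHA256)."""
    normalised = identifier.strip().upper()  # so "p001 " and "P001" match
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def _entry(linked: dict, pid: str) -> dict:
    # One linked record per pseudonym; no direct identifier is stored.
    return linked.setdefault(pid, {"pseudonym": pid, "phenotypes": [], "variants": []})


def link_records(phenotypes: list, genotypes: list) -> list:
    """Full outer join of two record sets on the pseudonym."""
    linked: dict = {}
    for rec in phenotypes:
        _entry(linked, pseudonymise(rec["participant_id"]))["phenotypes"].append(rec["hpo_term"])
    for rec in genotypes:
        _entry(linked, pseudonymise(rec["participant_id"]))["variants"].append(rec["variant"])
    return sorted(linked.values(), key=lambda e: e["pseudonym"])


def build_recall_index(identifiers: list, key: bytes = LINKAGE_KEY) -> dict:
    """Custodian-only lookup from pseudonym back to identifier, for contact and recall."""
    return {pseudonymise(i, key): i for i in identifiers}


# Hypothetical example records from two separate databases.
phenotypes = [
    {"participant_id": "P001", "hpo_term": "HP:0001250"},
    {"participant_id": "P002", "hpo_term": "HP:0001263"},
]
genotypes = [
    {"participant_id": "p001 ", "variant": "chr1:1000:A>G"},
]

if __name__ == "__main__":
    for record in link_records(phenotypes, genotypes):
        print(record["pseudonym"][:12], record["phenotypes"], record["variants"])
```

A keyed hash is used rather than a plain hash because identifiers drawn from a small space (such as 10-digit numbers) can be recovered from an unkeyed hash by brute force; only whoever holds the key can rebuild the recall index.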
Publications
Wei W (2019). Germline selection shapes human mitochondrial DNA diversity. Science.
Turro E (2020). Whole-genome sequencing of patients with rare diseases in a national health system. Nature.
Thaventhiran JED (2020). Whole-genome sequencing of a sporadic primary immunodeficiency cohort. Nature.
Thaventhiran JED (2020). Publisher Correction: Whole-genome sequencing of a sporadic primary immunodeficiency cohort. Nature.
Lorenzini T (2020). Characterization of the clinical and immunologic phenotype and management of 157 individuals with 56 distinct heterozygous NFKB1 mutations. The Journal of Allergy and Clinical Immunology.
100,000 Genomes Project Pilot Investigators (2021). 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. The New England Journal of Medicine.
| Description | Amount | Organisation | Sector | Country | Start | End |
| --- | --- | --- | --- | --- | --- | --- |
| Biorepository for Translational Medicine | $800,000 (USD) | Chan Zuckerberg Initiative | Private | United States | 02/2019 | 01/2022 |
| Health Data Research Hub | £4,795,568 (GBP) | Medical Research Council (MRC) | Public | United Kingdom | 09/2019 | 09/2022 |
Title | Cloud-based integration of research data |
Description | Development of a technical architecture for integrating health and genomic data that can be accessed in the Microsoft Azure cloud. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Use cases are in progress. |
Title | IBD BioResource - Gut Reaction |
Description | The NIHR IBD BioResource comprises ~30,000 participants with inflammatory bowel disease (IBD), including the members of the HDR UK IBD Hub. For the Gut Reaction programme, 10 NHS Trusts have been asked to provide detailed data on the participants at their Trust. Categories of data requested include: test results; prescribing; imaging; digital pathology; data from disease-specific databases and registries; and discharge summaries. While formats and contents will vary, the hope is that this will be a much richer source of data than nationally collated datasets, such as those held by NHS Digital. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | The dataset allows researchers to check the feasibility of proposed research and to inform applications for research studies using the NIHR IBD BioResource. |
URL | https://web.www.healthdatagateway.org/dataset/ae1898da-7fe5-4fe0-b10d-9f06cdec1735 |
Description | NIHR Cambridge Biomedical Research Centre |
Organisation | National Institute for Health Research |
Country | United Kingdom |
Sector | Public |
PI Contribution | Director |
Collaborator Contribution | Partnership with the University of Cambridge provides access to research infrastructure for biomedical research. |
Impact | Research outputs and impacts across 14 themes; see the annual reports. |
Start Year | 2007 |
Description | NIHR Cambridge Biomedical Research Centre |
Organisation | University of Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Director |
Collaborator Contribution | Partnership with the University of Cambridge provides access to research infrastructure for biomedical research. |
Impact | Research outputs and impacts across 14 themes; see the annual reports. |
Start Year | 2007 |