UK Infrastructure for Large-scale Clinical Genomics Research
Lead Research Organisation:
Queen Mary University of London
Abstract
Background
The UK 100,000 Genomes Project will accelerate the application of whole genome sequencing (WGS) into routine care for the National Health Service. The genome is the genetic material of an organism (either in DNA or, for many types of viruses, in RNA). WGS provides the most comprehensive inventory of an individual's genetic variation. Incorporating it into routine care will transform the health services people receive, changing the processes of diagnosis and management. The UK 100,000 Genomes Project seeks to drive this change by sequencing 100,000 genomes of individuals affected by rare diseases and cancer (and their families) and infectious disease pathogens.
Vision
The UK Infrastructure for Large-scale Clinical Genomics Research will provide the infrastructure which, using the information from the 100,000 Genomes Project, will develop the UK as an international centre of excellence for the analysis of very large and complex biomedical datasets. As a national resource for the development of new knowledge, it will provide transformative advances in the speed and range of research into the causes and consequences, prevention and treatment of disease. This proposal presents a unique opportunity for UK clinical research that will enable the discovery of new diagnostics, test complex approaches to stratified medicine, and drive therapeutic innovation.
Rare diseases
There are between 6,000 and 8,000 rare diseases and, while each one affects only a small number of people, together they affect the lives of 3 million people in England. Only 50% of rare diseases have an existing molecular (genetic) diagnosis. Through the scale of the 100,000 Genomes Project and a focus on unmet need, this infrastructure will create significant opportunities for scientific innovation, helping to interpret genetic findings whose clinical significance is currently unknown or uncertain.
Cancer
Cancer is, fundamentally, a genetic disorder in which mutations lead to uncontrolled cell growth. Sequencing technologies have already enabled precise definitions of disease, uncovered insights into how cancer develops, and helped identify therapeutic targets from which to develop treatments. The importance of around 200 key genes across cancer types is known, but focusing on these alone in clinical care will not be enough to significantly benefit the majority of individuals with cancer. Because of its scale, the 100,000 Genomes Project offers the best opportunity to drive forward our understanding.
Pathogens and Infectious Disease
WGS for pathogens, both viruses and bacteria, is being adopted for routine management of infectious diseases, providing information on transmission and antibiotic resistance and creating tremendous opportunities for clinical research.
Output
This proposal is to fund the core hardware and software components of a data and computing infrastructure for genomics and clinical genomics research; this includes adapting existing software developed by UK partners who are international leaders in this field. The proposed infrastructure will be innovative in terms of content, technology, and scientific collaboration. It will contain clinical, health, and WGS data on large numbers of patients and pathogens in a range of key therapeutic areas. The data will be collected prospectively and will be extended with regular updates from clinical care. Available to clinicians, patients, industry and academia, it will encourage and enable engagement and collaboration in research, provide a platform for trials recruitment, and increase the depth and quality of the data obtained.
Technical Summary
This proposal to the MRC will establish a shared, secure, high performance data and compute infrastructure as a platform for large-scale clinical genomics research based on the data flows of the UK 100,000 Genomes project.
Samples and data from patients with cancer and rare, inherited disorders will be provided by NHS England, working in collaboration with Cancer Research UK and programmes funded by the NIHR and the MRC. Genomics England, a company wholly owned by the Department of Health, will pay for the generation of whole genome sequence data.
Genomics England will also pay for the generation of summary reports, based upon clinical annotations of this data, and will return these to the NHS to support patient care. Genomics England will make anonymised, redacted versions of the data available for industrial research strictly within a secure, managed environment.
The proposed infrastructure will provide a similar environment for academic research, with a more comprehensive collection of genomic and patient data, including the read-level data used for the generation of variant calls and summary reports. The infrastructure will include software tools to support the production of 'research-ready' data sets, the effective management of patient and genomic data, and the delivery of collaborative clinical research.
The project partners have experience in infrastructure development and clinical genomics research, and will be able to re-use designs, procedures, and software developed and tested within existing programmes and organisations, including UK Biobank and the European Bioinformatics Institute.
A formal mechanism will be established for engagement with public, charitable, and philanthropic funders, and with the clinical research projects that they fund. Subject to capacity constraints, projects that add appropriate value to the Genomics England programme will be provided with access to the compute infrastructure at no charge.
Planned Impact
The 100,000 Genomes Project is the most ambitious and most advanced programme of its kind. Its aim is to accelerate the application of whole genome sequencing into routine care for NHS patients with rare diseases, cancer, and infectious diseases, transforming the processes of diagnosis and management. Similar programmes are under development in America, in the Middle East and in South Asia. MRC support for the creation of a research infrastructure, alongside the existing £50m MRC investment in the Farr Institute of Health Informatics Research, will add considerably to the quality and value of the data collected, maximising the translational research potential of the 100,000 Genomes Project.
The programme presents a unique opportunity for UK clinical research that will enable the discovery of new diagnostics, test complex approaches to stratified medicine, and drive therapeutic innovation. The economic and commercial impact is likely to be large, both in terms of the potential utility to pharma and the impetus to small and medium biotech companies.
The 100,000 Genomes Project is already providing the focal point for the creation of a community for engagement, dialogue and debate with government, regulators, policy makers and the public. The benefits realised from the research infrastructure will be a key part of this dialogue, helping to develop public trust, understanding and confidence through ongoing active engagement.
Through the Genomics England Clinical Interpretation Partnership (GECIP), the aim is to stimulate specific dedicated programmes funded by GECIP partners (the funders) through response mode or specific calls. These will be led by GECIP researchers who, it is expected, will form and lead appropriate national and international consortia to maximise the value of the dataset by maximising the understanding of this highly complex data. The impacts arising from the creation of GECIP, which will be built on the proposed research infrastructure, may include, but are not limited to:
- Enhanced clinical interpretation focused on rare inherited disease, including clinically- or genomically-driven deeper phenotyping, novel approaches to interpretation and annotation, validation and functional characterisation of variants, identification of novel therapeutic targets, or repurposing of existing therapies.
- Innovative clinical interpretation in cancer, including multi-omic datasets (e.g. transcriptomics, epigenetics, proteomics), analysis of circulating tumour DNA, sequential biopsy to address the genetic architecture of cancer, validation and characterisation of variants, identification of novel therapeutic targets, or repurposing of existing therapies.
- Improved clinical interpretation in infectious disease, focussing upon individuals with severe outcomes in sepsis, or - in partnership with Public Health England - greater understanding of the spread of antimicrobial resistance and phylogenetic tracking of transmission across the whole of the health economy.
- Expanding the programme to include other disease areas, to address specific research questions and opportunities to develop stratified approaches. The Genomics England infrastructure will be designed to facilitate expansion and re-use, and GECIP partnership can be extended to programmes with funding outwith the 100,000 Genomes Programme.
- Health records research, such as that exemplified by the rapidly-developing capacity of the Farr Institute, can build upon and add value to the combination of clinical, laboratory, and health records data, linked to variant call data, held securely within the proposed data and compute infrastructure.
- Algorithms, models, and tools for clinical genomics research, data quality assurance, and the annotation, interpretation, and presentation of genomic, clinical, and laboratory data in combination, may be developed, evaluated, used, and shared within the proposed infrastructure.
Publications
100,000 Genomes Project Pilot Investigators
(2021)
100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report.
in The New England journal of medicine
Akinkuolie AO
(2019)
Group IIA Secretory Phospholipase A2, Vascular Inflammation, and Incident Cardiovascular Disease.
in Arteriosclerosis, thrombosis, and vascular biology
Al-Jawahiri R
(2022)
SOX11 variants cause a neurodevelopmental disorder with infrequent ocular malformations and hypogonadotropic hypogonadism and with distinct DNA methylation profile.
in Genetics in medicine : official journal of the American College of Medical Genetics
Arno G
(2017)
Biallelic Mutation of ARHGEF18, Involved in the Determination of Epithelial Apicobasal Polarity, Causes Adult-Onset Retinal Degeneration.
in American journal of human genetics
Bacq A
(2022)
Cardiac Investigations in Sudden Unexpected Death in DEPDC5-Related Epilepsy.
in Annals of neurology
Balachandar S
(2021)
Identification and validation of a novel pathogenic variant in GDF2 ( BMP9 ) responsible for hereditary hemorrhagic telangiectasia and pulmonary arteriovenous malformations
in American Journal of Medical Genetics Part A
Bick D
(2021)
An online compendium of treatable genetic disorders.
in American journal of medical genetics. Part C, Seminars in medical genetics
Cabrera CP
(2019)
Over 1000 genetic loci influencing blood pressure with multiple systems and tissues implicated.
in Human molecular genetics
Cheloor Kovilakam S
(2023)
Prevalence and significance of DDX41 gene variants in the general population.
in Blood
Chen Z
(2020)
Neuronal intranuclear inclusion disease is genetically heterogeneous
in Annals of Clinical and Translational Neurology
Choi DJ
(2023)
The genomic landscape of familial glioma.
in Science advances
Cipriani V
(2023)
Rare disease gene association discovery from burden analysis of the 100,000 Genomes Project data.
in medRxiv : the preprint server for health sciences
Clark DW
(2019)
Associations of autozygosity with a broad range of human phenotypes.
in Nature communications
Claus LR
(2023)
Certain heterozygous variants in the kinase domain of the serine/threonine kinase NEK8 can cause an autosomal dominant form of polycystic kidney disease.
in Kidney international
Collier DJ
(2024)
Personalized Antihypertensive Treatment Optimization With Smartphone-Enabled Remote Precision Dosing of Amlodipine During the COVID-19 Pandemic (PERSONAL-CovidBP Trial).
in Journal of the American Heart Association
COVID-19 Host Genetics Initiative
(2023)
A second update on mapping the human genetic architecture of COVID-19.
in Nature
COVID-19 Host Genetics Initiative
(2022)
A first update on mapping the human genetic architecture of COVID-19.
in Nature
Czesnikiewicz-Guzik M
(2019)
Causal association between periodontitis and hypertension: evidence from Mendelian randomization and a randomized controlled trial of non-surgical periodontal therapy.
in European heart journal
Dominik N
(2023)
Normal and pathogenic variation of RFC1 repeat expansions: implications for clinical diagnosis.
in Brain : a journal of neurology
Gallo JE
(2020)
Hypertension and the roles of the 9p21.3 risk locus: Classic findings and new association data.
in International Journal of Cardiology. Hypertension
Graham SE
(2023)
Author Correction: The power of genetic diversity in genome-wide association studies of lipids.
in Nature
Harshfield E
(2020)
The role of haematological traits in risk of ischaemic stroke and its subtypes
Harshfield EL
(2020)
The role of haematological traits in risk of ischaemic stroke and its subtypes.
in Brain : a journal of neurology
Description | Chief Scientist Genomics England |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | http://www.genomicsengland.co.uk |
Description | Genomics England Newborn screening funded to £100m |
Geographic Reach | National |
Policy Influence Type | Contribution to new or Improved professional practice |
Impact | No impact yet as service only just agreed to be funded. |
URL | https://www.genomicsengland.co.uk/initiatives/newborns |
Description | UK Clinical Genomics Infrastructure: Co-lead for the preparation for commissioning in the NHS of a National Genomic Health service |
Geographic Reach | National |
Policy Influence Type | Membership of a guideline committee |
Description | UK Clinical Genomics Infrastructure: Member of the Topol Review of Digital, Genomics and Artificial Intelligence implications for workforce planning. |
Geographic Reach | National |
Policy Influence Type | Membership of a guideline committee |
Description | COVID-19 (with CCO) |
Amount | £5,000,000 (GBP) |
Organisation | LifeArc |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2020 |
End | 03/2021 |
Description | COVID-19 Matched WGS |
Amount | £9,890,000 (GBP) |
Organisation | Illumina |
Sector | Private |
Country | United States |
Start | 03/2020 |
End | 03/2021 |
Description | COVID-19 WGS |
Amount | £3,000,000 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2020 |
End | 03/2021 |
Description | Illumina matched-funds re UK Life Sciences Cancer WGS (Genomics England) |
Amount | £2,250,000 (GBP) |
Organisation | Illumina |
Sector | Private |
Country | United States |
Start | 03/2020 |
End | 03/2022 |
Description | Inward Capital co-investment and 100 science jobs at Illumina |
Amount | £22,000,000 (GBP) |
Organisation | Illumina Inc. |
Sector | Private |
Country | United States |
Start | 03/2020 |
End | 03/2025 |
Description | Long-Read Cancer Sequencing |
Amount | £162,000 (GBP) |
Organisation | Oxford Nanopore Technologies |
Sector | Private |
Country | United Kingdom |
Start | 03/2020 |
End | 03/2022 |
Description | REACT-GE (COVID controls) |
Amount | £1,500,000 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2020 |
End | 03/2021 |
Description | UK Clinical Genomics Research Data Infrastructure |
Amount | £2,700,000 (GBP) |
Organisation | Department of Health (DH) |
Sector | Public |
Country | United Kingdom |
Start | 03/2018 |
End | 03/2021 |
Description | UK Life Sciences Cancer WGS (Genomics England) |
Amount | £7,870,000 (GBP) |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 03/2020 |
End | 03/2022 |
Title | The UK Clinical Genomics Infrastructure: Clinical Data |
Description | Improvement to the wider UK Clinical Genomics Infrastructure |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | The infrastructure now holds 1.6 billion data points on 94,000 participants and 91,000 whole genomes and, more recently, cancer registry and mortality data (2,141 participants with cause of death). |
URL | https://www.genomicsengland.co.uk/ |
Description | Genome Wide Association Study of Lacunar Stroke |
Organisation | University of Cambridge |
Department | Department of Physiology, Development and Neuroscience |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Statistical analysis |
Collaborator Contribution | Data collection, oversight |
Impact | Manuscript in press at Lancet Neurology |
Start Year | 2019 |
Description | Immune Mechanisms in Small Vessel Disease |
Organisation | Ludwig Maximilian University of Munich (LMU Munich) |
Country | Germany |
Sector | Academic/University |
PI Contribution | Study design, Primary analysis, study oversight |
Collaborator Contribution | Statistical analysis and interpretation |
Impact | Manuscript under review to BRAIN |
Start Year | 2020 |
Description | REACT Long-Covid |
Organisation | Imperial College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Co-PI for this study funded by NIHR. |
Collaborator Contribution | Imperial College lead the study. |
Impact | No impact yet. |
Start Year | 2021 |
Description | REACT-GE Covid Study |
Organisation | Imperial College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Member of the Research Delivery Steering Committee |
Collaborator Contribution | Member of the Research Delivery Steering Committee |
Impact | As per study website: https://www.imperial.ac.uk/medicine/research-and-impact/groups/react-study/ |
Start Year | 2020 |
Description | Genomics England - 100k Genome Project (Multiple National & International Talks 2015-2018) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Multiple talks about the 100k Genomes project as GEL Chief Scientist |
Year(s) Of Engagement Activity | 2015,2016,2017,2018 |
URL | http://www.genomicsengland.co.uk |
Description | Range of Genomics related talks |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | The future of genomics in the delivery of healthcare |
Year(s) Of Engagement Activity | 2021,2022 |
Description | The Genomics Conversation |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Activities:
- Events took place all over the country, and a big thank you goes to the teams in Lincoln, York and Nottingham for their awareness-raising activities.
- We launched our new nursing video, 'Nursing in the Genomic Era'.
- We hosted our fifth WeNurses chat, on 'Nursing and Ethics in the Genomic Era'.
- We ran a competition using the Genomics Game to conclude the week.
- Four #GenomicsConversation podcasts were launched on SoundCloud to introduce nurses and midwives to genomics.
- We held our first ever #GenomicsConversation Thunderclap to launch the week's activities.
- We organised a social media pledge campaign, with enthusiasts spreading the message far and wide on social media.
Engagement:
- During the course of the week the website received over 12,000 page views.
- Our first ever Thunderclap was a great success, delivering a huge social reach with influential supporters from nursing including WeNurses, AgencyNurse and 6CsLive!.
- Our four #GenomicsConversation podcasts were streamed over 80 times during the week.
- We received over 1,000 views of our videos. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.genomicseducation.hee.nhs.uk/woa-18/ |