UK Infrastructure for Large-scale Clinical Genomics Research

Lead Research Organisation: Queen Mary University of London
Department Name: William Harvey Research Institute

Abstract

Background
The UK 100,000 Genomes Project will accelerate the application of whole genome sequencing (WGS) into routine care for the National Health Service. The genome is the genetic material of an organism (either in DNA or, for many types of viruses, in RNA). WGS provides the most comprehensive inventory of an individual's genetic variation. Incorporating WGS into routine care will transform the health services people receive, changing the processes of diagnosis and management. The UK 100,000 Genomes Project seeks to drive this change by sequencing 100,000 genomes of individuals affected by rare diseases and cancer (and their families) and infectious disease pathogens.

Vision
The UK Infrastructure for Large-scale Clinical Genomics Research will provide the infrastructure which, using the information from the 100,000 Genomes Project, will develop the UK as an international centre of excellence for the analysis of very large and complex biomedical datasets. As a national resource for the development of new knowledge it will provide transformative advances in the speed and range of research into the causes and consequences, prevention and treatment of disease. This proposal presents a unique opportunity for UK clinical research that will enable the discovery of new diagnostics, test complex approaches to stratified medicine, and drive therapeutic innovation.

Rare diseases
There are between 6,000 and 8,000 rare diseases and, while each one affects only a small number of people, overall they affect the lives of 3 million people in England. Only 50% of rare diseases have an existing molecular (genetic) diagnosis. Through the scale of the 100,000 Genomes Project and by having a focus on unmet need, this infrastructure will create significant opportunities for scientific innovation, assisting in the interpretation of genetic findings whose clinical significance is currently unknown or uncertain.

Cancer
Cancer is, fundamentally, a genetic disorder where mutations lead to uncontrolled cell growth. Sequencing technologies have already enabled precise definitions of disease, uncovered insights into how cancer develops and helped identify therapeutic targets from which to develop treatments. The importance of around 200 key genes across cancer types is known, but focusing on these alone in clinical care will not be enough to have a significant impact on the majority of individuals with cancer. The scale of the 100,000 Genomes Project offers the best opportunity to drive forward our understanding.

Pathogens and Infectious Disease
WGS for pathogens, both viruses and bacteria, is being adopted for routine management of infectious diseases, providing information on transmission and antibiotic resistance and creating tremendous opportunities for clinical research.

Output
This proposal is to fund the core hardware and software components of a data and computing infrastructure for genomics and clinical genomics research; this includes adapting existing software developed by UK partners who are international leaders in this field. The proposed infrastructure will be innovative in terms of content, technology, and scientific collaboration. It will contain clinical, health, and WGS data on large numbers of patients and pathogens in a range of key therapeutic areas. The data will be collected prospectively and will be extended with regular updates from clinical care. Available to clinicians, patients, industry and academia, it will: encourage and enable engagement and collaboration in research, provide a platform for trials recruitment, and increase the depth and quality of the data obtained.

Technical Summary

This proposal to the MRC will establish a shared, secure, high-performance data and compute infrastructure as a platform for large-scale clinical genomics research based on the data flows of the UK 100,000 Genomes Project.

Samples and data from patients with cancer and rare, inherited disorders will be provided by NHS England, working in collaboration with Cancer Research UK and programmes funded by the NIHR and the MRC. Genomics England, a company wholly owned by the Department of Health, will pay for the generation of whole genome sequence data.

Genomics England will also pay for the generation of summary reports, based upon clinical annotations of this data, and will return these to the NHS to support patient care. Genomics England will make anonymised, redacted versions of the data available for industrial research strictly within a secure, managed environment.

The proposed infrastructure will provide a similar environment for academic research, with a more comprehensive collection of genomic and patient data, including the read-level data used for the generation of variant calls and summary reports. The infrastructure will include software tools to support the production of 'research-ready' data sets, the effective management of patient and genomic data, and the delivery of collaborative clinical research.

The project partners have experience in infrastructure development and clinical genomics research, and will be able to re-use designs, procedures, and software developed and tested within existing programmes and organisations, including UK Biobank and the European Bioinformatics Institute.

A formal mechanism will be established for engagement with public, charitable, and philanthropic funders, and with the clinical research projects that they fund. Subject to capacity constraints, projects that add appropriate value to the Genomics England programme will be provided with access to the compute infrastructure at no charge.

Planned Impact

The 100,000 Genomes Project is the most ambitious and most advanced of its kind. Its aim is to accelerate the application of whole genome sequencing into routine care for NHS patients with rare diseases, cancer, and infectious diseases, transforming the processes of diagnosis and management. Similar programmes are under development in America, in the Middle East and in South Asia. MRC support for the creation of a research infrastructure, alongside the existing £50m MRC investment in the Farr Institute of Health Informatics Research, will add considerably to the quality and value of the data collected, maximising the translational research potential of the 100,000 Genomes Project.

The programme presents a unique opportunity for UK clinical research that will enable the discovery of new diagnostics, test complex approaches to stratified medicine, and drive therapeutic innovation. The economic and commercial impact is likely to be large, both in terms of the potential utility to pharma and the impetus to small and medium biotech companies.

The 100,000 Genomes Project is already providing the focal point for the creation of a community for engagement, dialogue and debate with government, regulators, policy makers and the public. The benefits realised from the research infrastructure will be a key part of this dialogue, helping to develop public trust, understanding and confidence through ongoing active engagement.

Through the Genomics England Clinical Interpretation Partnership (GECIP), the aim is to stimulate specific dedicated programmes funded by GECIP partners (the funders) through response mode or specific calls. These will be led by GECIP researchers, who are expected to form and lead appropriate national and international consortia to maximise the value of the dataset by maximising the understanding of this highly complex data. The impact arising from the creation of GECIP, which will be built on the proposed research infrastructure, may include, but is not limited to:

- Enhanced clinical interpretation focused on rare inherited disease, including clinically- or genomically-driven deeper phenotyping, novel approaches to interpretation and annotation, validation and functional characterisation of variants, identification of novel therapeutic targets, or repurposing of existing therapies.
- Innovative clinical interpretation in cancer, including multi-omic datasets (e.g. transcriptomics, epigenetics, proteomics), analysis of circulating tumour DNA, sequential biopsy to address the genetic architecture of cancer, validation and characterisation of variants, identification of novel therapeutic targets, or repurposing of existing therapies.
- Improved clinical interpretation in infectious disease, focussing upon individuals with severe outcomes in sepsis, or - in partnership with Public Health England - greater understanding of the spread of antimicrobial resistance and phylogenetic tracking of transmission across the whole of the health economy.
- Expanding the programme to include other disease areas, to address specific research questions and opportunities to develop stratified approaches. The Genomics England infrastructure will be designed to facilitate expansion and re-use, and GECIP partnership can be extended to programmes with funding outwith the 100,000 Genomes Programme.
- Health records research, such as that exemplified by the rapidly-developing capacity of the Farr Institute, can build upon and add value to the combination of clinical, laboratory, and health records data, linked to variant call data, held securely within the proposed data and compute infrastructure.
- Algorithms, models, and tools for clinical genomics research, data quality assurance, and the annotation, interpretation, and presentation of genomic, clinical, and laboratory data in combination, may be developed, evaluated, used, and shared within the proposed infrastructure.

Publications

100,000 Genomes Project Pilot Investigators (2021) 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. in The New England journal of medicine

Akinkuolie AO (2019) Group IIA Secretory Phospholipase A2, Vascular Inflammation, and Incident Cardiovascular Disease. in Arteriosclerosis, thrombosis, and vascular biology

Bick D (2021) An online compendium of treatable genetic disorders. in American journal of medical genetics. Part C, Seminars in medical genetics

 
Description Chief Scientist Genomics England
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
URL http://www.genomicsengland.co.uk
 
Description Genomics England Newborn screening funded to £100m
Geographic Reach National 
Policy Influence Type Contribution to new or Improved professional practice
Impact No impact yet as service only just agreed to be funded.
URL https://www.genomicsengland.co.uk/initiatives/newborns
 
Description UK Clinical Genomics Infrastructure: Co-lead for the preparation for commissioning in the NHS of a National Genomic Health service
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description UK Clinical Genomics Infrastructure: Member of the Topol Review of Digital, Genomics and Artificial Intelligence implications for workforce planning.
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description COVID-19 (with CCO)
Amount £5,000,000 (GBP)
Organisation LifeArc 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2020 
End 03/2021
 
Description COVID-19 Matched WGS
Amount £9,890,000 (GBP)
Organisation Illumina 
Sector Private
Country United States
Start 04/2020 
End 03/2021
 
Description COVID-19 WGS
Amount £3,000,000 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 04/2020 
End 03/2021
 
Description Illumina matched-funds re UK Life Sciences Cancer WGS (Genomics England)
Amount £2,250,000 (GBP)
Organisation Illumina 
Sector Private
Country United States
Start 04/2020 
End 03/2022
 
Description Inward Capital co-investment and 100 science jobs at Illumina
Amount £22,000,000 (GBP)
Organisation Illumina Inc. 
Sector Private
Country United States
Start 04/2020 
End 03/2025
 
Description Long-Read Cancer Sequencing
Amount £162,000 (GBP)
Organisation Oxford Nanopore Technologies 
Sector Private
Country United Kingdom
Start 04/2020 
End 03/2022
 
Description REACT-GE (COVID controls)
Amount £1,500,000 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 04/2020 
End 03/2021
 
Description UK Clinical Genomics Research Data Infrastructure
Amount £2,700,000 (GBP)
Organisation Department of Health (DH) 
Sector Public
Country United Kingdom
Start 04/2018 
End 03/2021
 
Description UK Life Sciences Cancer WGS (Genomics England)
Amount £7,870,000 (GBP)
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 04/2020 
End 03/2022
 
Title The UK Clinical Genomics Infrastructure: Clinical Data 
Description Improvement to the wider UK Clinical Genomics Infrastructure 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact The infrastructure now holds 1.6 billion data points on 94,000 participants and 91,000 whole genomes and, more recently, cancer registry and mortality data (2,141 participants with cause of death).
URL https://www.genomicsengland.co.uk/
 
Description Genome Wide Association Study of Lacunar Stroke 
Organisation University of Cambridge
Department Department of Physiology, Development and Neuroscience
Country United Kingdom 
Sector Academic/University 
PI Contribution Statistical analysis
Collaborator Contribution Data collection, oversight
Impact Manuscript in press at Lancet Neurology
Start Year 2019
 
Description Immune Mechanisms in Small Vessel Disease 
Organisation Ludwig Maximilian University of Munich (LMU Munich)
Country Germany 
Sector Academic/University 
PI Contribution Study design, Primary analysis, study oversight
Collaborator Contribution Statistical analysis and interpretation
Impact Manuscript under review to BRAIN
Start Year 2020
 
Description REACT Long-Covid 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Co-PI for this study funded by NIHR.
Collaborator Contribution Imperial College lead the study.
Impact No impact yet.
Start Year 2021
 
Description REACT-GE Covid Study 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Member of the Research Delivery Steering Committee
Collaborator Contribution Member of the Research Delivery Steering Committee
Impact As per study website: https://www.imperial.ac.uk/medicine/research-and-impact/groups/react-study/
Start Year 2020
 
Description Genomics England - 100k Genome Project (Multiple National & International Talks 2015-2018) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Multiple talks about the 100k Genomes project as GEL Chief Scientist
Year(s) Of Engagement Activity 2015,2016,2017,2018
URL http://www.genomicsengland.co.uk
 
Description Range of Genomics related talks 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The future of genomics in the delivery of healthcare
Year(s) Of Engagement Activity 2021,2022
 
Description The Genomics Conversation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Activities:
• Events took place all over the country, and a big thank you goes to the teams in Lincoln, York and Nottingham for their awareness-raising activities.
• We launched our new nursing video, 'Nursing in the Genomic Era'.
• We hosted our fifth WeNurses chat on 'Nursing and Ethics in the Genomic Era'.
• We ran a competition using the Genomics Game to conclude the week.
• Four #GenomicsConversation podcasts were launched on SoundCloud to introduce nurses and midwives to genomics.
• We held our first ever #GenomicsConversation Thunderclap to launch the week's activities.
• We organised a social media pledge campaign with enthusiasts spreading the message far and wide on social media.

Engagement:
• During the course of the week the website received over 12,000 page views.
• Our first ever Thunderclap was a great success, delivering a huge social reach with influential supporters from nursing including WeNurses, AgencyNurse and 6CsLive!
• Our four #GenomicsConversation podcasts were streamed over 80 times during the week.
• We received over 1,000 views of our videos.
Year(s) Of Engagement Activity 2018
URL https://www.genomicseducation.hee.nhs.uk/woa-18/