UK Infrastructure for Large-scale Clinical Genomics Research

Lead Research Organisation: Queen Mary, University of London
Department Name: William Harvey Research Institute

Abstract

Background
The UK 100,000 Genomes Project will accelerate the application of whole genome sequencing (WGS) into routine care for the National Health Service. The genome is the genetic material of an organism (either in DNA or, for many types of viruses, in RNA). WGS provides the most comprehensive inventory of an individual's genetic variation. Incorporating this into routine care will transform the health services people receive, changing the processes of diagnosis and management. The UK 100,000 Genomes Project seeks to drive this change by sequencing 100,000 genomes from individuals affected by rare diseases and cancer (and their families) and from infectious disease pathogens.

Vision
The UK Infrastructure for Large-scale Clinical Genomics Research will provide the infrastructure which, using the information from the 100,000 Genomes Project, will develop the UK as an international centre of excellence for the analysis of very large and complex biomedical datasets. As a national resource for the development of new knowledge, it will provide transformative advances in the speed and range of research into the causes and consequences, prevention and treatment of disease. This proposal presents a unique opportunity for UK clinical research that will enable the discovery of new diagnostics, test complex approaches to stratified medicine, and drive therapeutic innovation.

Rare diseases
There are between 6,000 and 8,000 rare diseases and, while each one affects only a small number of people, together they affect the lives of 3 million people in England. Only 50% of rare diseases have an existing molecular (genetic) diagnosis. Through the scale of the 100,000 Genomes Project and its focus on unmet need, this infrastructure will create significant opportunities for scientific innovation, assisting the interpretation of genetic findings whose clinical significance is currently unknown or uncertain.

Cancer
Cancer is, fundamentally, a genetic disorder in which mutations lead to uncontrolled cell growth. Sequencing technologies have already enabled precise definitions of disease, uncovered insights into how cancer develops, and helped identify therapeutic targets from which to develop treatments. The importance of around 200 key genes across cancer types is known, but focusing on these alone in clinical care will not be enough to significantly benefit the majority of individuals with cancer. Because of its scale, the 100,000 Genomes Project offers the best opportunity to drive forward our understanding.

Pathogens and Infectious Disease
WGS for pathogens, both viruses and bacteria, is being adopted for routine management of infectious diseases, providing information on transmission and antibiotic resistance and creating tremendous opportunities for clinical research.

Output
This proposal is to fund the core hardware and software components of a data and computing infrastructure for genomics and clinical genomics research; this includes adapting existing software developed by UK partners who are international leaders in this field. The proposed infrastructure will be innovative in terms of content, technology, and scientific collaboration. It will contain clinical, health, and WGS data on large numbers of patients and pathogens in a range of key therapeutic areas. The data will be collected prospectively and will be extended with regular updates from clinical care. Available to clinicians, patients, industry and academia, it will encourage and enable engagement and collaboration in research, provide a platform for trials recruitment, and increase the depth and quality of the data obtained.

Technical Summary

This proposal to the MRC will establish a shared, secure, high-performance data and compute infrastructure as a platform for large-scale clinical genomics research based on the data flows of the UK 100,000 Genomes Project.

Samples and data from patients with cancer and rare, inherited disorders will be provided by NHS England, working in collaboration with Cancer Research UK and programmes funded by the NIHR and the MRC. Genomics England, a company wholly owned by the Department of Health, will pay for the generation of whole genome sequence data.

Genomics England will also pay for the generation of summary reports, based upon clinical annotations of these data, and will return them to the NHS to support patient care. Genomics England will make anonymised, redacted versions of the data available for industrial research strictly within a secure, managed environment.

The proposed infrastructure will provide a similar environment for academic research, with a more comprehensive collection of genomic and patient data, including the read-level data used for the generation of variant calls and summary reports. The infrastructure will include software tools to support the production of 'research-ready' data sets, the effective management of patient and genomic data, and the delivery of collaborative clinical research.

The project partners have experience in infrastructure development and clinical genomics research, and will be able to re-use designs, procedures, and software developed and tested within existing programmes and organisations, including UK Biobank and the European Bioinformatics Institute.

A formal mechanism will be established for engagement with public, charitable, and philanthropic funders, and with the clinical research projects that they fund. Subject to capacity constraints, projects that add appropriate value to the Genomics England programme will be provided with access to the compute infrastructure at no charge.

Planned Impact

The 100,000 Genomes Project is the most ambitious and most advanced of its kind. Its aim is to accelerate the application of whole genome sequencing into routine care for NHS patients with rare diseases, cancer, and infectious diseases, transforming the processes of diagnosis and management. Similar programmes are under development in America, in the Middle East and in South Asia. MRC support for the creation of a research infrastructure, alongside the existing £50m MRC investment in the Farr Institute of Health Informatics Research, will add considerably to the quality and value of the data collected, maximising the translational research potential of the 100,000 Genomes Project.

The programme presents a unique opportunity for UK clinical research that will enable the discovery of new diagnostics, test complex approaches to stratified medicine, and drive therapeutic innovation. The economic and commercial impact is likely to be large, both in terms of the potential utility to pharma and the impetus to small and medium biotech companies.

The 100,000 Genomes Project is already providing the focal point for the creation of a community for engagement, dialogue and debate with government, regulators, policy makers and the public. The benefits realised from the research infrastructure will be a key part of this dialogue, helping to develop public trust, understanding and confidence through ongoing active engagement.

Through the Genomics England Clinical Interpretation Partnership (GECIP), the aim is to stimulate specific dedicated programmes funded by GECIP partners (the funders) through response mode or specific calls. These will be led by GECIP researchers who, it is expected, will form and lead appropriate national and international consortia to maximise the value of the dataset by maximising the understanding of these highly complex data. The impact arising from the creation of GECIP, which will be built on the proposed research infrastructure, may include, but is not limited to:

- Enhanced clinical interpretation focused on rare inherited disease, including clinically- or genomically-driven deeper phenotyping, novel approaches to interpretation and annotation, validation and functional characterisation of variants, identification of novel therapeutic targets, or repurposing of existing therapies.
- Innovative clinical interpretation in cancer, including multi-omic datasets (e.g. transcriptomics, epigenetics, proteomics), analysis of circulating tumour DNA, sequential biopsy to address the genetic architecture of cancer, validation and characterisation of variants, identification of novel therapeutic targets, or repurposing of existing therapies.
- Improved clinical interpretation in infectious disease, focussing upon individuals with severe outcomes in sepsis, or - in partnership with Public Health England - greater understanding of the spread of antimicrobial resistance and phylogenetic tracking of transmission across the whole of the health economy.
- Expanding the programme to include other disease areas, to address specific research questions and opportunities to develop stratified approaches. The Genomics England infrastructure will be designed to facilitate expansion and re-use, and GECIP partnership can be extended to programmes with funding outwith the 100,000 Genomes Programme.
- Health records research, such as that exemplified by the rapidly-developing capacity of the Farr Institute, can build upon and add value to the combination of clinical, laboratory, and health records data, linked to variant call data, held securely within the proposed data and compute infrastructure.
- Algorithms, models, and tools for clinical genomics research, data quality assurance, and the annotation, interpretation, and presentation of genomic, clinical, and laboratory data in combination, may be developed, evaluated, used, and shared within the proposed infrastructure.

 
Description Chief Scientist Genomics England
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
URL http://www.genomicsengland.co.uk
 
Description Transformation of the NHS Genomic Medicine Service
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
Impact I co-chaired the NHS Transition groups in rare disease, cancer and pharmacogenomics, which created a national test directory adjusting 750k genomic tests in the NHS to offer, for the first time, an equitable NHS genomic medicine service across England from October 2018.
URL http://www.genomicsengland.co.uk
 
Description UK Clinical Genomics Infrastructure: Co-lead for the preparation for commissioning in the NHS of a National Genomic Health service
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description UK Clinical Genomics Infrastructure: Member of the Topol Review of Digital, Genomics and Artificial Intelligence implications for workforce planning.
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description UK Clinical Genomics Research Data Infrastructure
Amount £2,700,000 (GBP)
Organisation Department of Health (DH) 
Sector Public
Country United Kingdom
Start 04/2018 
End 03/2021
 
Title Genotyping technology 
Description TaqMan genotyping is a main workhorse for SNP genotyping. We adapted a methodology for reaction miniaturisation from KBioscience for nanolitre reaction volumes, reducing the cost of genotyping by 50%.
Type Of Material Technology assay or reagent 
Year Produced 2006 
Provided To Others? Yes  
Impact added value for funders 
 
Title High throughput genotyping and sequencing hub 
Description Barts and The London Genome Centre. Offers high throughput genomics infrastructure to internal and external users including hotel facilities for scientists. 
Type Of Material Improvements to research infrastructure 
Year Produced 2008 
Provided To Others? Yes  
Impact Multiple major publications in common disease. 
 
Title Improved techniques 
Description The sample handling approaches and standard operating procedures for phenotyping have been used to develop the automated Biobank sample handling system.
Type Of Material Improvements to research infrastructure 
Year Produced 2006 
Provided To Others? Yes  
Impact Now handling 500,000 samples for UK Biobank
 
Title NHS Genomic Medicine Centres 
Description I created and established the concept of NHS Genomic Medicine Centres in England which has led to NHS England Commissioning this capacity and capability framework for the 100,000 Genomes Project 
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact Led to NHS England Commissioning this capacity and capability framework for the 100,000 Genomes Project 
 
Title New capital building 
Description A new Heart Centre at Barts and The London for translating gene discovery into novel therapies (3,172 square metres). With faculty, a £24.7M project.
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact The project is ongoing 
 
Title Phenotypic and genotypic database 
Description Initially a Microsoft Access relational database, which we migrated to a MySQL database holding all phenotypic and genotypic data for analysis and ease of collaboration. Several other studies have copied or been helped to adapt our approach.
Type Of Material Biological samples 
Year Produced 2007 
Provided To Others? Yes  
Impact Others have adopted the database structure for similar phenotypic collections 
 
Title The Genomics England Clinical Interpretation Partnership 
Description We have established, launched and called for expressions of interest to the UK NHS, academics and training to form domains to enhance clinical interpretation of the data from the 100,000 Genomes Project.
Type Of Material Improvements to research infrastructure 
Year Produced 2014 
Provided To Others? Yes  
Impact Receiving expressions of interest for forming GeCIPs from research community 
 
Title The UK Clinical Genomics Infrastructure: Clinical Data 
Description Improvement to the wider UK Clinical Genomics Infrastructure 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact The infrastructure now holds 1.6 billion data points on 94,000 participants and 91,000 whole genomes and, more recently, cancer registry and mortality data (2,141 participants with cause of death).
URL https://www.genomicsengland.co.uk/
 
Title UK Clinical Genomics Research Infrastructure 
Description Data Centre 
Type Of Material Improvements to research infrastructure 
Year Produced 2014 
Provided To Others? Yes  
Impact The research data centre for analysis and interpretation of the 100,000 Genomes Project 
 
Description Genomics England 
Organisation Department of Health Social Services and Public Safety (DHSSPS)
Country United Kingdom 
Sector Public 
PI Contribution I have been Chief Scientist for the 100,000 whole genome sequencing programme since 2013. I created and led the consortium that won the grant that creates this data centre for the research component of the 100,000 Genomes Project. This goes live for the main programme imminently (see further funding).
Collaborator Contribution We are completing pilots in rare disease and cancer
Impact We have:
- returned diagnoses to the NHS
- created 13 NHS Genomic Medicine Centres across England that serve to enrol, supply clinical data, validate feedback to patients
- embarked on the main programme
- formed a 12 company consortium to create academic NHS Industry partnerships
- 9 HE Institutes now offer a Master's in Genomic Medicine
Start Year 2013
 
Description Quintiles Prime Site 
Organisation Quintiles Transnational Corporation
Country United States 
Sector Private 
PI Contribution I lead the world's first Prime Site, which concentrates trials in a single site management organisation at Barts Health and Queen Mary University of London.
Collaborator Contribution Our collaboration with Quintiles, now extended across UCLP, created a world-leading trials hub bringing 168 new trials to 3,356 UK patients (£20m) and leading to the creation of 25 similar "Prime Sites" worldwide.
Impact It ranges across all therapeutic areas
Start Year 2008
 
Description Chief Scientist for Genomics England - Progress Educational Trust 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Public meeting on the 100,000 Genomes Project, which prompted discussion of the programme and data handling

Engagement from patient community
Year(s) Of Engagement Activity 2014
 
Description Chief Scientist for Genomics England Town Hall meetings 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Presented and co-led 3 of these meetings. Meetings sparked interesting and lively debates on the 100,000 Genomes Project and what it means for patients

Project picked up by social media
Further events planned
Year(s) Of Engagement Activity 2014
 
Description Genomics England - 100k Genome Project (Multiple National & International Talks 2015-2018) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Multiple talks about the 100k Genomes project as GEL Chief Scientist
Year(s) Of Engagement Activity 2015,2016,2017,2018
URL http://www.genomicsengland.co.uk
 
Description The Genomics Conversation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Activities:
• Events took place all over the country and a big thank you goes to the teams in Lincoln, York and Nottingham for their awareness raising activities.
• We launched our new nursing video - 'Nursing in the Genomic Era'
• We hosted our fifth WeNurses chat on 'Nursing and Ethics in the Genomic Era'
• We ran a competition using the Genomics Game to conclude the week.
• Four #GenomicsConversation podcasts were launched on SoundCloud to introduce nurses and midwives to genomics.
• We held our first ever #GenomicsConversation Thunderclap to launch the week's activities.
• We organised a social media pledge campaign with enthusiasts spreading the message far and wide on social media.

Engagement:
• During the course of the week the website received over 12,000 page views.
• Our first ever Thunderclap was a great success, delivering a huge social reach with influential supporters from nursing including WeNurses, AgencyNurse and 6CsLive!.
• Our four #GenomicsConversation podcasts were streamed over 80 times during the week.
• We received over 1000 views of our videos.
Year(s) Of Engagement Activity 2018
URL https://www.genomicseducation.hee.nhs.uk/woa-18/