UK Infrastructure for Large-scale Clinical Genomics Research
Lead Research Organisation:
Queen Mary University of London
Department Name: UNLISTED
Abstract
This proposal to the MRC will establish a shared, secure, high-performance data and compute infrastructure as a platform for large-scale clinical genomics research based on the data flows of the UK 100,000 Genomes Project.
Samples and data from patients with cancer and rare, inherited disorders will be provided by NHS England, working in collaboration with Cancer Research UK and programmes funded by the NIHR and the MRC. Genomics England, a company wholly owned by the Department of Health, will pay for the generation of whole genome sequence data.
Genomics England will also pay for the generation of summary reports, based upon clinical annotations of this data, and will return these to the NHS to support patient care. Genomics England will make anonymised, redacted versions of the data available for industrial research strictly within a secure, managed environment.
The proposed infrastructure will provide a similar environment for academic research, with a more comprehensive collection of genomic and patient data, including the read-level data used for the generation of variant calls and summary reports. The infrastructure will include software tools to support the production of 'research-ready' data sets, the effective management of patient and genomic data, and the delivery of collaborative clinical research.
The project partners have experience in infrastructure development and clinical genomics research, and will be able to reuse designs, procedures, and software developed and tested within existing programmes and organisations, including UK Biobank and the European Bioinformatics Institute.
A formal mechanism will be established for engagement with public, charitable, and philanthropic funders, and with the clinical research projects that they fund. Subject to capacity constraints, projects that add appropriate value to the Genomics England programme will be provided with access to the compute infrastructure at no charge.
Technical Summary
This proposal to the MRC will establish a shared, secure, high-performance data and compute infrastructure as a platform for large-scale clinical genomics research based on the data flows of the UK 100,000 Genomes Project.
Samples and data from patients with cancer and rare, inherited disorders will be provided by NHS England, working in collaboration with Cancer Research UK and programmes funded by the NIHR and the MRC. Genomics England, a company wholly owned by the Department of Health, will pay for the generation of whole genome sequence data. Genomics England will also pay for the generation of summary reports, based upon clinical annotations of this data, and will return these to the NHS to support patient care. Genomics England will make anonymised, redacted versions of the data available for industrial research strictly within a secure, managed environment. The proposed infrastructure will provide a similar environment for academic research, with a more comprehensive collection of genomic and patient data, including the read-level data used for the generation of variant calls and summary reports. The infrastructure will include software tools to support the production of 'research-ready' data sets, the effective management of patient and genomic data, and the delivery of collaborative clinical research. The project partners have experience in infrastructure development and clinical genomics research, and will be able to reuse designs, procedures, and software developed and tested within existing programmes and organisations, including UK Biobank and the European Bioinformatics Institute. A formal mechanism will be established for engagement with public, charitable, and philanthropic funders, and with the clinical research projects that they fund. Subject to capacity constraints, projects that add appropriate value to the Genomics England programme will be provided with access to the compute infrastructure at no charge.
Organisations
- Queen Mary University of London (Lead Research Organisation)
- Department of Health Social Services and Public Safety (DHSSPS) (Collaboration)
- Quintiles Transnational Corporation (Collaboration)
- Ludwig Maximilian University of Munich (LMU Munich) (Collaboration)
- University of Cambridge (Collaboration)
Publications
Xiao S
(2023)
Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA.
in American journal of human genetics
Wu H
(2018)
SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research.
in Journal of the American Medical Informatics Association : JAMIA
Wei W
(2020)
Nuclear-mitochondrial DNA segments resemble paternally inherited mitochondrial DNA in humans.
in Nature communications
Wei W
(2019)
Germline selection shapes human mitochondrial DNA diversity.
in Science (New York, N.Y.)
Wei W
(2020)
Author Correction: Nuclear-mitochondrial DNA segments resemble paternally inherited mitochondrial DNA in humans.
in Nature communications
Van Den Berg ME
(2017)
Discovery of novel heart rate-associated loci using the Exome Chip.
in Human molecular genetics
Turro E
(2020)
Whole-genome sequencing of patients with rare diseases in a national health system.
in Nature
Trotman J
(2022)
The NHS England 100,000 Genomes Project: feasibility and utility of centralised genome sequencing for children with cancer.
in British journal of cancer
Traylor M
(2020)
Influence of Genetic Variation in PDE3A on Endothelial Function and Stroke.
in Hypertension (Dallas, Tex. : 1979)
Tamargo J
(2024)
New pharmacological agents and novel cardiovascular pharmacotherapy strategies in 2023
in European Heart Journal - Cardiovascular Pharmacotherapy
Surendran P
(2021)
Publisher Correction: Discovery of rare variants associated with blood pressure regulation through meta-analysis of 1.3 million individuals.
in Nature genetics
Sung YJ
(2019)
A multi-ancestry genome-wide study incorporating gene-smoking interactions identifies multiple new loci for pulse pressure and mean arterial pressure.
in Human molecular genetics
Steinthorsdottir V
(2020)
Genetic predisposition to hypertension is associated with preeclampsia in European and Central Asian women
in Nature Communications
Stark Z
(2021)
Scaling national and international improvement in virtual gene panel curation via a collaborative approach to discordance resolution.
in American journal of human genetics
Spielmann N
(2022)
Extensive identification of genes involved in congenital and structural heart disorders and cardiomyopathy
in Nature Cardiovascular Research
Spielmann N
(2022)
Publisher Correction: Extensive identification of genes involved in congenital and structural heart disorders and cardiomyopathy
in Nature Cardiovascular Research
Sosinsky A
(2024)
Insights for precision oncology from the integration of genomic and clinical data of 13,880 tumors from the 100,000 Genomes Cancer Programme.
in Nature medicine
Silvennoinen K
(2021)
Late diagnoses of Dravet syndrome: How many individuals are we missing?
in Epilepsia open
Siedlinski M
(2023)
Genetic analyses identify brain structures related to cognitive impairment associated with elevated blood pressure.
in European heart journal
Shoemark A
(2022)
Genome sequencing reveals underdiagnosis of primary ciliary dyskinesia in bronchiectasis.
in The European respiratory journal
Scott RH
(2019)
Genomic medicine: time for health-care transformation.
in Lancet (London, England)
Schon KR
(2021)
Use of whole genome sequencing to determine genetic basis of suspected mitochondrial disorders: cohort study.
in BMJ (Clinical research ed.)
Sadeghi-Alavijeh O
(2023)
Rare variants in the sodium-dependent phosphate transporter gene SLC34A3 explain missing heritability of urinary stone disease.
in Kidney international
Sabatine M
(2017)
Evolocumab and Clinical Outcomes in Patients with Cardiovascular Disease
in New England Journal of Medicine
Rowlands C
(2021)
Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders
in Scientific Reports
Robbe P
(2022)
Whole-genome sequencing of chronic lymphocytic leukemia identifies subgroups with distinct biological and clinical features.
in Nature genetics
Robbe P
(2018)
Clinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: pilot study for the 100,000 Genomes Project.
in Genetics in medicine : official journal of the American College of Medical Genetics
Reijns MAM
(2022)
Signatures of TOP1 transcription-associated mutagenesis in cancer and germline.
in Nature
Reijns M
(2022)
Publisher Correction: Signatures of TOP1 transcription-associated mutagenesis in cancer and germline
in Nature
Ragoussis V
(2022)
Using data from the 100,000 Genomes Project to resolve conflicting interpretations of a recurrent TUBB2A mutation.
in Journal of medical genetics
Pu X
(2020)
Effect of a coronary-heart-disease-associated variant of ADAMTS7 on endothelial cell angiogenesis.
in Atherosclerosis
Poulter JA
(2021)
New variants and in silico analyses in GRK1 associated Oguchi disease.
in Human mutation
Pleguezuelos-Manzano C
(2020)
Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli.
in Nature
Persyn E
(2020)
Genome-wide association study of MRI markers of cerebral small vessel disease in 42,310 participants.
in Nature communications
Parry DA
(2021)
Heterozygous lamin B1 and lamin B2 variants cause primary microcephaly and define a novel laminopathy.
in Genetics in medicine : official journal of the American College of Medical Genetics
Pairo-Castineira E
(2021)
Genetic mechanisms of critical illness in COVID-19.
in Nature
Description | Chief Scientist Genomics England |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | http://www.genomicsengland.co.uk |
Description | Genomics England newborn screening funded at £100m |
Geographic Reach | National |
Policy Influence Type | Contribution to new or Improved professional practice |
Impact | No impact yet, as funding for the service has only just been agreed. |
URL | https://www.genomicsengland.co.uk/initiatives/newborns |
Description | UK Clinical Genomics Infrastructure: Co-lead for the preparation for commissioning in the NHS of a National Genomic Health service |
Geographic Reach | National |
Policy Influence Type | Membership of a guideline committee |
Description | UK Clinical Genomics Infrastructure: Member of the Topol Review of Digital, Genomics and Artificial Intelligence implications for workforce planning. |
Geographic Reach | National |
Policy Influence Type | Membership of a guideline committee |
Description | COVID-19 (with CCO) |
Amount | £5,000,000 (GBP) |
Organisation | LifeArc |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2021 |
Description | COVID-19 Matched WGS |
Amount | £9,890,000 (GBP) |
Organisation | Illumina |
Sector | Private |
Country | United States |
Start | 04/2020 |
End | 03/2021 |
Description | COVID-19 WGS |
Amount | £3,000,000 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2021 |
Description | Illumina matched-funds re UK Life Sciences Cancer WGS (Genomics England) |
Amount | £2,250,000 (GBP) |
Organisation | Illumina |
Sector | Private |
Country | United States |
Start | 04/2020 |
End | 03/2022 |
Description | Inward Capital co-investment and 100 science jobs at Illumina |
Amount | £22,000,000 (GBP) |
Organisation | Illumina Inc. |
Sector | Private |
Country | United States |
Start | 04/2020 |
End | 03/2025 |
Description | Long-Read Cancer Sequencing |
Amount | £162,000 (GBP) |
Organisation | Oxford Nanopore Technologies |
Sector | Private |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2022 |
Description | REACT-GE (COVID controls) |
Amount | £1,500,000 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2021 |
Description | UK Clinical Genomics Research Data Infrastructure |
Amount | £2,700,000 (GBP) |
Organisation | Department of Health (DH) |
Sector | Public |
Country | United Kingdom |
Start | 04/2018 |
End | 03/2021 |
Description | UK Life Sciences Cancer WGS (Genomics England) |
Amount | £7,870,000 (GBP) |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2022 |
Title | Genotyping technology |
Description | TaqMan genotyping is a mainstay of SNP genotyping. We adapted a reaction miniaturisation methodology from KBioscience for nanolitre reaction volumes, reducing the cost of genotyping by 50%. |
Type Of Material | Technology assay or reagent |
Year Produced | 2006 |
Provided To Others? | Yes |
Impact | added value for funders |
Title | High throughput genotyping and sequencing hub |
Description | Barts and The London Genome Centre. Offers high throughput genomics infrastructure to internal and external users including hotel facilities for scientists. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2008 |
Provided To Others? | Yes |
Impact | Multiple major publications in common disease. |
Title | Improved techniques |
Description | The sample handling approaches and standard operating procedures for phenotyping have been used to develop the automated UK Biobank sample handling system. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2006 |
Provided To Others? | Yes |
Impact | Now handling 500,000 samples for UK Biobank. |
Title | NHS Genomic Medicine Centres |
Description | I created and established the concept of NHS Genomic Medicine Centres in England, which led to NHS England commissioning this capacity and capability framework for the 100,000 Genomes Project. |
Type Of Material | Improvements to research infrastructure |
Provided To Others? | No |
Impact | Led to NHS England commissioning this capacity and capability framework for the 100,000 Genomes Project. |
Title | Phenotypic and genotypic database |
Description | Initially a Microsoft Access relational database, later migrated to MySQL, holding all phenotypic and genotypic data for analysis and ease of collaboration. Several other studies have copied our approach or been helped to adapt it. |
Type Of Material | Biological samples |
Year Produced | 2007 |
Provided To Others? | Yes |
Impact | Others have adopted the database structure for similar phenotypic collections |
Title | The Genomics England Clinical Interpretation Partnership |
Description | We have established and launched the partnership, and called for expressions of interest from the UK NHS, academics and trainees to form domains that enhance clinical interpretation of the data from the 100,000 Genomes Project. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | Expressions of interest in forming GeCIPs are being received from the research community. |
Title | The UK Clinical Genomics Infrastructure: Clinical Data |
Description | Improvement to the wider UK Clinical Genomics Infrastructure |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | The infrastructure now holds 1.6 billion data points on 94,000 participants and 91,000 whole genomes, and has recently added cancer registry and mortality data (2,141 participants with cause of death). |
URL | https://www.genomicsengland.co.uk/ |
Title | UK Clinical Genomics Research Infrastructure |
Description | Data Centre |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | The research data centre for analysis and interpretation of the 100,000 Genomes Project |
Description | Genome Wide Association Study of Lacunar Stroke |
Organisation | University of Cambridge |
Department | Department of Physiology, Development and Neuroscience |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Statistical analysis |
Collaborator Contribution | Data collection, oversight |
Impact | Manuscript in press at Lancet Neurology |
Start Year | 2019 |
Description | Genomics England |
Organisation | Department of Health Social Services and Public Safety (DHSSPS) |
Country | United Kingdom |
Sector | Public |
PI Contribution | I have been Chief Scientist for the 100,000 whole-genome sequencing programme since 2013. I created and led the consortium that won the grant establishing this data centre for the research component of the 100,000 Genomes Project. The data centre will go live for the main programme imminently (see further funding). |
Collaborator Contribution | We are completing pilots in rare disease and cancer |
Impact | We have returned diagnoses to the NHS; created 13 NHS Genomic Medicine Centres across England that enrol participants, supply clinical data and validate feedback to patients; embarked on the main programme; and formed a 12-company consortium to create academic, NHS and industry partnerships. Nine HE institutions now offer a Master's in Genomic Medicine. |
Start Year | 2013 |
Description | Immune Mechanisms in Small Vessel Disease |
Organisation | Ludwig Maximilian University of Munich (LMU Munich) |
Country | Germany |
Sector | Academic/University |
PI Contribution | Study design, Primary analysis, study oversight |
Collaborator Contribution | Statistical analysis and interpretation |
Impact | Manuscript under review to BRAIN |
Start Year | 2020 |
Description | Quintiles Prime Site |
Organisation | Quintiles Transnational Corporation |
Country | United States |
Sector | Private |
PI Contribution | I lead the world's first Prime Site, which concentrates trials in a single site management organisation at Barts Health and Queen Mary University of London. |
Collaborator Contribution | Our collaboration with Quintiles, now extended across UCLP, created a world-leading trials hub, bringing 168 new trials to 3,356 UK patients (£20m) and leading to the creation of 25 similar "Prime Sites" worldwide. |
Impact | The collaboration spans all therapeutic areas. |
Start Year | 2008 |
Description | Chief Scientist for Genomics England - Progress Educational Trust |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Public meeting on the 100,000 Genomes Project that prompted discussion of the programme and of data handling, and engagement from the patient community. |
Year(s) Of Engagement Activity | 2014 |
Description | Chief Scientist for Genomics England Town Hall meetings |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Presented and co-led 3 of these meetings. The meetings sparked lively debate on the 100,000 Genomes Project and what it means for patients. The project was picked up on social media, and further events are planned. |
Year(s) Of Engagement Activity | 2014 |
Description | Genomics England - 100k Genome Project (Multiple National & International Talks 2015-2018) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Multiple talks about the 100k Genomes project as GEL Chief Scientist |
Year(s) Of Engagement Activity | 2015, 2016, 2017, 2018 |
URL | http://www.genomicsengland.co.uk |
Description | Range of Genomics related talks |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | The future of genomics in the delivery of healthcare |
Year(s) Of Engagement Activity | 2021, 2022 |
Description | The Genomics Conversation |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Activities: • Events took place all over the country, and a big thank you goes to the teams in Lincoln, York and Nottingham for their awareness-raising activities. • We launched our new nursing video, 'Nursing in the Genomic Era'. • We hosted our fifth WeNurses chat on 'Nursing and Ethics in the Genomic Era'. • We ran a competition using the Genomics Game to conclude the week. • Four #GenomicsConversation podcasts were launched on SoundCloud to introduce nurses and midwives to genomics. • We held our first ever #GenomicsConversation Thunderclap to launch the week's activities. • We organised a social media pledge campaign, with enthusiasts spreading the message far and wide. Engagement: • During the week the website received over 12,000 page views. • Our first ever Thunderclap was a great success, delivering a huge social reach with influential supporters from nursing including WeNurses, AgencyNurse and 6CsLive!. • Our four #GenomicsConversation podcasts were streamed over 80 times during the week. • We received over 1,000 views of our videos. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.genomicseducation.hee.nhs.uk/woa-18/ |