UK Infrastructure for Large-scale Clinical Genomics Research
Lead Research Organisation:
Queen Mary University of London
Department Name: UNLISTED
Abstract
This proposal to the MRC will establish a shared, secure, high performance data and compute infrastructure as a platform for large-scale clinical genomics research based on the data flows of the UK 100,000 Genomes project.
Samples and data from patients with cancer and rare, inherited disorders will be provided by NHS England, working in collaboration with Cancer Research UK and programmes funded by the NIHR and the MRC. Genomics England, a company wholly owned by the Department of Health, will pay for the generation of whole genome sequence data.
Genomics England will also pay for the generation of summary reports, based upon clinical annotations of these data, and will return these to the NHS to support patient care. Genomics England will make anonymised, redacted versions of the data available for industrial research strictly within a secure, managed environment.
The proposed infrastructure will provide a similar environment for academic research, with a more comprehensive collection of genomic and patient data, including the read-level data used for the generation of variant calls and summary reports. The infrastructure will include software tools to support the production of 'research-ready' data sets, the effective management of patient and genomic data, and the delivery of collaborative clinical research.
The project partners have experience in infrastructure development and clinical genomics research, and will be able to reuse designs, procedures, and software developed and tested within existing programmes and organisations, including UK Biobank and the European Bioinformatics Institute.
A formal mechanism will be established for engagement with public, charitable, and philanthropic funders, and with the clinical research projects that they fund. Subject to capacity constraints, projects that add appropriate value to the Genomics England programme will be provided with access to the compute infrastructure at no charge.
Technical Summary
This proposal to the MRC will establish a shared, secure, high performance data and compute infrastructure as a platform for large-scale clinical genomics research based on the data flows of the UK 100,000 Genomes project.
Samples and data from patients with cancer and rare, inherited disorders will be provided by NHS England, working in collaboration with Cancer Research UK and programmes funded by the NIHR and the MRC. Genomics England, a company wholly owned by the Department of Health, will pay for the generation of whole genome sequence data. Genomics England will pay also for the generation of summary reports, based upon clinical annotations of this data, and will return these to the NHS to support patient care. Genomics England will make anonymised, redacted versions of the data available for industrial research strictly within a secure, managed environment. The proposed infrastructure will provide a similar environment for academic research, with a more comprehensive collection of genomic and patient data, including the read-level data used for the generation of variant calls and summary reports. The infrastructure will include software tools to support the production of 'research-ready' data sets, the effective management of patient and genomic data, and the delivery of collaborative clinical research. The project partners have experience in infrastructure development and clinical genomics research, and will be able to reuse designs, procedures, and software developed and tested within existing programmes and organisations, including UK Biobank and the European Bioinformatics Institute. A formal mechanism will be established for engagement with public, charitable, and philanthropic funders, and with the clinical research projects that they fund. Subject to capacity constraints, projects that add appropriate value to the Genomics England programme will be provided with access to the compute infrastructure at no charge.
Organisations
- Queen Mary University of London (Lead Research Organisation)
- Department of Health Social Services and Public Safety (DHSSPS) (Collaboration)
- Quintiles Transnational Corporation (Collaboration)
- Ludwig Maximilian University of Munich (LMU Munich) (Collaboration)
- University of Cambridge (Collaboration)
Publications
Kousathanas A
(2022)
Whole-genome sequencing reveals host factors underlying critical COVID-19.
Lesurf R
(2022)
Whole genome sequencing delineates regulatory, copy number, and cryptic splice variants in early onset cardiomyopathy
in npj Genomic Medicine
Lin SJ
(2021)
Biallelic variants in KARS1 are associated with neurodevelopmental disorders and hearing loss recapitulated by the knockout zebrafish.
in Genetics in medicine : official journal of the American College of Medical Genetics
Liu DJ
(2017)
Exome-wide association study of plasma lipids in >300,000 individuals.
in Nature genetics
Lloyd KCK
(2020)
The Deep Genome Project.
in Genome biology
Macken WL
(2022)
Specialist multidisciplinary input maximises rare disease diagnoses from whole genome sequencing.
in Nature communications
Magavern E
(2023)
Factor V Leiden, estrogen, and multimorbidity association with venous thromboembolism in a British-South Asian cohort
in iScience
Magavern E
(2023)
CYP2C19 loss-of-function alleles are not associated with higher prevalence of gastrointestinal bleeds in those who have been prescribed antidepressants: Analysis in a British-South Asian cohort
in British Journal of Clinical Pharmacology
Magavern E
(2023)
UK Prescribing Safety Assessment (PSA): The development, implementation and outcomes of a national online prescribing assessment
in British Journal of Clinical Pharmacology
Magavern E
(2024)
Use of Genomics to Develop Novel Therapeutics and Personalize Hypertension Therapy
in Arteriosclerosis, Thrombosis, and Vascular Biology
Magavern EF
(2023)
Equal access to pharmacogenomics testing: The ethical imperative for population-wide access in the UK NHS.
in British journal of clinical pharmacology
Magavern EF
(2023)
CYP2C19 Genotype Prevalence and Association With Recurrent Myocardial Infarction in British-South Asians Treated With Clopidogrel.
in JACC. Advances
Magavern EF
(2022)
The role of pharmacogenomics in contemporary cardiovascular therapy: a position statement from the European Society of Cardiology Working Group on Cardiovascular Pharmacotherapy.
in European heart journal. Cardiovascular pharmacotherapy
Marouli E
(2019)
Mendelian randomisation analyses find pulmonary factors mediate the effect of height on coronary artery disease.
in Communications biology
Marques P
(2019)
Hypertension due to a deoxycorticosterone-secreting adrenal tumour diagnosed during pregnancy.
in Endocrinology, diabetes & metabolism case reports
Martin AR
(2019)
PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels.
in Nature genetics
McGuigan A
(2022)
Multilocus Inherited Neoplasia Allele Syndrome (MINAS): an update.
in European journal of human genetics : EJHG
Moreno-Ruiz N
(2022)
Assessing the digenic model in rare disorders using population sequencing data.
in European journal of human genetics : EJHG
Murugaesu N
(2022)
Insights for precision healthcare from the 100,000 Genomes Cancer Programme
Niggl E
(2023)
HNRNPC haploinsufficiency affects alternative splicing of intellectual disability-associated genes and causes a neurodevelopmental disorder.
in American journal of human genetics
Noordam R
(2019)
Effects of Calcium, Magnesium, and Potassium Concentrations on Ventricular Repolarization in Unselected Individuals.
in Journal of the American College of Cardiology
Owen N
(2022)
Identification of 4 novel human ocular coloboma genes ANK3, BMPR1B, PDGFRA, and CDH4 through evolutionary conserved vertebrate gene analysis.
in Genetics in medicine : official journal of the American College of Medical Genetics
Pagnamenta AT
(2021)
An ancestral 10-bp repeat expansion in VWA1 causes recessive hereditary motor neuropathy.
in Brain : a journal of neurology
Pairo-Castineira E
(2021)
Genetic mechanisms of critical illness in COVID-19.
in Nature
Parry DA
(2021)
Heterozygous lamin B1 and lamin B2 variants cause primary microcephaly and define a novel laminopathy.
in Genetics in medicine : official journal of the American College of Medical Genetics
Persyn E
(2020)
Genome-wide association study of MRI markers of cerebral small vessel disease in 42,310 participants.
in Nature communications
Pleguezuelos-Manzano C
(2020)
Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli.
in Nature
Poulter JA
(2021)
New variants and in silico analyses in GRK1 associated Oguchi disease.
in Human mutation
Pu X
(2020)
Effect of a coronary-heart-disease-associated variant of ADAMTS7 on endothelial cell angiogenesis.
in Atherosclerosis
Ragoussis V
(2022)
Using data from the 100,000 Genomes Project to resolve conflicting interpretations of a recurrent TUBB2A mutation.
in Journal of medical genetics
Reijns M
(2022)
Publisher Correction: Signatures of TOP1 transcription-associated mutagenesis in cancer and germline
in Nature
Reijns MAM
(2022)
Signatures of TOP1 transcription-associated mutagenesis in cancer and germline.
in Nature
Robbe P
(2022)
Whole-genome sequencing of chronic lymphocytic leukemia identifies subgroups with distinct biological and clinical features.
in Nature genetics
Robbe P
(2018)
Clinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: pilot study for the 100,000 Genomes Project.
in Genetics in medicine : official journal of the American College of Medical Genetics
Rowlands C
(2021)
Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders
in Scientific Reports
Sabatine M
(2017)
Evolocumab and Clinical Outcomes in Patients with Cardiovascular Disease
in New England Journal of Medicine
Sadeghi-Alavijeh O
(2023)
Rare variants in the sodium-dependent phosphate transporter gene SLC34A3 explain missing heritability of urinary stone disease.
in Kidney international
Schon KR
(2021)
Use of whole genome sequencing to determine genetic basis of suspected mitochondrial disorders: cohort study.
in BMJ (Clinical research ed.)
Scott RH
(2019)
Genomic medicine: time for health-care transformation.
in Lancet (London, England)
Shoemark A
(2022)
Genome sequencing reveals underdiagnosis of primary ciliary dyskinesia in bronchiectasis.
in The European respiratory journal
Siedlinski M
(2023)
Genetic analyses identify brain structures related to cognitive impairment associated with elevated blood pressure.
in European heart journal
Description | Chief Scientist Genomics England |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | http://www.genomicsengland.co.uk |
Description | Genomics England Newborn screening funded to £100m |
Geographic Reach | National |
Policy Influence Type | Contribution to new or Improved professional practice |
Impact | No impact yet, as funding for the service has only just been agreed. |
URL | https://www.genomicsengland.co.uk/initiatives/newborns |
Description | UK Clinical Genomics Infrastructure: Co-lead for the preparation for commissioning in the NHS of a National Genomic Health service |
Geographic Reach | National |
Policy Influence Type | Membership of a guideline committee |
Description | UK Clinical Genomics Infrastructure: Member of the Topol Review of Digital, Genomics and Artificial Intelligence implications for workforce planning. |
Geographic Reach | National |
Policy Influence Type | Membership of a guideline committee |
Description | COVID-19 (with CCO) |
Amount | £5,000,000 (GBP) |
Organisation | LifeArc |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2021 |
Description | COVID-19 Matched WGS |
Amount | £9,890,000 (GBP) |
Organisation | Illumina |
Sector | Private |
Country | United States |
Start | 04/2020 |
End | 03/2021 |
Description | COVID-19 WGS |
Amount | £3,000,000 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2021 |
Description | Illumina matched-funds re UK Life Sciences Cancer WGS (Genomics England) |
Amount | £2,250,000 (GBP) |
Organisation | Illumina |
Sector | Private |
Country | United States |
Start | 04/2020 |
End | 03/2022 |
Description | Inward Capital co-investment and 100 science jobs at Illumina |
Amount | £22,000,000 (GBP) |
Organisation | Illumina Inc. |
Sector | Private |
Country | United States |
Start | 04/2020 |
End | 03/2025 |
Description | Long-Read Cancer Sequencing |
Amount | £162,000 (GBP) |
Organisation | Oxford Nanopore Technologies |
Sector | Private |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2022 |
Description | REACT-GE (COVID controls) |
Amount | £1,500,000 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2021 |
Description | UK Clinical Genomics Research Data Infrastructure |
Amount | £2,700,000 (GBP) |
Organisation | Department of Health (DH) |
Sector | Public |
Country | United Kingdom |
Start | 04/2018 |
End | 03/2021 |
Description | UK Life Sciences Cancer WGS (Genomics England) |
Amount | £7,870,000 (GBP) |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2022 |
Title | Genotyping technology |
Description | TaqMan genotyping is a main workhorse for SNP genotyping. We adapted a methodology for reaction miniaturisation from KBioscience for nanolitre reaction volumes, reducing the cost of genotyping by 50%. |
Type Of Material | Technology assay or reagent |
Year Produced | 2006 |
Provided To Others? | Yes |
Impact | Added value for funders. |
Title | High throughput genotyping and sequencing hub |
Description | Barts and The London Genome Centre. Offers high throughput genomics infrastructure to internal and external users including hotel facilities for scientists. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2008 |
Provided To Others? | Yes |
Impact | Multiple major publications in common disease. |
Title | Improved techniques |
Description | The sample handling approaches and standard operating procedures for phenotyping have been used to develop the automated Biobank sample handling system. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2006 |
Provided To Others? | Yes |
Impact | Now handling 500,000 samples for UK Biobank. |
Title | NHS Genomic Medicine Centres |
Description | I created and established the concept of NHS Genomic Medicine Centres in England, which has led to NHS England commissioning this capacity and capability framework for the 100,000 Genomes Project. |
Type Of Material | Improvements to research infrastructure |
Provided To Others? | No |
Impact | Led to NHS England commissioning this capacity and capability framework for the 100,000 Genomes Project. |
Title | Phenotypic and genotypic database |
Description | Initially a Microsoft Access relational database, which we migrated to a MySQL database holding all phenotypic and genotypic data for analysis and ease of collaboration. Several other studies have copied or been helped to adapt our approach. |
Type Of Material | Biological samples |
Year Produced | 2007 |
Provided To Others? | Yes |
Impact | Others have adopted the database structure for similar phenotypic collections |
Title | The Genomics England Clinical Interpretation Partnership |
Description | We have established and launched the partnership, and have called for expressions of interest from the UK NHS, academic and training communities to form domains that enhance clinical interpretation of the data from the 100,000 Genomes Project. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | Receiving expressions of interest from the research community for forming GeCIPs. |
Title | The UK Clinical Genomics Infrastructure: Clinical Data |
Description | Improvement to the wider UK Clinical Genomics Infrastructure |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | The infrastructure now holds 1.6 billion data points on 94,000 participants and 91,000 whole genomes and, more recently, cancer registry and mortality data (2,141 participants with cause of death). |
URL | https://www.genomicsengland.co.uk/ |
Title | UK Clinical Genomics Research Infrastructure |
Description | Data Centre |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | The research data centre for analysis and interpretation of the 100,000 Genomes Project |
Description | Genome Wide Association Study of Lacunar Stroke |
Organisation | University of Cambridge |
Department | Department of Physiology, Development and Neuroscience |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Statistical analysis |
Collaborator Contribution | Data collection, oversight |
Impact | Manuscript in press at Lancet Neurology |
Start Year | 2019 |
Description | Genomics England |
Organisation | Department of Health Social Services and Public Safety (DHSSPS) |
Country | United Kingdom |
Sector | Public |
PI Contribution | I have been Chief Scientist for the 100,000 whole genome sequencing programme since 2013. I led and created the consortium that won the grant that creates this data centre for the research component of the 100,000 Genomes Project. This goes live for the main programme imminently (see further funding). |
Collaborator Contribution | We are completing pilots in rare disease and cancer |
Impact | We have returned diagnoses to the NHS; created 13 NHS Genomic Medicine Centres across England that serve to enrol, supply clinical data, validate feedback to patients; embarked on the main programme; and formed a 12-company consortium to create academic, NHS and industry partnerships. In addition, 9 HE institutes now offer a Master's in Genomic Medicine. |
Start Year | 2013 |
Description | Immune Mechanisms in Small Vessel Disease |
Organisation | Ludwig Maximilian University of Munich (LMU Munich) |
Country | Germany |
Sector | Academic/University |
PI Contribution | Study design, Primary analysis, study oversight |
Collaborator Contribution | Statistical analysis and interpretation |
Impact | Manuscript under review at Brain |
Start Year | 2020 |
Description | Quintiles Prime Site |
Organisation | Quintiles Transnational Corporation |
Country | United States |
Sector | Private |
PI Contribution | I lead the world's first Prime Site, which concentrates trials in a single site management organisation at Barts Health and Queen Mary University of London. |
Collaborator Contribution | Our collaboration with Quintiles, now extended across UCLP, created a world-leading trials hub bringing 168 new trials to 3356 UK patients (£20m) and leading to the creation of 25 similar "Prime Sites" worldwide. |
Impact | It ranges across all therapeutic areas |
Start Year | 2008 |
Description | Chief Scientist for Genomics England - Progress Educational Trust |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Public meeting on the 100,000 Genomes Project, which prompted discussion of the programme and data handling. Engagement from the patient community. |
Year(s) Of Engagement Activity | 2014 |
Description | Chief Scientist for Genomics England Town Hall meetings |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Presented and co-led 3 of these meetings. The meetings sparked interesting and lively debates on the 100,000 Genomes Project and what it means for patients. The project was picked up by social media. Further events are planned. |
Year(s) Of Engagement Activity | 2014 |
Description | Genomics England - 100k Genome Project (Multiple National & International Talks 2015-2018) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Multiple talks about the 100k Genomes project as GEL Chief Scientist |
Year(s) Of Engagement Activity | 2015,2016,2017,2018 |
URL | http://www.genomicsengland.co.uk |
Description | Range of Genomics related talks |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | The future of genomics in the delivery of healthcare |
Year(s) Of Engagement Activity | 2021,2022 |
Description | The Genomics Conversation |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Activities: • Events took place all over the country, and a big thank you goes to the teams in Lincoln, York and Nottingham for their awareness-raising activities. • We launched our new nursing video, 'Nursing in the Genomic Era'. • We hosted our fifth WeNurses chat on 'Nursing and Ethics in the Genomic Era'. • We ran a competition using the Genomics Game to conclude the week. • Four #GenomicsConversation podcasts were launched on SoundCloud to introduce nurses and midwives to genomics. • We held our first ever #GenomicsConversation Thunderclap to launch the week's activities. • We organised a social media pledge campaign with enthusiasts spreading the message far and wide on social media. Engagement: • During the course of the week the website received over 12,000 page views. • Our first ever Thunderclap was a great success, delivering a huge social reach with influential supporters from nursing including WeNurses, AgencyNurse and 6CsLive!. • Our four #GenomicsConversation podcasts were streamed over 80 times during the week. • We received over 1000 views of our videos. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.genomicseducation.hee.nhs.uk/woa-18/ |