Identifying New Disease Genes & Mechanisms for Musculoskeletal Disorders in 100K Genomes Project using Bioinformatics, Phenotyping & Machine Learning

Lead Research Organisation: University of Oxford
Department Name: Wellcome Trust Centre for Human Genetics

Abstract

Genetic disorders which affect the development of the skeleton or muscles are collectively common, even if individually rare. Providing a genetic diagnosis for the patients and their families is important for ending what is often a lengthy diagnostic odyssey. For their clinicians, it may inform provision of the correct treatment. Understanding the genetic basis of these rare musculoskeletal (MSK) disorders may also provide insights into common MSK disorders, which are a major cause of disability and impairment of quality of life for millions of people in the UK.
In the past, genetic diagnosis of rare MSK diseases has relied on sequencing panels of known genes to identify the causative gene, but the diagnostic yield of such panel-based sequencing is low because many disease genes have not yet been identified.
With technological improvements and cost reductions, sequencing of patients' entire genomes (the full complement of their DNA) has become a possibility. Furthermore, many types of genetic variants can be interrogated from genome sequence data: not just those involving single base pairs, but also more complex duplications, deletions or transpositions of segments of the genome, as well as variants in non-coding regions, such as the introns within genes and the regions between genes. These regions have increasingly been recognised to play important roles in regulating gene expression, but we understand considerably less about their clinical significance.
Interrogation of patients' genomes to identify the disease-causing variants therefore still presents many challenges. Recognising the potential of this genome sequencing approach, the UK launched a national programme (100KGP) to identify pathogenic variants in 100,000 patients, with the aim of improving diagnoses for these patients that might also inform their personalised treatment. Run by Genomics England, sequencing of these patients is now complete and it is estimated that diagnoses have been found for a quarter of the rare disease patients so far. Solving the rest of these cases will require intense effort on the part of the research community to investigate the different variant types described above.
This proposal aims to contribute to that effort focusing on patients with musculoskeletal and related developmental conditions.
We will use both existing Genomics England (GeL) algorithms and our own bioinformatics tools to analyse the genome sequence data to ensure we have investigated all possible variant types, and then employ a variety of genetic strategies to assess whether the candidate variants and genes are likely to be disease-causing. We almost always need clinical or x-ray data beyond that already collected by the GeL programme. This is often available in medical records, so we have identified routes to retrieving it which involve clinicians and patients themselves. We have established a clinical multi-disciplinary team to enable discussion of cases, and will employ expertise in clinical radiology assessments to ensure systematic analysis of x-ray data. We will also ask patients to provide us with self-reported data, as we know from other research studies that patients are very good at remembering which bones they have broken and when. Finally, we will see if machine learning or 'artificial intelligence' can help us identify patterns in these vast and complex datasets which could not be identified by our manual inspection.
We anticipate that these efforts will help us provide diagnoses for many more patients with MSK disorders in the 100KGP, and that the approaches can then be adopted for other diseases in the 100KGP, extending genetic diagnoses to further patients.

Technical Summary

Whole genome sequencing (WGS) has the potential to revolutionise diagnosis of rare diseases. Recognising this, the UK has established a national programme to sequence 100,000 genomes (100KGP). To date, analysis in 100KGP has primarily focused on known disease genes for a given condition and on particular variant types, predominantly single nucleotide variants (SNVs), and a diagnostic yield of ~25% has been achieved. A more research-focused effort is now required to investigate novel disease genes and variant types, such as copy number variants and other structural variants (CNVs/SVs) and non-coding variants, that are largely unexplored in the 100KGP to date. To address this requirement, we will focus on patients in the 100KGP with musculoskeletal (MSK) disorders, a clinically and genetically heterogeneous group of conditions accounting for >1,000 cases in 100KGP. In preliminary studies, we have already identified a non-coding variant that contributes 1% to the diagnostic yield of osteogenesis imperfecta and 2 complex SVs in known genes.
We will use novel bioinformatics techniques to comprehensively analyse the WGS data, integrating the various variant types to identify putative novel disease genes. We will combine this with deep phenotyping, as core MSK clinical data has not been collected by 100KGP and is required in the assessment of candidate genes. We will also evaluate whether machine learning can be used to identify clusters of genotypes or phenotypes in these complex, high-dimensional datasets, enabling novel genotype/phenotype correlations to be observed.
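As a rough illustration of what grouping patients by phenotype could look like, the sketch below links patients whose sets of phenotype terms overlap strongly and reports the resulting groups. It is a minimal sketch under stated assumptions, not the project's method: the patient identifiers, phenotype terms, the `cluster_patients` helper and the 0.5 similarity threshold are all hypothetical, and Jaccard similarity with threshold-based linkage is simply one convenient stand-in for a real clustering approach.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of phenotype terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_patients(phenotypes, threshold=0.5):
    """Group patients whose phenotype term sets overlap strongly.

    Two patients are linked when their Jaccard similarity is at least
    `threshold`; clusters are the connected components of that graph,
    tracked here with a small union-find structure.
    """
    parent = {p: p for p in phenotypes}

    def find(p):
        # Follow parent links to the cluster representative,
        # shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(phenotypes, 2):
        if jaccard(phenotypes[p], phenotypes[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for p in phenotypes:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(sorted(members) for members in clusters.values())


# Entirely made-up patients and phenotype terms, for illustration only.
patients = {
    "P1": {"recurrent fractures", "blue sclerae", "short stature"},
    "P2": {"recurrent fractures", "blue sclerae"},
    "P3": {"muscle weakness", "joint contractures"},
    "P4": {"muscle weakness", "joint contractures", "scoliosis"},
    "P5": {"scoliosis"},
}

clusters = cluster_patients(patients, threshold=0.5)
print(clusters)
```

In this toy input, P1 and P2 share two of three terms and are grouped, as are P3 and P4, while P5 shares too little with anyone and stays on its own. Real datasets would need standardised phenotype vocabularies, handling of missing data and a clustering method chosen and validated for the task.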
Although this proposal focuses on MSK conditions, we anticipate that evaluation of the bioinformatics algorithms, platforms for deep phenotyping at scale and machine learning approaches will be informative for other disease domains in the 100KGP and can be leveraged to increase diagnostic yield across the dataset, as well as helping to maximise the research potential of this unique resource.

Publications

 
Description Guiding principles for the detection and management of foramen magnum stenosis
Geographic Reach Europe 
Policy Influence Type Influenced training of practitioners or researchers
Impact Improved diagnosis and treatment of an achondroplasia subtype.
URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10375694
 
Description Providing genetic diagnoses for patients
Geographic Reach National 
Policy Influence Type Contribution to new or improved professional practice
Impact Diagnosis of a genetic condition benefits patients and helps their family members. To date (updated March 2024) diagnoses have been provided for ~150 patients since the start of this grant and 391 overall.
 
Description NIHR Biomedical Research Centres
Amount £4,700,000 (GBP)
Funding ID NIHR203311 
Organisation Oxford University Hospitals NHS Foundation Trust 
Sector Academic/University
Country United Kingdom
Start 12/2022 
End 11/2027
 
Description Research Chair
Amount £1,586,000 (GBP)
Organisation Royal Academy of Engineering 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2023 
End 04/2028
 
Description Research Professorship
Amount £1,823,387 (GBP)
Funding ID NIHR302440 
Organisation National Institute for Health Research 
Sector Public
Country United Kingdom
Start 12/2022 
End 11/2027
 
Title Phenotype questionnaire for clinicians 
Description Developed a questionnaire for clinicians to augment the clinical data associated with patients in the 100,000 Genomes Project. This questionnaire has been developed following broad consultation with clinicians at our MDT and in the UK more generally. 
Type Of Material Physiological assessment or outcome measure 
Year Produced 2023 
Provided To Others? Yes  
Impact The questionnaires will enable us to collect more detailed clinical characteristics of patients in the 100,000 Genomes Project, which will aid identification of causative genes for these rare musculoskeletal conditions. 
 
Title Phenotype questionnaire for patients 
Description Questionnaire developed in consultation with patients and the patient group, the Brittle Bone Society, to facilitate collection of self-reported patient data 
Type Of Material Physiological assessment or outcome measure 
Year Produced 2023 
Provided To Others? Yes  
Impact The questionnaires will enable us to collect more detailed phenotypic data on musculoskeletal characteristics of patients in the 100,000 Genomes Project, which will aid the identification of causative genes for these rare conditions. 
 
Title AI for Complex Healthcare Data 
Description The primary output of this research activity is AI-based methods for training models from multimodal healthcare data, and for using the resulting models for phenotyping, prediction, and decision support. The activity described is one of the UK's largest "AI for Healthcare" teams, supported by this award. 
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact Citations, collaborations, implementations. 
 
Description Collaboration on 100,000 Genomes Project 
Organisation Genomics England
Country United Kingdom 
Sector Public 
PI Contribution The grant focuses on analysis of whole genome sequencing data for patients enrolled in the musculoskeletal domain of the 100,000 Genomes Project.
Collaborator Contribution Genomics England has provided some bioinformatics support, and is generating resources within the Research Environment, to allow us, and other users, to more readily interrogate splicing variants across the whole project.
Impact The approach has helped us to identify some splicing variants in specific genes of interest
Start Year 2022
 
Description Collaboration with Origins of Bone and Cartilage Disease Project 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We are analysing the whole genome sequencing data in the 100,000 Genomes Project to investigate whether genes found by the International Mouse Phenotyping Consortium mouse mutagenesis project to cause defects in murine bone quality or quantity also cause skeletal phenotypes in humans.
Collaborator Contribution Our partners, Prof Duncan Bassett and Prof Graham Williams, have generated the detailed skeletal phenotyping data on 1000 mouse single gene deletion lines
Impact We have received a list of the genes which cause defects in mouse bone quality or quantity and are checking these to see if there are any patients with variants in the equivalent human genes in the Genomics England dataset.
Start Year 2023
 
Title Genetic diagnosis for ENPP1 deficiency enabling patient entry into clinical trial 
Description By providing a patient with a genetic diagnosis for ENPP1 deficiency, the patient has been shown to be eligible for a clinical trial of a recombinant ENPP1 protein product being developed by Inozyme Pharma. The contribution from this MRC grant is to provide the genetic diagnosis from whole genome sequencing data available in the 100,000 Genomes Project. The patient's clinician subsequently referred the patient to the clinical trial once the diagnosis had been confirmed in an accredited genetics lab. Our award does not cover the clinical development of the ENPP1 protein, and we are not involved in the clinical trial, but it is important to record these outcomes too (and there is no other place to record such outcomes on this portal since the 'Other Outcomes' category has now been removed). 
Type Management of Diseases and Conditions
Current Stage Of Development Early clinical assessment
Year Development Stage Completed 2023
Development Status Under active development/distribution
Impact We think it important to record this as the aim of our MRC project is to ensure that genetic diagnoses for patients inform their treatment where possible. Inozyme announced positive interim data from its Phase 1/2 trial of 9 patients in Sept 2023. 
URL https://www.biospace.com/article/releases/inozyme-pharma-announces-positive-interim-data-from-ongoin...
 
Title Patient2Genes 
Description Machine learning approach to identify genes associated with specific conditions from large whole genome sequencing datasets 
Type Of Technology Software 
Year Produced 2023 
Impact Prototype developed, still under development 
 
Title Patient2Mutations 
Description Software to identify pathogenic variants associated with specific diseases from whole genome sequencing datasets 
Type Of Technology Software 
Year Produced 2023 
Impact Prototype software written, still in development 
 
Title SVRare: discovering disease-causing structural variants in the 100K Genomes Project 
Description Software/bioinformatics pipeline to interrogate structural variants in whole genome sequencing data 
Type Of Technology Software 
Year Produced 2022 
Impact The software details were published in 2021, but since then we have applied this tool in the 100,000 Genomes Project as part of this MRC grant and identified multiple disease-causing structural variants in patients. 
URL https://www.medrxiv.org/content/10.1101/2021.10.15.21265069v1
 
Description Medics4RareDiseases interview 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Recorded a 5-minute interview at an event for the charity Medics4RareDiseases.
Discussed the importance of research into rare diseases and the work of this particular Genomics England Clinical Interpretation Partnership.
This interview was for the Medics4RareDiseases YouTube channel.
Year(s) Of Engagement Activity 2024
 
Description PPI Activities 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact PPI activities, undertaken at our AI lab in Oxford
Year(s) Of Engagement Activity 2023,2024
 
Description Rare Disease video 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Video highlighting Oxford's programmes in rare disease genomics and the progression to developing advanced therapeutics.
These therapies are the type of research outcomes we anticipate will result from this grant into musculoskeletal disorders.
Year(s) Of Engagement Activity 2022,2023
URL https://www.youtube.com/watch?v=iGHis8MAjdc