Human functional genomics of post-translationally modifying clinical coding variants: FGx-PTMv

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Rare diseases are debilitating, with one third of children suffering from these diseases dying before their fifth birthday. The 100,000 genomes project delivered by Genomics England and NHS England produced a rich catalogue of genomic variation. By associating genome variants and disease for patients suffering from disease symptoms and syndromes, the project succeeded in determining the genetic basis for many rare diseases. This allowed for clinical diagnoses to be provided for some patients within the rare disease group, but most remain undiagnosed.

In general terms, genome variants are separated into two classes: those that sit within the protein-coding regions of genes (exonic variants), and those that sit outside of these regions (non-coding variants, including intronic variants). Exonic variants affect the amino acid sequence that comprises the protein. In some cases, these variants have a reliably predictable effect, such as the production of non-functional shortened versions of the protein. In other cases, a variant can result in the normal amino acid being replaced by an alternative. These so-called missense variants can subtly affect protein form and function, and it is more challenging to predict the effect a missense variant will have upon a protein. For example, the post-translational modification (PTM) of proteins can regulate their function, and sometimes these missense variants affect these PTMs by changing the amino acid that the modification is normally attached to. Hypothetically, PTM variants (PTMv) should have a more predictable effect on protein function. To understand how a given missense variant affects protein function, it is necessary to experimentally determine the impact of the variant in a laboratory. As a result, for most of the missense variants identified in rare disease-associated genes from the 100,000 genomes project, while we can accurately determine correlation, we cannot be certain of causation. Consequently, these missense variants are not used to inform the clinical diagnoses for patients suffering from these rare diseases.

We will provide much-needed functional information for one thousand missense variants present in rare disease-associated genes from the 100,000 genomes project, and in so doing establish a scalable pipeline for the clinical interpretation of many more. To accomplish this, we have established a cross-disciplinary investigative team that interweaves the disciplines of computational genomics, biomedical informatics, mathematics, digital chemistry, bioinformatics, process automation, functional proteomics, biochemistry, and cell biology. Our modular functional genomics variant interpretation platform is built on well-established and new methods, applied at scale. PTMv will be extracted from the rare disease gene panel of the 100,000 genomes project. They will be computationally modelled and prioritised according to their predicted contribution to protein function using an atomistic bond energy propensity analysis. We will build and deploy a new end-to-end variant engineering bioinformatic toolset alongside high-throughput process automation to test the functional contribution of hundreds of these PTMv in live cells at the same time. This functional information will be stably integrated into the European Bioinformatics Institute's ProtVar resource for national and international academic research impact, and be fed back into Genomics England's research environment to support clinical diagnoses for rare disease patients and improve clinical practice guidelines.

Technical Summary

Rare diseases are debilitating, with one third of children suffering from these diseases dying before their fifth birthday. The 100,000 genomes project delivered by Genomics England and NHS England produced a rich catalogue of genomic variation, enabling diagnoses for many patients suffering from these rare disorders. A large proportion of the variants identified fall within protein-coding regions of clinically associated genes and change a codon, and consequently the amino acid at that position within the protein. Unfortunately, in the absence of extensive functional annotation, most of these missense variants cannot be used to inform clinical diagnoses.

Our cluster will define the functional contribution of these missense variants, enabling much-needed diagnoses for rare-disease patients and increasing the diagnostic yield of the 100,000 genomes project. We will functionally annotate missense variants that impact sites of post-translational modification (PTMv). This will be accomplished through a highly cross-disciplinary approach led by an investigative group that draws together the disciplines of computational genomics, biomedical informatics, mathematics, digital chemistry, bioinformatics, process automation, functional proteomics, biochemistry, and cell biology.

We will mine the rare disease gene panel of the 100,000 genomes project to extract, model, and prioritise PTMv according to their predicted atomistic contribution to protein function. We will then use a new end-to-end variant engineering bioinformatic toolset alongside high-throughput process automation to test the functional contribution of hundreds of these PTMv in live cells. Importantly, our functional genomics cluster establishes a data integration pipeline with the European Bioinformatics Institute and its ProtVar (Protein Variation) resource to ensure our findings empower the wider research community, presenting functional annotations for the interpretation of variants.
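
The extract-and-prioritise step described above can be illustrated with a minimal sketch. It is not the project's toolset: the record type, the function names, the protein identifiers, the PTM site table, and the numeric scores are all hypothetical placeholders. In particular, the scores stand in for whatever a structural analysis would produce and are not outputs of any real method. The sketch only shows the core idea that a PTMv is a missense variant whose residue position coincides with an annotated PTM site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissenseVariant:
    protein: str   # protein identifier (toy values, not real accessions)
    position: int  # 1-based residue position in the protein
    ref: str       # reference amino acid (one-letter code)
    alt: str       # substituted amino acid (one-letter code)


def extract_ptmv(variants, ptm_sites):
    """Keep only variants whose residue is an annotated PTM site."""
    return [v for v in variants if (v.protein, v.position) in ptm_sites]


def prioritise(ptmvs, scores):
    """Rank PTMv by descending score; variants without a score go last."""
    return sorted(
        ptmvs,
        key=lambda v: scores.get((v.protein, v.position), float("-inf")),
        reverse=True,
    )


# Toy inputs, invented purely for illustration.
ptm_sites = {
    ("PROT_A", 15): "phosphorylation",
    ("PROT_B", 120): "ubiquitination",
}
variants = [
    MissenseVariant("PROT_A", 15, "S", "A"),   # hits a PTM site
    MissenseVariant("PROT_A", 40, "G", "R"),   # not at a PTM site
    MissenseVariant("PROT_B", 120, "K", "R"),  # hits a PTM site
]
# Placeholder scores standing in for a predicted functional contribution.
scores = {("PROT_A", 15): 0.4, ("PROT_B", 120): 0.9}

ptmvs = extract_ptmv(variants, ptm_sites)
ranked = prioritise(ptmvs, scores)
for v in ranked:
    site = ptm_sites[(v.protein, v.position)]
    print(f"{v.protein} {v.ref}{v.position}{v.alt} ({site})")
```

Keying both tables on the (protein, position) pair keeps the site lookup and the score lookup consistent, so a real annotation source or scoring method could replace either table without changing the filtering logic.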
