Interpretation of genomic sequencing for inherited ophthalmic disease: an integrated approach
Lead Research Organisation:
University of Manchester
Department Name: School of Biological Sciences
Abstract
Congenital Cataract (CC) is an example of a highly genetically heterogeneous disorder. The most common underlying molecular causes are either misfolding of lens proteins or alterations in solubility. These factors are amenable to study through the analysis of protein structure and function.
Research Plans: To use patient-derived Next Generation DNA Sequence (NGS) data to generate gene-specific computational models that predict likely consequences of non-synonymous missense and in-frame insertion/deletion variation in selected gene families. These models will include information on sequence conservation and protein structure to score likely effects of variants.
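As an illustration of how sequence conservation could enter such a score, the following is a minimal sketch, not the project's actual method. It assumes a pre-computed multiple sequence alignment supplied as equal-length strings and uses normalised Shannon entropy as the conservation measure. The function names `column_conservation` and `conservation_at` are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def column_conservation(column: str) -> float:
    """Return 1 - normalised Shannon entropy for one alignment column.

    1.0 means every non-gap residue is identical; lower values mean
    more variability. Gap characters ('-') are ignored. An all-gap
    column returns 0.0 (no evidence of conservation).
    """
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Normalise by the maximum entropy of a 20-letter amino acid alphabet.
    return 1.0 - entropy / math.log2(20)


def conservation_at(alignment: Sequence[str], position: int) -> float:
    """Conservation of the alignment column at a 0-based position."""
    column = "".join(seq[position] for seq in alignment)
    return column_conservation(column)


if __name__ == "__main__":
    # Toy alignment (assumed data, for illustration only).
    toy_alignment = ["MKV", "MRV", "MKV"]
    for pos in range(3):
        print(pos, round(conservation_at(toy_alignment, pos), 3))
```

A variant at a highly conserved position would contribute more strongly to a pathogenicity score under this assumption; a real model would combine this with structural features rather than use it alone.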
Aims: Initially, to develop models for two classes of proteins: crystallins (which account for > 45% of CC-causing mutations) and transcription factors. Crystallins maintain lens transparency; pathogenic mutations generally result in structural instability, either causing unfolding/aggregation or preventing assembly. Models will describe the effects on protein structure through analysis of molecular goodness-of-fit. Transcription factors function through interactions with DNA and other proteins; our models will also include specific description of these interaction interfaces. For the 10 crystallin and 12 transcription factor genes that are associated with CC, we will develop and train models using variants that are known to be either pathogenic or non-pathogenic, comparing i) published data on disease-associated variants; ii) data on non-disease-associated variation (e.g. 1000 Genomes, dbSNP, ESP); iii) unpublished data on disease-associated and non-disease-associated variation from our own databases and those of DDD and 100,000 Genomes. We will continually expand our databases of well-characterised patients to iteratively improve these models.
The tight coupling between diagnostics and computational modelling will allow gene-specific in silico models to be informed by the results of on-going sequencing. We will extend the methods from these initial gene families to all 115 CC-associated genes. Structures are known for 47 proteins and homologues for a further 42. For the 26 proteins with no available structural information, we will develop methods analogous to environment-specific substitution tables. In silico functional predictions will require supporting in vitro data; given their number and breadth, crystallin variants will be used initially. Wild-type and mutated proteins will be overexpressed and purified to assess altered behaviour. For example, the formation, localization and content of aggregates will be assessed by overexpression of wild-type and mutant crystallins in primary/established lens epithelial cell lines (e.g. FHL124).
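The idea behind an environment-specific substitution table can be sketched as follows. This is a simplified illustration under stated assumptions, not the method the project developed: each observation is a hypothetical `(environment, wild_type, substituted)` tuple, the environment labels (`"buried"`, `"exposed"`) are invented for the example, and the score is a log-odds ratio of the substitution frequency within an environment against the overall background frequency, with a pseudocount to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def substitution_log_odds(observations, pseudocount=1.0):
    """Build a toy environment-specific substitution table.

    observations: iterable of (environment, wild_type, substituted) tuples.
    Returns a dict mapping (environment, wild_type, substituted) to
    log2( P(substituted | wild_type, environment) / P(substituted) ).
    Positive scores mean the substitution is seen more often in that
    environment than expected from the background.
    """
    observations = list(observations)
    # Alphabet is limited to residues that appear in the data (assumption).
    alphabet = sorted({aa for _, wt, sub in observations for aa in (wt, sub)})

    # Background frequency of each substituted residue across all data.
    background = Counter(sub for _, _, sub in observations)
    bg_total = sum(background.values()) + pseudocount * len(alphabet)

    # Counts of substitutions per (environment, wild-type residue).
    counts = defaultdict(Counter)
    for env, wt, sub in observations:
        counts[(env, wt)][sub] += 1

    table = {}
    for (env, wt), subs in counts.items():
        row_total = sum(subs.values()) + pseudocount * len(alphabet)
        for sub in alphabet:
            p_cond = (subs[sub] + pseudocount) / row_total
            p_bg = (background[sub] + pseudocount) / bg_total
            table[(env, wt, sub)] = math.log2(p_cond / p_bg)
    return table


# Toy data (assumed, for illustration only): leucine substitutions
# observed in buried versus exposed positions.
toy_observations = [
    ("buried", "L", "I"), ("buried", "L", "I"), ("buried", "L", "D"),
    ("exposed", "L", "D"), ("exposed", "L", "D"), ("exposed", "L", "I"),
]
toy_table = substitution_log_odds(toy_observations)
```

In this toy data, a leucine-to-isoleucine change scores higher than leucine-to-aspartate in the buried environment, and the reverse holds in the exposed environment. For proteins with no structure, the environment label would itself have to be predicted or replaced by another proxy, which is where the analogy in the text applies.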
Deliverables will include a set of pre-validated molecular mechanisms for pathogenic variants in selected protein families and prototypic computational methodologies for predicting the likely outcome of variants.
People | ORCID iD |
---|---|
Graeme Black (Primary Supervisor) | |
Shalaw Sallah (Student) | |
Publications
Sallah SR (2022) Assessing the Pathogenicity of In-Frame CACNA1F Indel Variants Using Structural Modeling. in The Journal of molecular diagnostics : JMD
Sallah SR (2022) Improving the clinical interpretation of missense variants in X linked genes using structural analysis. in Journal of medical genetics
Sallah S (2020) Using an integrative machine learning approach utilising homology modelling to clinically interpret genetic variants: CACNA1F as an exemplar. in European Journal of Human Genetics
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
MR/N018478/1 | | | 30/09/2016 | 31/01/2021 | |
1790437 | Studentship | MR/N018478/1 | 30/09/2016 | 31/01/2021 | Shalaw Sallah |
Title | Protein-Specific Variant Interpreter (ProSper) |
Description | ProSper is underpinned by machine learning to predict/classify genetic variants from over two dozen genes. |
Type Of Material | Computer model/algorithm |
Year Produced | 2020 |
Provided To Others? | No |
Impact | The computational tool can be developed into a standalone diagnostic tool used by researchers and clinicians to interpret and predict the pathogenicity of genetic variants. ProSper has also been used to evaluate other computational tools that are currently used for the same purpose but take different approaches. |
Description | Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | 20-25 postgraduate students/researchers attended a workshop on protein structure analysis. |
Year(s) Of Engagement Activity | 2019 |