Interpretation of genomic sequencing for inherited ophthalmic disease: an integrated approach

Lead Research Organisation: University of Manchester
Department Name: School of Biological Sciences


Congenital Cataract (CC) is a highly genetically heterogeneous disorder. Its most common underlying molecular causes are misfolding of lens proteins or alterations in their solubility, both of which are amenable to study through analysis of protein structure and function.

Research Plans: To use patient-derived Next Generation Sequencing (NGS) data to generate gene-specific computational models that predict the likely consequences of non-synonymous missense variants and in-frame insertions/deletions in selected gene families. These models will combine information on sequence conservation and protein structure to score the likely effects of variants.
Aims: Initially, to develop models for two classes of proteins: crystallins (which account for more than 45% of CC-causing mutations) and transcription factors. Crystallins maintain lens transparency; pathogenic mutations generally cause structural instability, leading either to unfolding/aggregation or to failed assembly. Models will describe the effects on protein structure through analysis of molecular goodness-of-fit. Transcription factors function through interactions with DNA and other proteins, so our models will also specifically describe these interaction interfaces. For the 10 crystallin and 12 transcription factor genes associated with CC, we will develop and train models using variants known to be either pathogenic or non-pathogenic, comparing: i) published data on disease-associated variants; ii) data on non-disease-associated variation (e.g. 1000 Genomes, dbSNP, ESP); and iii) unpublished data on disease-associated and non-disease-associated variation from our own databases and those of DDD and the 100,000 Genomes Project. We will continually expand our databases of well-characterised patients to iteratively improve these models.
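A minimal sketch of what a gene-specific, feature-based variant classifier of this kind might look like, assuming just two illustrative per-residue features (a conservation score and "buriedness", i.e. 1 minus relative solvent accessibility) and plain logistic regression trained by gradient descent. The actual features and model type used in the project are not specified here; the class and function names, the gene label `GENE_A` and the training values are hypothetical toy inputs, not real variants.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Variant:
    """A missense variant described by two illustrative (hypothetical) features."""
    gene: str
    conservation: float               # 0 = highly variable .. 1 = invariant across homologues
    buriedness: float                 # 1 - relative solvent accessibility (0 exposed, 1 buried)
    pathogenic: Optional[int] = None  # 1 = pathogenic, 0 = benign, None = unclassified


Model = Tuple[List[float], float]  # (feature weights, bias)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _features(v: Variant) -> List[float]:
    return [v.conservation, v.buriedness]


def train_logistic(variants: List[Variant], epochs: int = 2000, lr: float = 0.5) -> Model:
    """Fit a logistic regression by batch gradient descent on labelled variants."""
    w, b, n = [0.0, 0.0], 0.0, len(variants)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for v in variants:
            x = _features(v)
            # Prediction error: predicted probability minus the 0/1 label.
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - v.pathogenic
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_per_gene(variants: List[Variant]) -> Dict[str, Model]:
    """Train one separate model per gene, using only that gene's labelled variants."""
    by_gene: Dict[str, List[Variant]] = {}
    for v in variants:
        if v.pathogenic is not None:
            by_gene.setdefault(v.gene, []).append(v)
    return {gene: train_logistic(vs) for gene, vs in by_gene.items()}


def score(models: Dict[str, Model], v: Variant) -> float:
    """Probability-like pathogenicity score from the model for the variant's own gene."""
    w, b = models[v.gene]
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, _features(v))) + b)


# Invented toy data for a hypothetical gene "GENE_A" (not real variants).
toy = [
    Variant("GENE_A", 0.95, 0.90, 1), Variant("GENE_A", 0.90, 0.80, 1),
    Variant("GENE_A", 0.85, 0.95, 1), Variant("GENE_A", 0.20, 0.10, 0),
    Variant("GENE_A", 0.30, 0.20, 0), Variant("GENE_A", 0.10, 0.30, 0),
]
models = train_per_gene(toy)
print(round(score(models, Variant("GENE_A", 0.90, 0.85)), 3))
```

Keeping one model per gene, rather than a single pooled model, lets each gene's decision boundary reflect its own structural constraints. In practice the labelled variants from sources i) to iii) would replace the toy list, and further structural features would be added to `_features`.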
The tight coupling between diagnostic sequencing and computational modelling will allow gene-specific in silico models to be informed by the results of ongoing sequencing. We will extend the methods from these initial gene families to all 115 CC-associated genes. Structures are known for 47 of the encoded proteins, and structures of homologues for a further 42. For the 26 proteins with no available structural information, we will develop methods analogous to environment-specific substitution tables. In silico functional predictions will require supporting in vitro data; given their number and breadth, crystallin variants will be studied first. Wild-type and mutant proteins will be overexpressed and purified to assess altered behaviour. For example, the formation, localisation and content of aggregates will be assessed by overexpressing wild-type and mutant crystallins in primary or established lens epithelial cell lines (e.g. FHL124).
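Environment-specific substitution tables (ESSTs) record how often each native amino acid is replaced by each other residue across homologous proteins, separately for each structural environment (for example, secondary structure combined with solvent accessibility). The sketch below builds such a table as log2-odds scores with a flat pseudocount. The environment labels, the toy aligned triples and the function names are hypothetical. Classic ESSTs take environments from solved structures; how an environment would be assigned for the 26 proteins lacking structural information (e.g. from sequence-based predictions) is an assumption here, not something stated in the plan.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Table = Dict[Tuple[str, str, str], float]  # (environment, native, substituted) -> log2-odds


def build_esst(aligned_pairs: Iterable[Tuple[str, str, str]], pseudocount: float = 1.0) -> Table:
    """Build log2-odds substitution scores, conditioned on environment and native residue.

    aligned_pairs yields (environment, native_residue, observed_residue) triples
    taken from aligned columns of homologous sequences.
    """
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    background: Counter = Counter()
    for env, native, observed in aligned_pairs:
        counts[(env, native)][observed] += 1
        background[observed] += 1
    # Background frequency of each residue over all observations, smoothed with pseudocounts.
    bg_total = sum(background.values()) + pseudocount * len(AMINO_ACIDS)
    p_bg = {aa: (background[aa] + pseudocount) / bg_total for aa in AMINO_ACIDS}
    table: Table = {}
    for (env, native), observed in counts.items():
        total = sum(observed.values()) + pseudocount * len(AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            # P(observed = aa | environment, native), relative to the background frequency.
            p = (observed[aa] + pseudocount) / total
            table[(env, native, aa)] = math.log2(p / p_bg[aa])
    return table


def substitution_score(table: Table, env: str, ref: str, alt: str) -> float:
    """Score ref -> alt in env; 0.0 (no information) if that context was never observed."""
    return table.get((env, ref, alt), 0.0)


# Hypothetical toy alignment data: a buried helical Leu and an exposed coil Lys.
toy_pairs = [
    ("buried_helix", "L", "L"), ("buried_helix", "L", "L"), ("buried_helix", "L", "I"),
    ("buried_helix", "L", "I"), ("buried_helix", "L", "V"),
    ("exposed_coil", "K", "K"), ("exposed_coil", "K", "R"), ("exposed_coil", "K", "E"),
    ("exposed_coil", "K", "D"), ("exposed_coil", "K", "Q"),
]
esst = build_esst(toy_pairs)
print(substitution_score(esst, "buried_helix", "L", "D"))
```

A negative score for a reference-to-alternative substitution in a given environment suggests that such a change is rarely tolerated there (here, Leu to Asp at a buried helical site), and a score of this kind could serve as one input feature for the gene-specific models.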
Deliverables will include a set of pre-validated molecular mechanisms for pathogenic variants in selected protein families, and prototypic computational methodologies for predicting the likely outcome of variants.


Title Protein-Specific Variant Interpreter (ProSper) 
Description ProSper is underpinned by machine learning and predicts/classifies genetic variants in more than two dozen genes. 
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? No  
Impact The computational tool can be developed into a standalone diagnostic tool used by researchers and clinicians to interpret genetic variants and predict their pathogenicity. ProSper has also been used to evaluate other computational tools that are currently used for the same purpose but rely on different approaches. 
Description workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact 20-25 postgraduate students/researchers attended a workshop on protein structure analysis.
Year(s) Of Engagement Activity 2019