Interpretation of genomic sequencing for inherited ophthalmic disease: an integrated approach

Lead Research Organisation: University of Manchester

Department Name: School of Biological Sciences

Abstract

Congenital cataract (CC) is a highly genetically heterogeneous disorder. Its most common underlying molecular causes are misfolding of lens proteins or alterations in their solubility, both of which are amenable to study through analysis of protein structure and function.

Research Plans: To use patient-derived next-generation sequencing (NGS) data to generate gene-specific computational models that predict the likely consequences of missense and in-frame insertion/deletion variants in selected gene families. These models will incorporate sequence conservation and protein structure to score the likely effects of variants.
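The plan above states that models will combine sequence conservation and protein structure to score variants, but it does not say how. The sketch below is one illustrative way to do this and is not the project's method. It assumes conservation is measured as normalised Shannon entropy of an alignment column and structure as relative solvent accessibility (RSA). The function names and the 0.6/0.4 weighting are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column: str) -> float:
    """Return 1 minus the normalised Shannon entropy of one alignment column.

    1.0 means a fully conserved position; 0.0 means all 20 amino acids are
    equally frequent. Gaps ('-') and non-standard symbols are ignored.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(residues).values())
    return 1.0 - entropy / math.log2(len(AMINO_ACIDS))


def hypothetical_variant_score(column: str, rsa: float, w_cons: float = 0.6) -> float:
    """Hypothetical damage score in [0, 1]; higher means more likely damaging.

    rsa: relative solvent accessibility of the wild-type residue
    (0 = fully buried, 1 = fully exposed). Buried positions are treated as
    more sensitive. The weighting w_cons is an illustrative assumption.
    """
    burial = 1.0 - min(max(rsa, 0.0), 1.0)
    return w_cons * column_conservation(column) + (1.0 - w_cons) * burial


if __name__ == "__main__":
    # A glycine conserved in all eight aligned sequences, mostly buried.
    print(hypothetical_variant_score("GGGGGGGG", rsa=0.1))
```

A fixed weighted sum keeps the example readable. In a trained model the weights would instead be learned from labelled variants.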
Aims: Initially, to develop models for two classes of proteins: crystallins, which account for more than 45% of CC-causing mutations, and transcription factors. Crystallins maintain lens transparency; pathogenic mutations generally destabilise their structure, causing unfolding/aggregation or preventing assembly. Models will describe the effects on protein structure through analysis of molecular goodness-of-fit. Transcription factors act through interactions with DNA and other proteins, so our models will also explicitly describe these interaction interfaces. For the 10 crystallin genes and 12 transcription factor genes associated with CC, we will develop and train models using variants known to be either pathogenic or non-pathogenic, drawing on i) published data on disease-associated variants; ii) data on non-disease-associated variation (e.g. 1000 Genomes, dbSNP, ESP); and iii) unpublished data on disease-associated and non-disease-associated variation from our own databases and those of the DDD study and the 100,000 Genomes Project. We will continually expand our databases of well-characterised patients to improve these models iteratively.
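The aims name the training labels (pathogenic versus non-pathogenic variants) and their sources, but not the learning algorithm. As a stand-in, the sketch below uses plain logistic regression fitted by gradient descent. The two features (conservation and burial) and the six toy data points are invented for illustration and are not project data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return the model's probability that a variant with features x is pathogenic."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: List[Sequence[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression (label 1 = pathogenic) by batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this variant: p(pathogenic) minus true label.
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over all variants, then take one step.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Invented toy features (conservation, burial); NOT project data.
TOY_X = [(0.95, 0.90), (0.90, 0.80), (0.85, 0.95),   # labelled pathogenic
         (0.20, 0.10), (0.30, 0.20), (0.10, 0.30)]   # labelled non-pathogenic
TOY_Y = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(TOY_X, TOY_Y)
    print(predict_proba(w, b, (0.9, 0.9)), predict_proba(w, b, (0.15, 0.15)))
```

Real use would hold out variants for evaluation. Iteratively adding newly characterised patients, as the aims describe, would amount to retraining on the enlarged labelled set.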
The tight coupling between diagnostics and computational modelling will allow the gene-specific in silico models to be informed by the results of ongoing sequencing. We will then extend the methods developed for these initial gene families to all 115 CC-associated genes: structures are known for 47 of the encoded proteins, and structures of homologues for a further 42. For the remaining 26 proteins, which have no available structural information, we will develop methods analogous to environment-specific substitution tables. In silico functional predictions require supporting in vitro data; given their number and breadth, crystallin variants will be used for this initially. Wild-type and mutant proteins will be overexpressed and purified to assess altered behaviour. For example, the formation, localisation and content of aggregates will be assessed by overexpressing wild-type and mutant crystallins in primary or established lens epithelial cell lines (e.g. FHL124).
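Environment-specific substitution tables (ESSTs) score a substitution from wild-type residue a to residue b as a log-odds ratio conditioned on the structural environment of a. The abstract does not explain how an analogue would work for proteins without structures. The sketch below therefore shows only the standard ESST idea, S_env(a, b) = log2(P_env(b | a) / P(b)), with add-one pseudocounts. The environment labels ("buried", "exposed") and the toy substitution counts are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_esst(pairs: Iterable[Tuple[str, str, str]],
               pseudocount: float = 1.0) -> Dict[str, Dict[Tuple[str, str], float]]:
    """Build log-odds tables S_env(a, b) = log2(P_env(b | a) / P(b)).

    pairs: (environment of wild-type residue a, residue a, observed residue b)
    triples, e.g. from structurally aligned homologues.
    """
    counts = defaultdict(lambda: defaultdict(float))  # env -> (a, b) -> count
    background = defaultdict(float)                   # b -> count, all envs
    for env, a, b in pairs:
        counts[env][(a, b)] += 1.0
        background[b] += 1.0
    # Background frequency P(b), smoothed with a pseudocount per amino acid.
    total_bg = sum(background.values()) + pseudocount * len(AMINO_ACIDS)
    p_bg = {b: (background[b] + pseudocount) / total_bg for b in AMINO_ACIDS}

    tables = {}
    for env, table in counts.items():
        scores = {}
        for a in AMINO_ACIDS:
            # Smoothed conditional P_env(b | a) over the row for residue a.
            row_total = sum(table[(a, b)] for b in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
            for b in AMINO_ACIDS:
                p_cond = (table[(a, b)] + pseudocount) / row_total
                scores[(a, b)] = math.log2(p_cond / p_bg[b])
        tables[env] = scores
    return tables


# Hypothetical toy observations, for illustration only.
TOY_PAIRS = ([("buried", "L", "L")] * 5 + [("buried", "L", "I")] * 3
             + [("exposed", "K", "E")] * 3 + [("exposed", "K", "K")] * 4)

if __name__ == "__main__":
    esst = build_esst(TOY_PAIRS)
    print(esst["buried"][("L", "I")], esst["exposed"][("L", "I")])
```

With these toy counts, L to I scores positively in the buried environment and negatively in the exposed one. This environment dependence is what distinguishes an ESST from a single global substitution matrix.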
Deliverables will include a set of pre-validated molecular mechanisms for pathogenic variants in selected protein families, and prototype computational methods for predicting the likely outcome of variants.

Student:

Shalaw Sallah

Period of Study:

Sep 2016 - Jan 2021

Funder:

MRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1790437

Health Category:

Unclassified

People
Graeme Black (Primary Supervisor)
Shalaw Sallah (Student)

Publications

Sallah SR (2022) Assessing the Pathogenicity of In-Frame CACNA1F Indel Variants Using Structural Modeling. The Journal of Molecular Diagnostics (JMD).

Sallah SR (2022) Improving the clinical interpretation of missense variants in X linked genes using structural analysis. Journal of Medical Genetics.

Sallah S (2020) Using an integrative machine learning approach utilising homology modelling to clinically interpret genetic variants: CACNA1F as an exemplar. European Journal of Human Genetics.

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
MR/N018478/1			30/09/2016	31/01/2021
1790437	Studentship	MR/N018478/1	30/09/2016	31/01/2021	Shalaw Sallah

Research Databases and Models

Title	Protein-Specific Variant Interpreter (ProSper)
Description	ProSper is underpinned by machine learning and predicts/classifies genetic variants in more than two dozen genes.
Type Of Material	Computer model/algorithm
Year Produced	2020
Provided To Others?	No
Impact	The computational tool can be developed into a standalone diagnostic tool that researchers and clinicians use to interpret genetic variants and predict their pathogenicity. ProSper has also been used to evaluate other computational tools that are currently used for the same purpose but take different approaches.

Engagement Activities

Description	Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	20-25 postgraduate students and researchers attended a workshop on protein structure analysis.
Year(s) Of Engagement Activity	2019