Identification of allosteric and orthosteric ligand regulation sites using protein structure prediction.

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

The main focus of my work will be developing novel computational methods for protein structure prediction in Dr Sternberg's lab, with a view to applying those tools to support the experimental work of Dr Mann's lab. Dr Mann is researching cysteine-reactive small molecules that can bind to proteins and thus act as probes of biological function. These proteins may not have experimentally solved structures, which creates the need for modelling.

There are currently multiple protein structure prediction algorithms with publicly available implementations. These methods are evaluated in a blind test called the Critical Assessment of protein Structure Prediction (CASP). Traditionally, algorithms based on sequence homology, which aim to identify evolutionary sequence similarity between the target and a template of known structure, have performed best. A popular server that uses this approach, Phyre, has been built and is maintained in Dr Sternberg's lab. Since the proteins studied experimentally are not guaranteed to be related to well-known structures, I propose to build a supplement to Phyre that allows modelling without the homology assumption.

Methods

I plan to begin my research into protein structure prediction algorithms by investigating the following two ideas.

1. Contact threading. This is a relatively new method of protein structure prediction that has shown success in recent rounds of CASP. The main idea of the method is to predict the contact map of a protein (which pairs of residues are in spatial contact with each other) and use it as a proxy for its structure. The eigenvectors of the template's and the target's contact matrices are then aligned using dynamic programming. The target's contact matrix is predicted from a multiple sequence alignment (MSA) of the protein with its homologues, which limits the method for proteins that do not have enough homologues. I plan to research a variant that does not require an MSA as its input.

2. Gaussian Mixture Models. These provide a probability-based score of how likely a protein fragment of a given shape is to occur in nature. Such scores can be incorporated into the dynamic programming alignment procedure used in contact threading.
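Item 1 can be illustrated with a minimal pure-Python sketch. It assumes both contact maps are computed from known C-alpha coordinates (for a real target the map would instead be predicted), treats two residues as in contact when they are closer than a hypothetical 8 Å cutoff and at least three positions apart in sequence, keeps only the leading eigenvector of each map (a simplification), and aligns the two eigenvector profiles with Needleman-Wunsch global alignment using a product match score and a linear gap penalty. The function names `contact_map`, `leading_eigenvector`, `align` and `helix`, and all parameter values, are illustrative assumptions rather than a specific published method.

```python
import math


def contact_map(coords, cutoff=8.0, min_sep=3):
    """Symmetric 0/1 contact matrix from 3-D coordinates (one point per residue).

    Residues i and j count as in contact if they are closer than `cutoff`
    (angstroms) and at least `min_sep` positions apart in sequence.
    """
    n = len(coords)
    cmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1.0
    return cmap


def leading_eigenvector(matrix, iters=500):
    """Approximate unit eigenvector for the largest eigenvalue of a symmetric,
    non-negative matrix, by power iteration on (matrix + I).

    The identity shift stops the iteration oscillating when eigenvalues of
    equal magnitude and opposite sign exist. Starting from a positive vector
    keeps all components non-negative, so the sign is consistent across proteins.
    """
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(matrix[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def align(u, v, gap=0.5):
    """Global (Needleman-Wunsch) alignment of two eigenvector profiles.

    Matching residue i of u with residue j of v scores u[i] * v[j]; each gap
    costs `gap`. Returns (total score, list of aligned index pairs).
    """
    n, m = len(u), len(v)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + u[i - 1] * v[j - 1], "diag"),
                (score[i - 1][j] - gap, "up"),
                (score[i][j - 1] - gap, "left"),
            ]
            # max() keeps the first option on ties, so matches are preferred.
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]


def helix(n):
    """Toy C-alpha trace of an ideal alpha helix (radius 2.3 A, rise 1.5 A,
    100 degrees per residue), used here only as test input."""
    return [
        (2.3 * math.cos(math.radians(100 * i)),
         2.3 * math.sin(math.radians(100 * i)),
         1.5 * i)
        for i in range(n)
    ]


target = leading_eigenvector(contact_map(helix(12)))
template = leading_eigenvector(contact_map(helix(15)))
total, pairs = align(target, template)
```

The product score rewards aligning residues that play similar roles in the dominant contact pattern. Because each eigenvector has unit norm, aligning a profile with itself scores exactly 1 with no gaps, which gives a simple sanity check.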
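Item 2 can be sketched in the same hedged way. Here a fragment's shape is reduced, for illustration only, to the backbone dihedral angles (phi, psi) of a single residue, and it is scored by its log-likelihood under a small Gaussian mixture with diagonal covariances. The names `log_gaussian_diag` and `gmm_log_likelihood` and all parameter values are hypothetical; in practice the parameters would be fitted to fragments from known structures, for example by expectation-maximisation as in scikit-learn's `sklearn.mixture.GaussianMixture`.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density at x of a Gaussian with diagonal covariance `var`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * s2) + (xi - mu) ** 2 / s2)
        for xi, mu, s2 in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under a Gaussian mixture, using log-sum-exp to avoid underflow."""
    terms = [
        math.log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical, hand-picked (not fitted) parameters for a two-component model of
# backbone dihedrals (phi, psi) in degrees: one component near the alpha-helical
# region and one near the beta-strand region. A plain Gaussian ignores that
# angles wrap around at +/-180 degrees.
WEIGHTS = [0.6, 0.4]
MEANS = [(-63.0, -43.0), (-120.0, 130.0)]
VARIANCES = [(15.0**2, 15.0**2), (20.0**2, 20.0**2)]

helical = gmm_log_likelihood((-60.0, -45.0), WEIGHTS, MEANS, VARIANCES)
unusual = gmm_log_likelihood((60.0, -150.0), WEIGHTS, MEANS, VARIANCES)
```

A higher log-likelihood marks a more commonly observed shape. One way such a term could enter the dynamic programming step is as an extra, weighted contribution to the match score; the weight would need tuning and is an assumption here. If angular periodicity matters, a mixture of von Mises distributions would be a more natural model than plain Gaussians.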

Results

The new method is unlikely to be as accurate as traditional template-based modelling for targets that have close, well-known homologues. It will be judged a success if it performs better than existing methods on proteins that have few or no well-known homologues. Performance will be evaluated on a representative set of PDB structures excluded from the training data, as well as on the CASP data set.

The new method will be made available for public use as a web service, and will be used to model proteins for the experimental research outlined above. Existing methods for predicting allosteric and orthosteric binding sites will then be used to identify ligand-binding regions.

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
BB/M011178/1      |              |              | 01/10/2015 | 25/02/2025 |
2133373           | Studentship  | BB/M011178/1 | 29/09/2018 | 23/12/2021 |