Analysis and prediction of protein structure and sequence

Lead Research Organisation: MRC National Inst for Medical Research


We aim to understand how the sequence of a protein determines its 3D structure. The function of a protein depends on its structure, and understanding this relationship will help us to predict the function of genes from their DNA sequence alone, and also to understand how they malfunction in disease.

Technical Summary

The central aim of the laboratory is to develop new algorithms and software, principally to facilitate the structural, functional and evolutionary interpretation of sequence data. This can be seen as creating a synergy between the sequence and structure databanks: extending the structural information in the latter to illuminate as many sequences as possible, while using the (aligned) sequence data to help understand the evolutionary pressures that maintain structure. Through this ongoing work, a better general understanding of protein structure can be gained, as well as insight into specific sequences or families that were previously unconnected with any known structure or function.

Methodology: The main approach is empirical, meaning that the structural context of sequences in known structures is analysed to find recurring patterns or motifs that can then be used predictively. The analytical side of this approach requires computational tools that can compare (align) sequences and structures. Similarly, the synthetic side requires tools that can take sequence/structure relationships and construct a plausible molecular model. In pursuit of the central aim, these tools must be continually improved to handle increasingly remotely related proteins.

Scientific Advances

Multiple Sequence Alignment

Perhaps the most essential tool in this field is a method to compare two sequences. While the basic algorithms for comparing sequences have been known for many years, work in the laboratory has contributed to the development of multiple sequence alignment methods that allow very large numbers of sequences to be analysed simultaneously.

Structure Comparison

The comparison of protein structures is essential for a full understanding of their nature and relationships. A novel structure comparison method was developed, based on the basic sequence comparison algorithm.
Along with other developments, this method allowed the comparison of all known protein folds.

Sequence/Structure Threading

The development of a method to compare three-dimensional data, combined with potentials of mean force, resulted in a sequence threading algorithm able to correctly match sequences to the structures of distantly related proteins, beyond what could have been achieved using sequence data alone.

Model Construction

A molecular modelling approach based on a novel distance-geometry algorithm has been developed in the laboratory. This method is very robust to inaccurate distance estimates and can incorporate specific constraints from homologous protein folds.

Molecular Evolution

Virtually all methods for the evolutionary analysis of two or more DNA sequences rely on one mathematical model or another for the evolution of those sequences by repeated nucleotide substitutions at each of their sites. New statistical analyses were developed to provide better tests of these models, and these tests have in turn motivated the development of improved mathematical models.
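
The Multiple Sequence Alignment section refers to the "basic algorithms for comparing sequences". The classic example is dynamic-programming global alignment (Needleman-Wunsch), which many multiple alignment methods build on by combining pairwise comparisons. The sketch below is a minimal pairwise version only, not the laboratory's multiple alignment method; the function name `global_align` and the match, mismatch and gap scores are illustrative assumptions rather than a real substitution matrix.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scores are illustrative assumptions. Returns (score, aligned_a, aligned_b).
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table costs O(nm) time and memory, which is why aligning very large numbers of sequences at once calls for more economical strategies than naively extending this recurrence to many dimensions.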

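The Structure Comparison section describes a method built on the basic sequence comparison algorithm. As a hedged illustration of that idea only (it is not the laboratory's method), the sketch below keeps the same dynamic-programming recurrence but replaces the residue-identity score with a structural one: residue i of structure A and residue j of structure B score highly when their distances to nearby sequence neighbours agree. The C-alpha-only input, the `window` size, the 1/(1 + |Δd|) scoring function, the gap penalty and the names `pair_score` and `align_structures` are all assumptions made for this example.

```python
import math

Coord = tuple[float, float, float]


def pair_score(a: list[Coord], b: list[Coord], i: int, j: int, window: int = 3) -> float:
    """Compare distances from residue i (in A) and residue j (in B) to their
    sequence neighbours at offsets -window..+window. Each term is 1.0 when the
    two distances are identical and decays towards 0 as they diverge."""
    s = 0.0
    for k in range(-window, window + 1):
        if k != 0 and 0 <= i + k < len(a) and 0 <= j + k < len(b):
            diff = abs(math.dist(a[i], a[i + k]) - math.dist(b[j], b[j + k]))
            s += 1.0 / (1.0 + diff)
    return s


def align_structures(a: list[Coord], b: list[Coord], gap: float = -1.0, window: int = 3):
    """Global dynamic-programming alignment driven by the structural pair score.
    Returns the equivalenced residue pairs (i, j) in chain order."""
    n, m = len(a), len(b)
    s = [[pair_score(a, b, i, j, window) for j in range(m)] for i in range(n)]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + s[i - 1][j - 1],
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back, collecting matched (equivalenced) residue pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + s[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Hypothetical irregular 10-residue trace, and a translated copy of residues 3..7.
chain = [(3.8 * t, 2.0 * math.sin(t * t), 2.0 * math.cos(3 * t)) for t in range(10)]
fragment = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in chain[3:8]]
print(align_structures(chain, fragment))
```

Because the score uses only internal distances, it does not depend on how the two structures are positioned or oriented, so the translated fragment should be matched back to residues 3 to 7 of `chain` without any prior superposition.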

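The Sequence/Structure Threading section mentions potentials of mean force. In the usual knowledge-based form, an energy for a residue pair at a given separation is obtained by inverting the Boltzmann relation, E = -kT ln(f_obs / f_ref), where f_obs is the observed frequency for that pair type and f_ref a reference frequency over all pairs. The sketch below computes such a potential from hypothetical counts and sums it over a template's residue pairs to score one sequence threaded onto that template. The distance bins, pseudocount, energies in units of kT, the `(i, j, distance)` contact format and the names `pmf` and `threading_energy` are all assumptions; this is not the laboratory's threading algorithm.

```python
import math


def pmf(observed: list[int], reference: list[int], kT: float = 1.0, pseudo: float = 1.0) -> list[float]:
    """Per-bin energies E = -kT * ln(f_obs / f_ref); pseudocounts avoid log(0)."""
    n_obs = sum(observed) + pseudo * len(observed)
    n_ref = sum(reference) + pseudo * len(reference)
    energies = []
    for o, r in zip(observed, reference):
        f_obs = (o + pseudo) / n_obs
        f_ref = (r + pseudo) / n_ref
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies


def threading_energy(sequence: str, contacts, potentials, bin_width: float = 2.0) -> float:
    """Sum pair energies over template residue pairs (i, j, distance) for one sequence.
    Pairs with no tabulated potential, or beyond the last bin, contribute nothing."""
    total = 0.0
    for i, j, d in contacts:
        key = tuple(sorted((sequence[i], sequence[j])))
        table = potentials.get(key)
        b = int(d // bin_width)
        if table is not None and b < len(table):
            total += table[b]
    return total


# Hypothetical counts for Leu-Val pairs versus all pairs, in three 2 Å distance bins.
potentials = {("L", "V"): pmf([30, 10, 5], [20, 20, 20])}
template_contacts = [(0, 1, 1.5), (0, 2, 4.5)]
print(threading_energy("LVA", template_contacts, potentials))
```

Lower totals indicate sequence/structure pairings that resemble those seen in known structures; in practice the residue-to-template correspondence also has to be optimised, which is where an alignment method for three-dimensional data comes in.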

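The Model Construction section builds models with a distance-geometry algorithm. A standard first step in distance geometry, shown below as a sketch, is metric-matrix (classical) embedding: squared distances are double-centred into a Gram matrix, and the eigenvectors with the three largest eigenvalues give Cartesian coordinates. This is the textbook technique, not the laboratory's novel algorithm, and it makes no claim to that method's robustness to inaccurate distances; the function name `embed` is hypothetical.

```python
import numpy as np


def embed(distances: np.ndarray, dim: int = 3) -> np.ndarray:
    """Metric-matrix (classical) embedding of an n x n distance matrix into `dim` dimensions."""
    d2 = np.asarray(distances, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centre the squared distances: G = -1/2 * C D^2 C, with C = I - (1/n) 11^T.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    vals, vecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]         # indices of the `dim` largest
    # Inconsistent (noisy) distances can give negative eigenvalues; clip them to zero.
    lam = np.clip(vals[top], 0.0, None)
    return vecs[:, top] * np.sqrt(lam)


# Hypothetical test: recover coordinates from the exact distances of known points.
points = np.array([[0, 0, 0], [3.8, 0, 0], [5, 3, 0], [2, 5, 1.5], [0, 2, 4], [4, 4, 4]], dtype=float)
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = embed(dist)
print(np.allclose(dist, np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)))
```

Distances fix a structure only up to rotation, translation and reflection, so an embedded model may be the mirror image of the intended fold; chirality has to be checked separately, for example against the handedness of helices.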

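The Molecular Evolution section notes that evolutionary analyses of DNA rely on a model of repeated nucleotide substitution. The simplest such model is Jukes-Cantor (JC69), which assumes equal base frequencies and equal rates for all substitutions; it corrects the observed proportion of differing sites p for multiple hits at the same site, giving d = -3/4 ln(1 - 4p/3) substitutions per site. The sketch below is only an example of such a model, not the laboratory's statistical tests; the function names and the choice to skip gapped sites are assumptions.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared sites that differ; sites with a gap ('-') are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(seq1.upper(), seq2.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """JC69 distance: expected substitutions per site, correcting p for multiple hits."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"), 4))  # 0.1073 (p = 0.1)
```

The corrected distance always exceeds p, and the gap widens as sequences diverge; checking whether such a simple model actually fits real data is the kind of question that statistical tests of substitution models address.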

Title: computer programs
Description: Programs to design and predict protein structure
Type Of Material: Data analysis technique
Year Produced: 2006
Provided To Others? Yes
Impact: publications; web facilities

Description: Enzyme design
Organisation: University of Leicester
Department: Department of Cancer Studies
Country: United Kingdom
Sector: Academic/University
PI Contribution: protein design
Collaborator Contribution: protein synthesis and characterisation
Impact: computer programs

Description: bionanotechnology/synthetic biology
Organisation: University of Bristol
Department: School of Chemistry
Country: United Kingdom
Sector: Academic/University
PI Contribution: modelling
Collaborator Contribution: conferences/workshops
Impact: computer programs

Description: protein analysis
Organisation: University of Bergen
Country: Norway
Sector: Academic/University
PI Contribution: publications
Collaborator Contribution: joint PhD students; use of a large supercomputer (Tromsø)
Impact: publications

Description: protein design
Organisation: Tokyo Technical University
Country: Japan
Sector: Academic/University
PI Contribution: running our programs on a large computer
Collaborator Contribution: use of a large supercomputer
Impact: publication
Start Year: 2006