A novel and rapid approach to predict protein structure

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

IMPORTANCE OF KNOWLEDGE ABOUT PROTEIN STRUCTURE
Proteins are molecular machines which carry out most of the basic functions of an organism. They are made of chains of smaller molecules called amino acids. There are twenty types of amino acid, and the precise sequence of amino acids determines the shape and function of the protein. A protein is a large molecule, and in water it folds into a globular structure in which the amino acids interact with each other in specific ways. It is important for us to know the shape of a protein, as this provides insight into its function and can help in the design of experiments. Knowledge of the structure of a protein can be the starting point for the systematic design of novel regulators of activity such as drugs and agricultural agents.

PROTEIN STRUCTURE PREDICTION
It is slow, expensive and difficult to find out the structure of a protein directly. However, we now have the DNA sequences for many important organisms, including humans, and we can generally get protein sequences from DNA sequences. We know that the structure of a protein depends entirely on the sequence of its amino acids. Thus we can try to predict the structure of a protein from its sequence. Many successful prediction methods use similarities between the sequence of a protein of unknown structure and the sequence of a protein of known structure, an approach known as template-based modelling. But what if no such similarity can be found? There are two main template-free methods that are yielding useful predictions today. One, fragment folding, tries to make a structure out of little fragments of other structures. This has been the most successful of the template-free methods in the last few years and has a success rate of about 50%. It requires high-performance computing (up to years of CPU time per prediction). Another method, molecular dynamics, simulates the interactions between the atoms in the protein. Although this approach has provided useful predictions for the very smallest of proteins, it requires a computation time of many years on a single processor.

OUR APPROACH
We have developed a new method, called Poing, which aims to solve some of the problems with these other methods. We base our approach on a highly simplified model, introduced in the mid-1970s, which represents the protein as a set of balls connected by springs. Each amino acid is represented by just two balls, less than a tenth of the number used in molecular dynamics. This makes Poing very fast. The springs between the balls are modelled using heuristics to represent specific effects which are known to be important in how a protein folds. Our preliminary results show that our approach can yield useful predictions with a run time of 20 hours on a single CPU.

THIS PROPOSAL
We propose to develop the new model to make it more accurate at predicting structures. We will also take part in a regular protein structure prediction experiment, in which different prediction methods are tested on new proteins and then compared with each other. We will also make our software available to the community via a public web server and by allowing others to freely obtain copies of it to change and run on their own computers. All this work will take three years.
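The ball-and-spring idea described above can be illustrated with a minimal sketch. This is not the Poing implementation: the function names, the spring tuple format, the rest lengths and the stiffness values are all assumptions made for illustration, and the sketch contains only chain-connectivity springs rather than the folding heuristics mentioned in the abstract. Each residue is given two balls (backbone and sidechain), and the balls are moved a small step along the net spring force until the springs reach their rest lengths.

```python
import math
import random


def spring_forces(positions, springs):
    """Return the net Hookean force on each ball.

    positions: list of [x, y, z] coordinates, one per ball.
    springs: list of (i, j, rest_length, stiffness) tuples (hypothetical format).
    """
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j, rest, k in springs:
        delta = [positions[j][a] - positions[i][a] for a in range(3)]
        dist = math.sqrt(sum(d * d for d in delta))
        if dist == 0.0:
            continue  # coincident balls: direction undefined, skip this spring
        # A stretched spring (dist > rest) pulls the two balls together;
        # a compressed spring pushes them apart.
        magnitude = k * (dist - rest)
        for a in range(3):
            f = magnitude * delta[a] / dist
            forces[i][a] += f
            forces[j][a] -= f
    return forces


def relax(positions, springs, step=0.1, iterations=5000):
    """Overdamped relaxation: move each ball a small step along its net force."""
    pos = [list(p) for p in positions]
    for _ in range(iterations):
        forces = spring_forces(pos, springs)
        for p, f in zip(pos, forces):
            for a in range(3):
                p[a] += step * f[a]
    return pos


def build_chain(n_residues, seed=0):
    """Two balls per residue: even indices are backbone, odd are sidechain."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-5.0, 5.0) for _ in range(3)]
                 for _ in range(2 * n_residues)]
    springs = []
    for r in range(n_residues):
        backbone, sidechain = 2 * r, 2 * r + 1
        # Illustrative rest lengths and stiffness, not Poing's parameters.
        springs.append((backbone, sidechain, 2.0, 1.0))
        if r + 1 < n_residues:
            springs.append((backbone, backbone + 2, 3.8, 1.0))
    return positions, springs
```

Calling `build_chain(4)` and passing the result to `relax` moves eight randomly placed balls until every spring is close to its rest length. A real folding model would add further spring-like terms for the effects that drive folding; those are deliberately left out here.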

Technical Summary

We have developed a novel and rapid approach (Poing) to template-free protein structure prediction. The approach is a coarse-grained, simplified classical dynamics simulation that is orders of magnitude faster than existing approaches (fragment-based folding and all-atom molecular dynamics). This enables the technique to be applied to larger proteins. Poing represents a protein structure in the Levitt & Warshel backbone-plus-sidechain model, and explicitly simulates effects known to be important in protein folding as a network of spring-like forces. Our pilot study results show that state-of-the-art template-free predictions can be obtained in CPU hours. In this grant we will develop and disseminate Poing. We will take part in the international blind trial of prediction (CASP). Key steps are:
* Months 1-5: CASP8 participation.
* Months 1-24: Analysis, development and improvement of predicted structures. By minimising the number of incorrect structures produced by our model whilst maintaining the stability of the native state, we will increase the effectiveness of clustering in picking correct models from the time-series samples produced, and therefore improve the predictive accuracy of the model. Current shortcomings to be addressed include: non-native internal voids within structures; incorrect tessellation of sidechains; and non-native topological features.
* Months 18-24: Public deployment via web server and open source dissemination.
* Months 22-29: CASP9 participation.
* Months 25-36: Development of model selection and integration with complementary, fragment-based approaches.
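The role of clustering in picking a model from time-series samples can be sketched as follows. This is an illustration under stated assumptions, not the selection procedure used in Poing: the function names, the use of distance RMSD (which compares internal distances and so needs no superposition) and the neighbour-count rule are all choices made for this example. The snapshot with the most neighbours within a cutoff is taken as the representative of the largest cluster.

```python
import math


def pair_distances(coords):
    """All inter-ball distances of one snapshot, in a fixed order."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]


def drmsd(d1, d2):
    """Distance RMSD between two snapshots' pre-computed distance lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)) / len(d1))


def pick_representative(snapshots, cutoff):
    """Return (index, neighbour_count) of the snapshot with the most
    neighbours within the given dRMSD cutoff. Ties go to the earliest snapshot."""
    dists = [pair_distances(s) for s in snapshots]
    best_index, best_count = -1, -1
    for i, di in enumerate(dists):
        count = sum(1 for j, dj in enumerate(dists)
                    if j != i and drmsd(di, dj) <= cutoff)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count
```

The fewer incorrect structures the simulation samples, the more the largest cluster is dominated by near-native snapshots, which is why reducing incorrect structures is expected to make this kind of selection more effective.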

Publications

Cohen J (2013) RAPPORT: running scientific high-performance computing applications on the cloud. in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

Jefferys BR (2010) Protein folding requires crowd control in a simulated cell. in Journal of molecular biology

Williams KJ (2013) Adenylylation of mycobacterial Glnk (PII) protein is induced by nitrogen limitation. in Tuberculosis (Edinburgh, Scotland)

 
Description We developed a method to predict the 3D structure of proteins from sequence without knowledge of a template. We applied it to study protein folding in the crowded environment of the cell.
Exploitation Route We incorporated the algorithm into Phyre2, which is a widely used (>80,000 users) server for protein structure prediction.
Sectors Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology

 
Description We developed a novel approach to fold up parts of proteins.
First Year Of Impact 2011
Sector Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description Biomedical Resource Development Fund
Amount £830,000 (GBP)
Funding ID WT104955MA 
Organisation Wellcome Trust 
Department Wellcome Trust Institutional Strategic Support Fund
Sector Charity/Non Profit
Country United Kingdom
Start 01/2015 
End 12/2020
 
Description EPSRC PhD Studentship
Amount £65,000 (GBP)
Funding ID EP/K502856/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2012 
End 03/2016
 
Title Phyre2 - A portal for protein modelling 
Description Phyre2 is the second generation of Phyre, in which a user pastes a protein sequence and the server returns a predicted 3D structure and provides additional protein modelling.
Type Of Technology Webtool/Application 
Year Produced 2011 
Impact During 2013, Phyre2 had over 40,00 unique visitors and since 2012, over 80,000 distinct users. 
URL http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
 
Description Human - computer interaction 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Demonstrated the first playable prototype of a docking game to a general audience.

Too early to report
Year(s) Of Engagement Activity 2014
 
Description Imperial Festival & Fringe (open to public) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Imperial Festival is attended by over 10,000 visitors, ranging from policy makers to the general public, including children of all ages. We demonstrated the implications of understanding protein structure. At our stand we had over 100 visitors.
Year(s) Of Engagement Activity 2014, 2016
URL https://www.imperial.ac.uk/be-inspired/festival/
 
Description Lecture - Art and Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact The talk highlighted the link between structural biology and art.

Follow-up invitation to talk at a human/computer interaction conference
Year(s) Of Engagement Activity 2013
 
Description School lecture (London) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Talk to school children to spark interest in science

Requests for work experience
Year(s) Of Engagement Activity 2012
 
Description Talk at school 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Describing use of bioinformatics in medical research
Year(s) Of Engagement Activity 2015
 
Description Work experience for 16-18 year old pupils 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact We provided one week's work experience for about six students each year. They visited facilities at Imperial, and we introduced them to computer programming and molecular graphics.
Year(s) Of Engagement Activity 2014, 2015