A novel and rapid approach to predict protein structure

Lead Research Organisation: Imperial College London

Department Name: Life Sciences

Abstract

IMPORTANCE OF KNOWLEDGE ABOUT PROTEIN STRUCTURE

Proteins are molecular machines that carry out most of the basic functions of an organism. They are made of chains of smaller molecules called amino acids. There are twenty types of amino acid, and the precise sequence of amino acids determines the shape and function of the protein. A protein is a large molecule, and in water it folds into a globular structure in which the amino acids interact with each other in specific ways. It is important for us to know the shape of a protein, as this provides insight into its function and can help in the design of experiments. Knowledge of the structure of a protein can be the starting point for the systematic design of novel regulators of activity, such as drugs and agricultural agents.

PROTEIN STRUCTURE PREDICTION

It is slow, expensive and difficult to determine the structure of a protein directly. However, we now have the DNA sequences for many important organisms, including humans, and we can generally derive protein sequences from DNA sequences. We know that the structure of a protein depends entirely on the sequence of its amino acids, so we can try to predict the structure of a protein from its sequence. Many successful prediction methods use similarities between the sequence of a protein of unknown structure and the sequence of a protein of known structure; this is known as template-based modelling. But what if no such similarity can be found? Two main template-free methods are yielding useful predictions today. The first, fragment folding, tries to build a structure out of small fragments of other structures. This has been the most successful template-free method in the last few years, with a success rate of about 50%, but it requires high-performance computing (up to years of CPU time per prediction). The second, molecular dynamics, simulates the interactions between the atoms in the protein. Although this approach has provided useful predictions for the very smallest proteins, it requires a computation time of many years on a single processor.

OUR APPROACH

We have developed a new method, called Poing, which aims to solve some of the problems with these other methods. We base our approach on a highly simplified model, introduced in the mid-1970s, that represents the protein as a ball-and-spring model. Each amino acid is represented by just two balls, less than a tenth of the number of atoms used per amino acid in molecular dynamics, which makes Poing very fast. The springs between the balls are modelled using heuristics to represent specific effects that are known to be important in how a protein folds. Our preliminary results show that our approach can yield useful predictions with a run time of 20 hours on a single CPU.

THIS PROPOSAL

We propose to develop the new model to make it more accurate at predicting structures. We will also take part in a regular protein structure prediction experiment, in which different prediction methods are tested on new proteins and then compared with each other. We will make our software available to the community via a public web server and by allowing others to obtain copies of it freely, so that they can modify it and run it on their own computers. All this work will take three years.
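To make the ball-and-spring idea above more concrete, the following is a minimal Python/NumPy sketch of a two-bead-per-residue chain (one backbone bead and one sidechain bead per amino acid) relaxed by damped classical dynamics. It is an illustration under stated assumptions, not Poing itself: it includes only harmonic "bond" springs, all distances, stiffnesses, the friction constant and the time step are hypothetical, the damped velocity Verlet integrator is a generic choice, and the heuristic folding forces that Poing adds are not described in this record and are therefore omitted.

```python
import numpy as np

# Hypothetical parameters for illustration only; these are NOT Poing's values.
CA_CA_DIST = 3.8   # Å, typical spacing of consecutive C-alpha atoms
CA_SC_DIST = 2.4   # Å, assumed backbone-bead to sidechain-bead distance
K_BOND = 10.0      # assumed spring stiffness (arbitrary units)


def build_two_bead_chain(n_residues, rng):
    """Two beads per residue: bead 2r is the backbone bead, bead 2r + 1 its sidechain.

    Returns perturbed start positions (2n x 3) and spring arrays i, j, r0, k.
    """
    pos = np.zeros((2 * n_residues, 3))
    pairs, rest = [], []
    for r in range(n_residues):
        bb, sc = 2 * r, 2 * r + 1
        pos[bb] = [r * CA_CA_DIST, 0.0, 0.0]
        pos[sc] = pos[bb] + [0.0, CA_SC_DIST, 0.0]
        pairs.append((bb, sc))          # backbone-sidechain spring
        rest.append(CA_SC_DIST)
        if r > 0:
            pairs.append((bb - 2, bb))  # spring to the previous backbone bead
            rest.append(CA_CA_DIST)
    pairs = np.array(pairs)
    r0 = np.array(rest)
    k = np.full(len(rest), K_BOND)
    pos += rng.normal(scale=0.3, size=pos.shape)  # distort the ideal geometry
    return pos, pairs[:, 0], pairs[:, 1], r0, k


def bond_lengths(pos, i, j):
    return np.linalg.norm(pos[j] - pos[i], axis=1)


def spring_forces(pos, i, j, r0, k):
    """Forces from harmonic springs with energy 0.5 * k * (d - r0)**2."""
    delta = pos[j] - pos[i]
    d = np.linalg.norm(delta, axis=1)
    f = (k * (d - r0) / d)[:, None] * delta  # force on bead i of each spring
    forces = np.zeros_like(pos)
    np.add.at(forces, i, f)                  # accumulate: beads share springs
    np.add.at(forces, j, -f)                 # equal and opposite force on bead j
    return forces


def simulate(pos, i, j, r0, k, n_steps=3000, dt=0.01, friction=0.5, mass=1.0):
    """Velocity Verlet with a simple velocity-damping (friction) term."""
    pos = pos.copy()
    vel = np.zeros_like(pos)
    forces = spring_forces(pos, i, j, r0, k)
    for _ in range(n_steps):
        vel += 0.5 * dt * forces / mass
        pos += dt * vel
        forces = spring_forces(pos, i, j, r0, k)
        vel += 0.5 * dt * forces / mass
        vel *= 1.0 - friction * dt           # friction drains kinetic energy
    return pos


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos, i, j, r0, k = build_two_bead_chain(10, rng)
    relaxed = simulate(pos, i, j, r0, k)
    print("max bond error:", np.abs(bond_lengths(relaxed, i, j) - r0).max())
```

With only two beads per residue, each force evaluation touches a few dozen springs for a small chain, which illustrates why a coarse-grained representation is so much cheaper than simulating every atom. In a folding model, further springs (for example, between sequence-distant beads) would encode the heuristic effects mentioned above; their form here would be pure assumption, so they are left out.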

Technical Summary

We have developed a novel and rapid approach (Poing) to template-free protein structure prediction. The approach is a coarse-grained, simplified classical dynamics simulation that is orders of magnitude faster than existing approaches (fragment-based folding and all-atom molecular dynamics), which enables the technique to be applied to larger proteins. Poing represents a protein structure in the Levitt & Warshel backbone-plus-sidechain model and explicitly simulates effects known to be important in protein folding as a network of spring-like forces. Our pilot study results show that state-of-the-art template-free predictions can be obtained in CPU hours. In this grant we will develop and disseminate Poing, and we will take part in the international blind trial of protein structure prediction (CASP). Key steps are:

* Months 1–5: CASP8 participation.
* Months 1–24: Analysis, development and improvement of predicted structures. By minimising the number of incorrect structures produced by our model whilst maintaining the stability of the native state, we will increase the effectiveness of clustering in picking correct models from the time-series samples produced, and therefore improve the predictive accuracy of the model. Current shortcomings to be addressed include non-native internal voids within structures, incorrect tessellation of sidechains, and non-native topological features.
* Months 18–24: Public deployment via web server and open-source dissemination.
* Months 22–29: CASP9 participation.
* Months 25–36: Development of model selection and integration with complementary, fragment-based approaches.
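The summary says that clustering is used to pick correct models from the time-series samples produced by the simulation, but it does not specify which clustering scheme Poing uses. The Python/NumPy sketch below shows one common, simple option, assuming each snapshot is an (n, 3) coordinate array with the same bead ordering: compute all pairwise RMSDs after optimal rigid superposition (the Kabsch algorithm), then take the snapshot with the most neighbours within an RMSD cutoff as the predicted model. The function names and the cutoff value are hypothetical.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition."""
    a = a - a.mean(axis=0)                  # remove translation
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)       # SVD of the covariance matrix A^T B
    d = 1.0 if np.linalg.det(u @ vt) > 0 else -1.0  # forbid reflections
    e = (a ** 2).sum() + (b ** 2).sum() - 2.0 * (s[0] + s[1] + d * s[2])
    return float(np.sqrt(max(e, 0.0) / len(a)))


def pick_representative(snapshots, cutoff):
    """Return the most-connected snapshot, its cluster members and the RMSD matrix.

    `snapshots` is a sequence of (n, 3) arrays sampled from a trajectory;
    `cutoff` is a hypothetical RMSD threshold in Å.
    """
    n = len(snapshots)
    rmsd = np.zeros((n, n))
    for p in range(n):
        for q in range(p + 1, n):
            rmsd[p, q] = rmsd[q, p] = kabsch_rmsd(snapshots[p], snapshots[q])
    neighbours = (rmsd < cutoff).sum(axis=1)  # each snapshot counts itself
    best = int(np.argmax(neighbours))
    members = np.flatnonzero(rmsd[best] < cutoff)
    return best, members, rmsd
```

Because the representative is drawn from the most populated cluster, a trajectory containing fewer incorrect structures makes it more likely that a native-like cluster dominates, which is the rationale the summary gives for reducing them. The all-pairs RMSD step scales quadratically with the number of snapshots, so in practice trajectories would typically be subsampled first.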

Funded Value:

£322,263

Funded Period:

Aug 08 - Nov 11

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/G003912/1

Principal Investigator:

Michael Sternberg

Research Subject:

Biomolecules & biochemistry (12%)

Tools, technologies & methods (76%)

Research Topic:

Bioinformatics (25%)

Structural biology (12%)

Theoretical biology (51%)

People	ORCID iD
Michael Sternberg (Principal Investigator)

Publications


Chubb D (2010) Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. in Bioinformatics (Oxford, England)

Cohen J (2013) RAPPORT: running scientific high-performance computing applications on the cloud. in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

Jefferys BR (2010) Protein folding requires crowd control in a simulated cell. in Journal of molecular biology

Kelley LA (2009) Protein structure prediction on the Web: a case study using the Phyre server. in Nature protocols

Kelley LA (2015) The Phyre2 web portal for protein modeling, prediction and analysis. in Nature protocols

Tomlinson CD (2013) XperimentR: painless annotation of a biological experiment for the laboratory scientist. in BMC bioinformatics

Williams K (2013) Adenylylation of mycobacterial Glnk (PII) protein is induced by nitrogen limitation. in Tuberculosis

Key Findings
Impact Summary
Further Funding
Software and Technical Products
Engagement Activities


Description	We developed a method to predict the 3D structure of proteins from sequence without knowledge of a template. We applied it to study protein folding in the crowded environment of the cell.
Exploitation Route	We incorporated the algorithm into Phyre2, a widely used server (>80,000 users) for protein structure prediction.
Sectors	Agriculture, Food and Drink, Healthcare, Pharmaceuticals and Medical Biotechnology


Description	We developed a novel approach to fold up parts of proteins.
First Year Of Impact	2011
Sector	Agriculture, Food and Drink, Pharmaceuticals and Medical Biotechnology
Impact Types	Economic


Description	Biomedical Resource Development Fund
Amount	£830,000 (GBP)
Funding ID	WT104955MA
Organisation	Wellcome Trust
Department	Wellcome Trust Institutional Strategic Support Fund
Sector	Charity/Non Profit
Country	United Kingdom
Start	01/2015
End	12/2020


Description	EPSRC PhD Studentship
Amount	£65,000 (GBP)
Funding ID	EP/K502856/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	09/2012
End	03/2016


Title	Phyre2 - A portal for protein modelling
Description	Phyre2 is the second generation of Phyre, in which a user pastes a protein sequence and the server returns a predicted 3D structure and provides additional protein modelling.
Type Of Technology	Webtool/Application
Year Produced	2011
Impact	During 2013, Phyre2 had over 40,00 unique visitors and since 2012, over 80,000 distinct users.
URL	http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index


Description	Human-computer interaction
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	Demonstrated the first playable prototype of a docking game to a general audience. Too early to report.
Year(s) Of Engagement Activity	2014


Description	Imperial Festival & Fringe (open to public)
Form Of Engagement Activity	Participation in an open day or visit at my research institution
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Public/other audiences
Results and Impact	The Imperial Festival is attended by over 10,000 visitors, ranging from policy makers to members of the general public, including children of all ages. We demonstrated the implications of understanding protein structure. Our stand had over 100 visitors.
Year(s) Of Engagement Activity	2014,2016
URL	https://www.imperial.ac.uk/be-inspired/festival/


Description	Lecture - Art and Science
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	The talk highlighted links between structural biology and art. It was followed by an invitation to talk at a human-computer interaction conference.
Year(s) Of Engagement Activity	2013


Description	School lecture (London)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	Talk to school children to spark interest in science; this generated requests for work experience.
Year(s) Of Engagement Activity	2012


Description	Talk at school
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	Described the use of bioinformatics in medical research.
Year(s) Of Engagement Activity	2015


Description	Work experience for 16-18 year old pupils
Form Of Engagement Activity	Participation in an open day or visit at my research institution
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	We provided one week of work experience for about six students each year. They visited facilities at Imperial and were introduced to computer programming and molecular graphics.
Year(s) Of Engagement Activity	2014,2015
