A novel and rapid approach to predict protein structure
Lead Research Organisation:
Imperial College London
Department Name: Life Sciences
Abstract
IMPORTANCE OF KNOWLEDGE ABOUT PROTEIN STRUCTURE
Proteins are molecular machines which carry out most of the basic functions of an organism. They are made of chains of smaller molecules called amino acids. There are twenty types of amino acid, and the precise sequence of amino acids determines the shape and function of the protein. A protein is a large molecule, and in water it folds into a globular structure. The amino acids interact with each other in specific ways. It is important for us to know the shape of a protein as this provides insight into its function and can help in the design of experiments. Knowledge of the structure of a protein can be the starting point for the systematic design of novel regulators of activity such as drugs and agricultural agents.
PROTEIN STRUCTURE PREDICTION
It is slow, expensive and difficult to find out the structure of a protein directly. However, we now have the DNA sequences for many important organisms, including humans, and we can generally derive protein sequences from DNA sequences. We know that the structure of a protein depends entirely on the sequence of its amino acids. Thus we can try to predict the structure of a protein from its sequence. Many successful prediction methods use similarities between the sequence for an unknown structure and the sequence for a known structure; this is known as template-based modelling. But what if no such similarity can be found? There are two main methods that are yielding useful predictions today. One, fragment folding, tries to make a structure out of little fragments of other structures. This has been the most successful of the template-free methods in the last few years and has a success rate of about 50%. It requires high-performance computing (up to years of CPU time per prediction). Another method, molecular dynamics, simulates the interactions between the atoms in the protein. Although this approach has provided useful predictions for the very smallest of proteins, it requires a computation time of many years on a single processor.
OUR APPROACH
We have developed a new method, called Poing, which aims to solve some of the problems with these other methods. We base our approach on a highly simplified model, introduced in the mid-1970s, which represents the protein as a ball-and-spring model. Each amino acid is represented by just two balls, fewer than a tenth of the number used in molecular dynamics. This makes Poing very fast. The springs between the balls are modelled using heuristics to represent specific effects which are known to be important in how a protein folds. Our preliminary results show that our approach can yield useful predictions with a run time of 20 hours on a single CPU.
THIS PROPOSAL
We propose to develop the new model to make it more accurate at predicting structures. We will also take part in a regular protein structure prediction experiment, where different prediction methods are tested on new proteins and then compared with each other. We will also make our software available to the community via a public web server and by allowing others freely to obtain copies of it to change and run on their own computers. All this work will take three years.
Technical Summary
We have developed a novel and rapid approach (Poing) to template-free protein structure prediction. The approach is a coarse-grained simplified classical dynamics simulation that is orders of magnitude faster than existing approaches (fragment-based folding and all-atom molecular dynamics). This enables the technique to be applied to larger proteins. Poing represents a protein structure in the Levitt & Warshel backbone-plus-sidechain model, and explicitly simulates effects known to be important in protein folding as a network of spring-like forces. Our pilot study results show that state-of-the-art template-free predictions can be obtained in CPU hours. In this grant we will develop and disseminate Poing. We will take part in the international blind trial of prediction (CASP). Key steps are:
* Months 1-5: CASP8 participation.
* Months 1-24: Analysis, development and improvement of predicted structures. By minimising the number of incorrect structures produced by our model whilst maintaining the stability of the native state, we will increase the effectiveness of clustering in picking correct models from the time-series samples produced, and therefore improve the predictive accuracy of the model. Current shortcomings to be addressed include: non-native internal voids within structures; incorrect tessellation of sidechains; and non-native topological features.
* Months 18-24: Public deployment via web server and open source dissemination.
* Months 22-29: CASP9 participation.
* Months 25-36: Development of model selection and integration with complementary, fragment-based approaches.
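To illustrate what "a network of spring-like forces" acting on a two-bead-per-residue chain can look like, here is a minimal Python sketch. It is not Poing's implementation: the function names, the overdamped update rule, the rest lengths, the stiffness values and the toy coordinates are all assumptions chosen for illustration, and Poing's actual heuristics are not described in this record.

```python
import math

def spring_forces(positions, springs):
    """Return the net harmonic-spring force on each bead.

    positions: list of [x, y, z] coordinates, one per bead.
    springs: list of (i, j, rest_length, stiffness) tuples (hypothetical values).
    """
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j, rest, k in springs:
        delta = [b - a for a, b in zip(positions[i], positions[j])]
        dist = math.sqrt(sum(d * d for d in delta))
        if dist == 0.0:
            continue  # coincident beads: direction undefined, skip this spring
        # Hooke's law: magnitude k * (dist - rest), directed along the spring.
        scale = k * (dist - rest) / dist
        for axis in range(3):
            forces[i][axis] += scale * delta[axis]
            forces[j][axis] -= scale * delta[axis]
    return forces

def relax(positions, springs, step=0.05, n_steps=2000):
    """Overdamped dynamics (an assumed scheme): move each bead along its force."""
    pos = [list(p) for p in positions]
    for _ in range(n_steps):
        forces = spring_forces(pos, springs)
        for p, f in zip(pos, forces):
            for axis in range(3):
                p[axis] += step * f[axis]
    return pos

def distance(a, b):
    """Euclidean distance between two beads."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy chain of three residues, two beads each:
# backbone bead at index 2*r, sidechain bead at index 2*r + 1.
start = [
    [0.0, 0.0, 0.0], [0.0, 1.0, 0.0],
    [5.0, 0.0, 0.0], [5.0, 3.0, 0.0],
    [9.0, 0.5, 0.0], [9.0, 2.0, 1.0],
]
springs = [
    (0, 2, 3.8, 1.0), (2, 4, 3.8, 1.0),                      # backbone to backbone
    (0, 1, 2.0, 1.0), (2, 3, 2.0, 1.0), (4, 5, 2.0, 1.0),    # backbone to sidechain
]
relaxed = relax(start, springs)
```

After relaxation, each spring in this toy chain should sit close to its rest length. A realistic model would add further spring types for the folding effects mentioned above and would sample structures over time rather than simply minimising.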
People | ORCID iD |
Michael Sternberg (Principal Investigator) | |
Publications
Chubb D (2010) Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Bioinformatics (Oxford, England).
Cohen J (2013) RAPPORT: running scientific high-performance computing applications on the cloud. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences.
Jefferys BR (2010) Protein folding requires crowd control in a simulated cell. Journal of Molecular Biology.
Kelley L (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols.
Kelley LA (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nature Protocols.
Tomlinson CD (2013) XperimentR: painless annotation of a biological experiment for the laboratory scientist. BMC Bioinformatics.
Williams KJ (2013) Adenylylation of mycobacterial Glnk (PII) protein is induced by nitrogen limitation. Tuberculosis (Edinburgh, Scotland).
Description | We developed a method to predict the 3D structure of proteins from sequence without knowledge of a template. We applied it to study protein folding in the crowded environment of the cell. |
Exploitation Route | We incorporated the algorithm into Phyre2, a widely used (>80,000 users) server for protein structure prediction. |
Sectors | Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology |
Description | We developed a novel approach to fold up parts of proteins. |
First Year Of Impact | 2011 |
Sector | Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Description | Biomedical Resource Development Fund |
Amount | £830,000 (GBP) |
Funding ID | WT104955MA |
Organisation | Wellcome Trust |
Department | Wellcome Trust Institutional Strategic Support Fund |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 01/2015 |
End | 12/2020 |
Description | EPSRC PhD Studentship |
Amount | £65,000 (GBP) |
Funding ID | EP/K502856/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 10/2012 |
End | 03/2016 |
Title | Phyre2 - A portal for protein modelling |
Description | Phyre2 is the second generation of Phyre, in which a user pastes a protein sequence and the server returns a predicted 3D structure and provides additional protein modelling. |
Type Of Technology | Webtool/Application |
Year Produced | 2011 |
Impact | During 2013, Phyre2 had over 40,00 unique visitors and since 2012, over 80,000 distinct users. |
URL | http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index |
Description | Human - computer interaction |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Demonstrated the first playable prototype of a docking game to a general audience. Too early to report. |
Year(s) Of Engagement Activity | 2014 |
Description | Imperial Festival & Fringe (open to public) |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | The Imperial Festival is attended by over 10,000 visitors, ranging from policy makers to the general public, including children of all ages. We demonstrated the implications of understanding protein structure. At our stand we had over 100 visitors. |
Year(s) Of Engagement Activity | 2014,2016 |
URL | https://www.imperial.ac.uk/be-inspired/festival/ |
Description | Lecture - Art and Science |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Talk highlighted the link between structural biology and art. Follow-up invitation to talk at a human/computer interaction conference. |
Year(s) Of Engagement Activity | 2013 |
Description | School lecture (London) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Talk to school children to spark interest in science. Requests for work experience. |
Year(s) Of Engagement Activity | 2012 |
Description | Talk at school |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Describing use of bioinformatics in medical research |
Year(s) Of Engagement Activity | 2015 |
Description | Work experience for 16-18 year old pupils |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | We provided one week's work experience for about 6 students each year. They visited facilities at Imperial and we introduced them to computer programming and molecular graphics. |
Year(s) Of Engagement Activity | 2014,2015 |