A Community Resource for the Prediction of Protein Structure: PHYRE

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Proteins are large molecules that are the machinery of life. They are long chains of different components, and the order of these components is the amino-acid sequence. The genome projects are now determining the sequences of proteins from many species including humans, plants, animals and microbes. Experimental methods can reveal the 3D structure of a protein. This information is central to basic biological understanding, and the exploitation of this knowledge has major implications for improvements in health, agriculture, animal welfare and the environment. However, this essential information is generally not available from experiment. Biologists then require computational methods to predict it.

The Sternberg group has developed a powerful and user-friendly resource for predicting the 3D structure of a protein from its sequence. The first version was 3D-PSSM and the more recent version is known as PHYRE. It is disseminated via a web server: a user pastes their protein sequence of interest into a box and the server returns details of the predicted 3D structure with atomic coordinates and additional information. This resource has proved highly popular with the community. There have been over 130,000 submissions and the current rate is 1,000 per week. There have been over 1,200 citations to the two main papers describing 3D-PSSM and PHYRE.

This grant will enable us to maintain, support and develop the PHYRE web server. It will support the following topics.
1) Recent developments that have led to a significant improvement in performance have not yet been incorporated into the software available to the community. We will develop an appropriate web interface to the new version, founded on the successful current design principles.
2) The program requires updates of the databases used in the prediction. At present the procedure is managed manually and is computationally time-consuming. We will automate and improve the procedure.
3) A number of other computational tools that complement PHYRE have been developed by groups around the world to predict structural and functional characteristics of proteins. These will be integrated into the server to provide a hub of information about a protein of interest.
4) End-user biologists need to know when to trust predictions. Hence we will augment the existing measures of confidence, which are based on protein sequence information, with cutting-edge tools that estimate prediction quality based on 3D information. This will permit the biologist to ascertain which regions of a protein model are trustworthy, and which are not, for use in subsequent theoretical or wet-lab work.
5) Visualisation is key when dealing with complex three-dimensional protein structures. Hence we will substantially extend the user's ability to plot a variety of predicted features mapped onto 3D model predictions.
6) We will provide e-mail user support together with extensive documentation. In addition, we will run three hands-on workshops for biologists interested in using the methodology.

The work will be disseminated by publications in the scientific literature and presentations at national and international meetings.

Technical Summary

Our current server for protein structure prediction (PHYRE) is used by hundreds of groups worldwide, due partly to a user interface that is informative and easy for biologists to use. Recent major improvements made in our lab to the underlying algorithms for structure prediction have been shown to lead to world-class modelling accuracy in the recent CASP8 blind trial of structure prediction. We propose to bring these improvements to the biologist via a new server with a range of additional powerful user features:
1) An interface for automatic modelling of multi-domain proteins by iterative fold recognition, multiple template modelling, a powerful new HMM profile matching algorithm and a homology network data-mining approach.
2) Extensive help documentation, tutorials, and three 1-day workshops.
3) Robust and up-to-the-minute fold library maintenance, including regular total reconstruction protocols to stay current with mounting sequence data, and extensive error-checking routines, e.g. updating PDB entries as higher resolution templates become available.
4) Using CDD and PFAM, automatically parse long user sequences for clear domain boundaries, with an interactive web interface to manage multi-domain proteins and selective domain modelling.
5) Integrate 3D model quality estimation using state-of-the-art tools from the CASP MQAP category.
6) Improve model visualisation options, e.g. rendering models according to alignment and/or model quality, disorder, sequence motifs, sequence conservation, evolutionary trace, etc.
7) Include prediction of other structural/functional features with extant software, e.g. transmembrane helices, coiled coils, repeats and functional residue predictions.
8) Design an expert mode to allow user selection of templates in multi-template modelling, private submission of user-supplied structures to thread against, and batch processing of multiple sequences.
9) Reinstate and update a previously successful functional text data mining approach.
 
Description Genome projects are determining the sequences of proteins from many species including plants, animals and microbes. Experimental methods reveal the 3D structure of a protein. This information is central to biological understanding, and the exploitation of this knowledge has implications for improvements in agriculture, animal welfare, health, and biotechnology. But often this information is not available from experiment. Biologists then require computational methods to predict protein structure. Probably the most effective method to predict protein structure from sequence is to build the model based on the experimentally determined structure of a known protein.
The Imperial group had developed a web-based server Phyre1 to undertake this task. The user pasted in a sequence and in a few hours the server returned a model. At the start of the grant there were about 4,000 distinct users per month of Phyre1.
Under this grant, the first version of Phyre was completely rewritten and an entirely new web interface was developed. The resource, Phyre2, was disseminated to the community. Use by the community more than doubled to over 10,000 distinct users per month. Additional features relating to protein function were included.
Under subsequent funding, Phyre2 has been developed to incorporate many additional features, including enabling users to process many sequences in batch mode. In 2019, there were over 90,000 distinct users. In recognition of the wide use of Phyre, since November 2016 it has been included in ELIXIR, a pan-European network of resources for bioinformatics.
Exploitation Route The major beneficiaries who take this forward are bioscience researchers wishing to analyse the proteome of sequenced genomes. Ever more genomes will be sequenced and variations in sequences will need to be studied. Structural annotations will empower a wide range of biological studies by the broad biological community. Phyre has been applied to predicting function, target selection for structure determination, SAXS studies, EM reconstruction, determining domain organization, construct design, molecular replacement, understanding the phenotypic. The Phyre server can also be used for research that directly results in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides the design of both small molecules and bio-therapeutics, such as monoclonal antibodies. The design of novel pharmaceuticals has clear health and commercial benefits. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways, and information about the structure and function of genes can inform these studies.
Sectors Agriculture, Food and Drink; Education; Healthcare; Pharmaceuticals and Medical Biotechnology

URL http://www.sbg.bio.ic.ac.uk/servers/phyre2/html/page.cgi?id=index
 
Description Feedback from users demonstrates that the Phyre server has been used for research that directly results in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides the design of both small molecules and bio-therapeutics, such as monoclonal antibodies. The design of novel pharmaceuticals has clear health and commercial benefits. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways, and information about the structure and function of genes can inform these studies.
First Year Of Impact 2009
Sector Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description Biomedical Resource Development Fund
Amount £830,000 (GBP)
Funding ID WT104955MA 
Organisation Wellcome Trust 
Department Wellcome Trust Institutional Strategic Support Fund
Sector Charity/Non Profit
Country United Kingdom
Start 01/2015 
End 12/2020
 
Description EPSRC PhD Studentship
Amount £65,000 (GBP)
Funding ID EP/K502856/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2012 
End 03/2016
 
Title Phyre - A web server for protein structure prediction 
Description Phyre is a web server for protein structure prediction. Users input a sequence and the server returns a predicted 3D protein structure with additional information about protein function. 
Type Of Technology Webtool/Application 
Impact By the end of 2012, Phyre had >250 citations in the literature and >1000 submissions a week from institutions around the world. Phyre, under BBSRC funding, was enhanced to yield Phyre2 in 2011. During 2013, Phyre2 had over 40,00 unique visitors and since 2012, over 80,000 distinct users. 
URL http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
 
Title Phyre2 - A portal for protein modelling 
Description Phyre2 is the second generation of Phyre, in which a user pastes a protein sequence and the server returns a predicted 3D structure and provides additional protein modelling. 
Type Of Technology Webtool/Application 
Year Produced 2011 
Impact During 2013, Phyre2 had over 40,00 unique visitors and since 2012, over 80,000 distinct users. 
URL http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
 
Description Lecture - Art and Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact The talk highlighted the link between structural biology and art.

Follow-up invitation to talk at a human/computer interaction conference
Year(s) Of Engagement Activity 2013
 
Description The Prince's Teaching Institute 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach National
Primary Audience Schools
Results and Impact Talked about developments in bioinformatics and systems biology as a new area in biology

After the talk, teachers advised students to contact me for work experience
Year(s) Of Engagement Activity 2010