Maintaining and extending PHYRE2 to deliver an internationally-recognised resource for protein model

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Proteins are large molecules that are the machinery of life. They are long chains of different components and the order of these components is the amino-acid sequence. The genome projects are now determining the sequences of proteins from many species including human, plants, animals and microbes. Experimental methods can reveal the 3D structure of a protein. This information is central to basic biological understanding, and its exploitation has major implications for improvements in agriculture, animal welfare, health, and biotechnology. However, this essential information is generally not available from experiment. Biologists then require computational methods to predict it.

The Sternberg group has developed a powerful and user-friendly resource for predicting the 3D structure of a protein from its sequence. The first version was 3D-PSSM and the more recent version is known as Phyre. This is disseminated via a web server - a user pastes their protein sequence of interest into a box and the server returns details of the predicted 3D structure with atomic coordinates and additional information. This resource has proved highly popular with the community. There have been over 750,000 submissions and the current rate is 2,500 per week. There have been over 2,000 citations to the three main papers describing 3D-PSSM and Phyre.

However, genes and their protein products do not act in isolation. The rapidly growing field of Systems Biology aims to understand Biology at the level of complex systems of interactions, of which proteins are a central component. Many techniques have recently become available to predict the vital parts of a protein that confer its function and the regions of a protein that take part in interactions with other molecules in the cell. Modelling these regions permits a better understanding of the role of genetics in disease by elucidating their role in the basic biochemistry of the cell and the network of interactions in which they take part.

This grant will enable us to maintain and support the Phyre web server. We will provide e-mail user support together with extensive documentation. In addition, we will run three hands-on workshops and four road-shows across the UK for biologists interested in using the methodology. The work will be disseminated by publications in the scientific literature and presentations at national and international meetings.

The functionality of Phyre will be enhanced to support the following topics.

1) The prediction of the interacting partners of a protein in the cell to better elucidate function. Determining the interactions a protein makes with other proteins is critical for a researcher to elucidate the protein/gene's wider role in cellular processes and disease. It can aid researchers in building larger models of entire systems.

2) The modelling of the structure of multiple proteins in a complex. In addition to determining which proteins are interacting ((1) above), the specific nature of that interaction gives researchers a detailed insight into which parts of a protein are critical for the interaction. This can then guide hypotheses and experimental design.

3) To suggest the effects of mutations on the structure and function of the protein. Algorithms are available to predict whether a mutation in a protein is likely to alter its function in the cell and these advances will be incorporated into the server.

5) To provide enhanced visualisation, which is key when dealing with complex three-dimensional protein structure. We will substantially extend the user's ability to plot a variety of predicted features mapped onto 3D model predictions, in particular functionally important parts of the protein and regions where mutations are known to occur.

Technical Summary

Phyre, our protein structure prediction server, is used by hundreds of groups worldwide. The aim of this proposal is to maintain, support and extend Phyre to include the prediction of protein interactions, functional sites and multi-protein complexes. We propose to:

1) Hold 3 workshops and 4 road-shows around the UK, maintain the server, provide full user-support and integrate new developments in the field as they arise.

2) Take part in the international blind trial of protein structure prediction (CASP).

3) Integrate freely available tools and databases for the prediction of protein interfaces (e.g. SCOPPI, ProtInDB). Couple interface prediction of two user proteins to the freely available iWrap interface threading algorithm. Add the ability to return a predicted Biological Unit (BU) for homo-oligomers using both the existing PDB biomolecule information and the complementary ProtBud database of BUs. Potential clashes will be resolved using the method described in (4).

4) Extend our Poing multiple template modelling tool to handle multiple protein chains. Using existing homologous complexes to derive distance constraints, we will be able to relieve clashes and utilise multiple homologous complex constraints simultaneously.

5) Predict interacting partners by combining the remote homology detection of Phyre with the STRING database of interacting proteins via their existing API. A confident template structure detected by Phyre will be used to detect close homologues of the template in STRING to provide candidate interaction partners for the user protein including template structures in complexes.
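
As an illustration of the partner lookup in item 5, the following is a minimal sketch, not the Phyre implementation. It assumes the public STRING REST endpoint `interaction_partners` with TSV output and its `preferredName_B` and `score` columns; the function names, the 0.7 score cut-off and the default limit are hypothetical choices made for this example. Mapping a Phyre template to a STRING identifier is outside the scope of the sketch.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

STRING_API = "https://string-db.org/api"  # public STRING REST base URL


def partners_url(identifier, species, limit=10):
    """Build a STRING 'interaction_partners' request URL (TSV output)."""
    query = urlencode({"identifiers": identifier, "species": species, "limit": limit})
    return f"{STRING_API}/tsv/interaction_partners?{query}"


def parse_partners(tsv_text, min_score=0.7):
    """Return (partner_name, score) pairs at or above min_score, best first."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = [(row["preferredName_B"], float(row["score"])) for row in rows]
    # min_score is an assumed confidence cut-off, not a value from the proposal.
    return sorted((h for h in hits if h[1] >= min_score), key=lambda h: -h[1])


def fetch_partners(identifier, species, limit=10, min_score=0.7):
    """Query STRING over the network and return the filtered partner list."""
    with urlopen(partners_url(identifier, species, limit)) as response:
        return parse_partners(response.read().decode("utf-8"), min_score)
```

A call such as `fetch_partners("TP53", 9606)` would request candidate partners for a human protein (9606 is the NCBI taxon ID for human). Keeping the URL construction and the parsing separate from the network call lets both be checked offline.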

Planned Impact

This proposal is to maintain and enhance a web-based bioinformatics resource for the bioscience and biomedical communities to perform protein structure prediction using our Phyre system. Based on current demand and anticipated growth we envisage supporting over 20,000 users over the course of the three years. We will now identify those groups that will benefit from this research and in what way they will benefit.

Academic - Many of the users of the Phyre resource will be academic groups and the results of their use of Phyre will advance their research leading to economic and social benefit. Our current user-base is international. The current user-base spans a diverse section of researchers in bioscience and biomedicine requiring information about protein structure and function. Feedback from users has shown that a Phyre prediction can have a transformative effect on their research moving their conceptualisation into detailed consideration of the molecule at the three-dimensional atomic level. There are numerous application areas. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The bio-energy sector can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies.

Public sector - Agencies involved in public health and food security are expected to continue to use the Phyre server. For example the location of a mutation on the surface of a human, animal or plant pathogen could be mapped to provide insight into structure/function relationships. This will impact on health and well-being.

Schools - In talks to schools by the PI, the Phyre server is described as a web-based resource. The development of tools used by many other researchers illustrates the broad impact of science research to the students.

General public - Via open days at Imperial, members of the general public will see demonstrations of Phyre. This will highlight an area of research, bioinformatics, of which they may not have been aware. Furthermore, this will highlight the collaborative nature of scientific research with its implications of value for money.

Publications

 
Description Genome projects are determining the sequences of proteins from many species including plants, animals and microbes. Experimental methods reveal the 3D structure of a protein. This information is central to biological understanding, and its exploitation has implications for improvements in agriculture, animal welfare, health, and biotechnology. But often this information is not available from experiment. Biologists then require computational methods to predict protein structure. Probably the most effective method to predict protein structure from sequence is to build the model based on the experimentally-determined structure of a known protein.
The Imperial group had developed a web-based server Phyre2 to undertake this task. The user pasted in a sequence and in a few hours the server returned a model. At the start of the grant (2012) there were about 25,000 distinct users per month of Phyre.
Under this grant, extensive new functionality was added to Phyre2. A resource called PhyreAlarm enabled a user to request weekly reruns of their query against the updated database. A tool, PhyreInvestigator, enabled a user to explore the accuracy of the predicted structure in detail. The resource, Phyre2, was disseminated to the community. By the end of the grant, use by the community had increased markedly to about 45,000 distinct users per year.
Under subsequent funding, Phyre2 has been developed to incorporate many additional features, including enabling users to process many sequences in batch mode. In 2019, there were over 90,000 distinct users. In recognition of the wide use of Phyre, since November 2016 it has been included in ELIXIR, a pan-European network of resources for bioinformatics.
Exploitation Route The major beneficiaries who take this forward are bioscience researchers wishing to analyse the proteome of sequenced genomes. Ever more genomes will be sequenced and variations in sequences will need to be studied. Structural annotations will empower a wide range of biological studies by the broad biological community. Phyre has been applied to predicting function, target selection for structure determination, SAXS studies, EM reconstruction, determining domain organization, construct design, molecular replacement, understanding the phenotypic. The Phyre server can also be used for research directly resulting in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies.
Sectors Agriculture, Food and Drink,Education,Healthcare,Pharmaceuticals and Medical Biotechnology

URL http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index
 
Description Feedback from users demonstrates that the Phyre server has been used for research that directly results in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies.
First Year Of Impact 2012
Sector Agriculture, Food and Drink,Education,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description 18-BBSRC-NSF/BIO - Structural modeling of interactome to assess phenotypic effects of genetic variation
Amount £499,841 (GBP)
Funding ID BB/T010487/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2019 
End 08/2022
 
Description Biomedical Resource Development Fund
Amount £830,000 (GBP)
Funding ID WT104955MA 
Organisation Wellcome Trust 
Department Wellcome Trust Institutional Strategic Support Fund
Sector Charity/Non Profit
Country United Kingdom
Start 01/2015 
End 12/2020
 
Description EPSRC PhD Studentship
Amount £65,000 (GBP)
Funding ID EP/K502856/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2012 
End 03/2016
 
Description Modeling protein interactions to interpret genetic variation
Amount £458,127 (GBP)
Funding ID BB/P011705/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2016 
End 09/2019
 
Title SuSPect - A web server to predict the phenotypic effect of single amino acid variants 
Description SuSPect uses sequence-, structure- and systems biology-based features to predict the phenotypic effects of missense mutations. 77 features are used to train a support vector machine (SVM) to discriminate between disease-causing and neutral variants. In a blind test from VariBench, SuSPect achieved an AUC (area under ROC curve) of 0.89, balanced accuracy of 82% and a Matthews correlation coefficient of 0.65, a large improvement over other methods tested. 
Type Of Technology Webtool/Application 
Year Produced 2013 
Impact Application of prediction of genetic variants in proteins. 
URL http://www.sbg.bio.ic.ac.uk/~suspect/about.html
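
For readers unfamiliar with the metrics quoted for SuSPect, the sketch below shows how balanced accuracy and the Matthews correlation coefficient follow from a binary confusion matrix. It is an illustration only: the counts are hypothetical and are not the VariBench results, and AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only; not SuSPect's published results.
tp, fp, tn, fn = 80, 20, 80, 20
print(balanced_accuracy(tp, fp, tn, fn))
print(matthews_corrcoef(tp, fp, tn, fn))
```

Both measures are less sensitive to class imbalance than plain accuracy, which matters when disease-causing and neutral variants occur in unequal numbers.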
 
Description Human - computer interaction 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Demonstrated first playable prototype of docking game to general audience

Too early to report
Year(s) Of Engagement Activity 2014
 
Description Imperial Festival & Fringe (open to public) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Imperial festival is attended by over 10,000 visitors, ranging from policy makers to the general public, including children of all ages. We demonstrated the implications of understanding protein structure. At our stand we had over 100 visitors.
Year(s) Of Engagement Activity 2014,2016
URL https://www.imperial.ac.uk/be-inspired/festival/
 
Description Lecture - Art and Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Talk highlighted link of structural biology and art.

Follow-up invitation to talk at a human/computer interaction conference
Year(s) Of Engagement Activity 2013
 
Description School lecture (London) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Talk to school children to spark interest in science

Requests for work experience
Year(s) Of Engagement Activity 2012
 
Description Talk at school 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Describing use of bioinformatics in medical research
Year(s) Of Engagement Activity 2015
 
Description Work experience for 16-18 year old pupils 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact We provided one week's work experience for about 6 students each year. They visited facilities at Imperial and were introduced to computer programming and molecular graphics.
Year(s) Of Engagement Activity 2014,2015