Maintaining and extending PHYRE2 to deliver an internationally-recognised resource for protein model
Lead Research Organisation:
Imperial College London
Department Name: Life Sciences
Abstract
Proteins are large molecules that are the machinery of life. They are long chains of different components and the order of these components is the amino-acid sequence. The genome projects are now determining the sequences of proteins from many species including human, plants, animals and microbes. Experimental methods can reveal the 3D structure of a protein, and this information is central to basic biological understanding and the exploitation of this biological knowledge has major implications for improvements in agriculture, animal welfare, health, and biotechnology. However, generally this essential information is not available from experiment. Biologists then require computational methods to predict this information.
The Sternberg group has developed a powerful and user-friendly resource for predicting the 3D structure of a protein from its sequence. The first version was 3D-PSSM and the more recent version is known as Phyre. This is disseminated via a web server - a user pastes their protein sequence of interest into a box and the server returns details of the predicted 3D structure with atomic coordinates and additional information. This resource has proved highly popular with the community. There have been over 750,000 submissions and the current rate is 2,500 per week. There have been over 2,000 citations to the three main papers describing 3D-PSSM and Phyre.
However, genes and their protein products do not act in isolation. The rapidly growing field of Systems Biology aims to understand Biology at the level of complex systems of interactions, of which proteins are a central component. Many techniques have recently become available to predict the vital parts of a protein that confer its function and the regions of a protein that take part in interactions with other molecules in the cell. Modelling these regions permits a better understanding of the role of genetics in disease by elucidating their role in the basic biochemistry of the cell and the network of interactions in which they take part.
This grant will provide support for us to maintain and support the Phyre web server. We will provide e-mail user support together with extensive documentation. In addition, we will run three hands-on workshops and four road-shows across the UK for biologists interested in using the methodology. The work will be disseminated by publications in the scientific literature and presentations at national and international meetings.
The functionality of Phyre will be enhanced to support the following topics.
1) The prediction of the interacting partners of a protein in the cell to better elucidate function. Determining the interactions a protein makes with other proteins is critical for a researcher to elucidate the protein/gene's wider role in cellular processes and disease. It can aid researchers in building larger models of entire systems.
2) The modelling of the structure of multiple proteins in a complex. In addition to determining which proteins are interacting ((1) above), the specific nature of that interaction gives researchers a detailed insight into which parts of a protein are critical for the interaction. This can then guide hypotheses and experimental design.
3) To suggest the effects of mutations on the structure and function of the protein. Algorithms are available to predict whether a mutation in a protein is likely to alter its function in the cell and these advances will be incorporated into the server.
5) To provide enhanced visualisation, which is key when dealing with complex three-dimensional protein structure. We will substantially extend the user's ability to plot a variety of predicted features mapped onto 3D model predictions, in particular functionally important parts of the protein and regions where mutations are known to occur.
Technical Summary
Phyre, our protein structure prediction server, is used by hundreds of groups worldwide. The aim of this proposal is to maintain, support and extend Phyre to include the prediction of protein interactions, functional sites and multi-protein complexes. We propose to:
1) Hold 3 workshops and 4 road-shows around the UK, maintain the server, provide full user-support and integrate new developments in the field as they arise.
2) Take part in the international blind trial of protein structure prediction (CASP).
3) Integrate freely available tools and databases for the prediction of protein interfaces (e.g. SCOPPI, ProtInDB). Couple interface prediction of two user proteins to the freely available iWrap interface threading algorithm. Add the ability to return a predicted Biological Unit (BU) for homo-oligomers using both the existing PDB biomolecule information and the complementary ProtBud database of BUs. Potential clashes will be resolved using the method described in (2).
4) Extend our Poing multiple template modelling tool to handle multiple protein chains. Using existing homologous complexes to derive distance constraints, we will be able to relieve clashes and utilise multiple homologous complex constraints simultaneously.
5) Predict interacting partners by combining the remote homology detection of Phyre with the STRING database of interacting proteins via their existing API. A confident template structure detected by Phyre will be used to detect close homologues of the template in STRING to provide candidate interaction partners for the user protein including template structures in complexes.
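The partner-prediction idea in item 5 can be illustrated with a minimal Python sketch. It is not the Phyre implementation and does not call the STRING API: the names `TemplateHit`, `candidate_partners`, the `homologues` mapping, the `interactions` edge list and the confidence cut-off of 90 are all hypothetical, chosen only to show the flow from a confident template hit, through close homologues of that template, to their known interaction partners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateHit:
    """A template structure matched to the user's sequence (hypothetical type)."""
    template_id: str
    confidence: float  # assumed to be on a 0-100 scale


def candidate_partners(hits, homologues, interactions, min_confidence=90.0):
    """Map confident template hits to candidate interaction partners.

    hits:         iterable of TemplateHit for the user protein
    homologues:   dict template_id -> proteins in the interaction database
                  that are close homologues of that template (assumed given)
    interactions: iterable of (protein_a, protein_b) undirected edges
    Returns a dict partner -> sorted list of template ids that support it.
    """
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    candidates = {}
    for hit in hits:
        if hit.confidence < min_confidence:
            continue  # only confident templates are used
        for protein in homologues.get(hit.template_id, ()):
            for partner in neighbours.get(protein, ()):
                candidates.setdefault(partner, set()).add(hit.template_id)

    return {p: sorted(t) for p, t in sorted(candidates.items())}


if __name__ == "__main__":
    hits = [TemplateHit("tmplA", 99.0), TemplateHit("tmplB", 40.0)]
    homologues = {"tmplA": ["P1"], "tmplB": ["P2"]}
    interactions = [("P1", "P3"), ("P2", "P4"), ("P5", "P1")]
    print(candidate_partners(hits, homologues, interactions))
```

In this toy input only `tmplA` passes the cut-off, so only the neighbours of its homologue `P1` are reported as candidates. In a real pipeline the homologue mapping and the edge list would come from the interaction database rather than from hand-written dictionaries.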
Planned Impact
This proposal is to maintain and enhance a web-based bioinformatics resource for the bioscience and biomedical communities to perform protein structure prediction using our Phyre system. Based on current demand and anticipated growth we envisage supporting over 20,000 users over the course of the three years. We will now identify those groups that will benefit from this research and in what way they will benefit.
Academic - Many of the users of the Phyre resource will be academic groups and the results of their use of Phyre will advance their research leading to economic and social benefit. Our current user-base is international. The current user-base spans a diverse section of researchers in bioscience and biomedicine requiring information about protein structure and function. Feedback from users has shown that a Phyre prediction can have a transformative effect on their research moving their conceptualisation into detailed consideration of the molecule at the three-dimensional atomic level. There are numerous application areas. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The bio-energy sector can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies.
Public sector - Agencies involved in public health and food security are expected to continue to use the Phyre server. For example the location of a mutation on the surface of a human, animal or plant pathogen could be mapped to provide insight into structure/function relationships. This will impact on health and well-being.
Schools - In talks to schools by the PI, the Phyre server is described as a web-based resource. The development of tools used by many other researchers illustrates the broad impact of science research to the students.
General public - Via open days at Imperial, members of the general public will see demonstrations of Phyre. This will highlight an area of research, bioinformatics, which they may not have been aware of. Furthermore, this will highlight the collaborative nature of scientific research with its implications of value for money.
Organisations
People | Michael Sternberg (Principal Investigator) |
Publications
David A
(2012)
A new structural model of the acid-labile subunit: pathogenetic mechanisms of short stature-causing mutations.
in Journal of molecular endocrinology
Jiang Y
(2016)
An expanded evaluation of protein function prediction methods shows an improvement in accuracy.
in Genome biology
Kelley L
(2015)
The Phyre2 web portal for protein modeling, prediction and analysis
in Nature Protocols
Kelley LA
(2015)
Partial protein domains: evolutionary insights and bioinformatics challenges.
in Genome biology
Lewis TE
(2015)
Genome3D: exploiting structure to help users understand their sequences.
in Nucleic acids research
Lewis TE
(2013)
Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.
in Nucleic acids research
Macdonald JT
(2013)
Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling.
in PloS one
Mao C
(2013)
Functional assignment of Mycobacterium tuberculosis proteome revealed by genome-scale fold-recognition.
in Tuberculosis (Edinburgh, Scotland)
McGreig JE
(2022)
3DLigandSite: structure-based prediction of protein-ligand binding sites.
in Nucleic acids research
Mezulis S
(2016)
PhyreStorm: A Web Server for Fast Structural Searches Against the PDB.
in Journal of molecular biology
Description | Genome projects are determining the sequences of proteins from many species including plants, animals and microbes. Experimental methods reveal the 3D structure of a protein, and this information is central to biological understanding; the exploitation of this knowledge has implications for improvements in agriculture, animal welfare, health, and biotechnology. But often this information is not available from experiment. Biologists then require computational methods to predict protein structure. Probably the most effective method to predict protein structure from sequence is to build the model based on the experimentally-determined structure of a known protein. The Imperial group had developed a web-based server, Phyre2, to undertake this task. The user pasted in a sequence and in a few hours the server returned a model. At the start of the grant (2012) there were about 25,000 distinct users per month of Phyre. Under this grant, extensive new functionality was added to Phyre2. A resource called PhyreAlarm enabled a user to request weekly reruns of their query against the updated database. A tool, PhyreInvestigator, enabled a user to explore the accuracy of the predicted structure in detail. The resource, Phyre2, was disseminated to the community. By the end of the grant, use by the community had markedly increased to about 45,000 distinct users per year. Under subsequent funding Phyre2 has been developed to incorporate many additional features including enabling users to process many sequences in batch mode. In 2019, there were over 90,000 distinct users. In recognition of the wide use of Phyre, since November 2016 it has been included in ELIXIR, a pan-European network of resources for bioinformatics. |
Exploitation Route | The major beneficiaries who take this forward are bioscience researchers wishing to analyse the proteome of sequenced genomes. Ever more genomes will be sequenced and variations in sequences will need to be studied. Structural annotations will empower a wide range of biological studies by the broad biological community. Phyre has been applied to predicting function, target selection for structure determination, SAXS studies, EM reconstruction, determining domain organization, construct design, molecular replacement, understanding the phenotypic. The Phyre server can also be used for research directly resulting in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies. |
Sectors | Agriculture, Food and Drink,Education,Healthcare,Pharmaceuticals and Medical Biotechnology |
URL | http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index |
Description | Feedback from users demonstrates that the Phyre server has been used for research that directly results in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies. |
First Year Of Impact | 2012 |
Sector | Agriculture, Food and Drink,Education,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Description | 18-BBSRC-NSF/BIO - Structural modeling of interactome to assess phenotypic effects of genetic variation |
Amount | £499,841 (GBP) |
Funding ID | BB/T010487/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2020 |
End | 08/2023 |
Description | Biomedical Resource Development Fund |
Amount | £830,000 (GBP) |
Funding ID | WT104955MA |
Organisation | Wellcome Trust |
Department | Wellcome Trust Institutional Strategic Support Fund |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 01/2015 |
End | 12/2020 |
Description | EPSRC PhD Studentship |
Amount | £65,000 (GBP) |
Funding ID | EP/K502856/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 10/2012 |
End | 03/2016 |
Description | Modeling protein interactions to interpret genetic variation |
Amount | £458,127 (GBP) |
Funding ID | BB/P011705/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2016 |
End | 09/2019 |
Title | SuSPect - A web server to predict the phenotypic effect of single amino acid variants |
Description | SuSPect uses sequence-, structure- and systems biology-based features to predict the phenotypic effects of missense mutations. 77 features are used to train a support vector machine (SVM) to discriminate between disease-causing and neutral variants. In a blind test from VariBench, SuSPect achieved an AUC (area under ROC curve) of 0.89, balanced accuracy of 82% and a Matthews correlation coefficient of 0.65, a large improvement over other methods tested. |
Type Of Technology | Webtool/Application |
Year Produced | 2013 |
Impact | Application of prediction of genetic variants in proteins. |
URL | http://www.sbg.bio.ic.ac.uk/~suspect/about.html |
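The SuSPect entry above quotes a balanced accuracy and a Matthews correlation coefficient (MCC). As a minimal sketch of how those two metrics are defined for a binary classifier (disease-causing versus neutral), the Python below computes both from confusion-matrix counts. The function names and the counts in the example are illustrative assumptions and are not SuSPect's data or code.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Toy counts chosen for easy arithmetic, not taken from any benchmark.
    counts = dict(tp=40, fp=10, tn=40, fn=10)
    print(balanced_accuracy(**counts))
    print(matthews_corrcoef(**counts))
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is one reason they are commonly reported for variant-effect predictors.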
Description | Human - computer interaction |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Demonstrated the first playable prototype of a docking game to a general audience. Too early to report. |
Year(s) Of Engagement Activity | 2014 |
Description | Imperial Festival & Fringe (open to public) |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | The Imperial Festival is attended by over 10,000 visitors, ranging from policy makers to the general public, including children of all ages. We demonstrated the implications of understanding protein structure. Our stand had over 100 visitors. |
Year(s) Of Engagement Activity | 2014,2016 |
URL | https://www.imperial.ac.uk/be-inspired/festival/ |
Description | Lecture - Art and Science |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | The talk highlighted the link between structural biology and art. Follow-up invitation to talk at a human/computer interaction conference. |
Year(s) Of Engagement Activity | 2013 |
Description | School lecture (London) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Talk to school children to spark interest in science. Requests for work experience. |
Year(s) Of Engagement Activity | 2012 |
Description | Talk at school |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Describing use of bioinformatics in medical research |
Year(s) Of Engagement Activity | 2015 |
Description | Work experience for 16-18 year old pupils |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | We provided one week's work experience for about 6 students each year. They visited facilities at Imperial and we introduced them to computer programming and molecular graphics. |
Year(s) Of Engagement Activity | 2014,2015 |