Enhancing the Phyre2 protein modelling portal for the community

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Proteins are large molecules that are the machinery of life. They are long chains of different components, and the order of these components is the amino-acid sequence. The genome projects are now determining the sequences of proteins from many species including plants, animals and microbes. Experimental methods can reveal the 3D structure of a protein. This information is central to basic biological understanding, and the exploitation of this knowledge has major implications for improvements in agriculture, animal welfare, health, and biotechnology. However, this essential information is generally not available from experiment. Biologists then require computational methods to predict it.

The Sternberg group has developed a powerful and user-friendly resource for predicting the 3D structure of a protein from its sequence. The first version was 3D-PSSM and the more recent versions are Phyre and Phyre2. The resource is disseminated via a web server - a user pastes their protein sequence of interest into a box and the server returns details of the predicted 3D structure with atomic coordinates and additional information. This resource has proved exceptionally popular with the world-wide scientific community. There have been over 1.5 million sequence submissions to these servers and over 3,700 literature citations to the three main papers describing 3D-PSSM and Phyre. Sixty papers citing the use of Phyre2 in their research have been published in the highly prestigious journals Nature, Science and Proceedings of the National Academy of Sciences, USA.

Users of Phyre2 are supported by help pages, a video tutorial and e-mail help from us. Recently we contacted our users and we obtained over 1,000 letters of support demonstrating the value the community places on Phyre2. Based on our e-mail correspondence and information within these letters of support we identified a series of enhancements we wish to make to the Phyre2 web site. This grant will provide support for us to maintain, support and enhance the Phyre2 web server for use by the community. Specifically we propose:

1) To ensure the Phyre2 portal remains up to date and robust, provides a fast turnaround, and is supported by assistance to users.
2) To enhance user support by the development of extensive training material (with case studies), the provision of video help dynamically linked to every output web page and by contributing to national, European and world-wide activities in training.
3) To promote the widest use of Phyre2 by the bioscience community.
4) To disseminate the Phyre2 portal via publication, national and international conference presentations, and the provision of training workshops at different sites in the UK.
5) To enhance the modelling by ensuring Phyre2 incorporates the very latest modelling components which are available.
6) To enable large scale analyses of genomes including the provision of a fast processing facility using computing resources that can be purchased by any user (known as cloud computing).
7) To predict where on a protein certain biological features might occur and which regions of the protein may bind small molecules which are relevant to protein function.
8) To develop an exciting and novel approach, LogPhyre, to forge collaborations between Phyre2 users who are studying similar regions of protein space. LogPhyre will ensure user confidentiality and will, with user agreement, broker connections between users.
9) To participate in international comparative evaluations of protein structure prediction, known as CASP and CAMEO.
10) To work with our scientific management board to ensure we meet our objectives and are responsive to the needs of our diverse user base.
11) To engage with diverse stakeholders including policy makers, museums, the general public and especially young people.

Technical Summary

Phyre2 is a widely used web portal for protein structure prediction. It is used by tens of thousands of users worldwide due to its power, functionality and a user interface that is informative and easy to use for biologists. A user pastes their protein sequence of interest into a box and the server returns details of the predicted 3D structure with atomic coordinates and additional information. Phyre2 and its predecessors have processed over 1.5 M sequences and have had over 3,700 citations. Recently we contacted our users and we obtained over 1,000 letters of support. Based on user feedback we identified enhancements we wish to make to Phyre2. Our aims are:
1) To ensure the Phyre2 portal remains up to date and robust, and provides a fast turnaround.
2) To enhance user support by the development of extensive training material (with case studies), the provision of video help dynamically linked to every output web page and by contributing to national, European and world-wide activities in training.
3) To promote the widest use of Phyre2 by the bioscience community.
4) To disseminate the Phyre2 portal via publication, conference presentations, and the provision of training workshops.
5) To enhance the modelling by ensuring Phyre2 incorporates the very latest modelling components.
6) To introduce additional modelling features including template selection, model clustering, refinement using Rosetta, linking to the Cn3D viewer, and integrating with the popular Jalview sequence modelling tool.
7) To enable large scale analyses of genomes including the provision of a fast cloud batch procedure.
8) To predict post-translational modifications and ligand binding-sites.
9) To participate in the CASP and CAMEO international comparative evaluations of structure prediction.
10) To work with our management board to ensure we are responsive to the needs of our users.
11) To engage with policy makers, museums, the general public and especially young people.

Planned Impact

This proposal is to maintain and enhance a web-based bioinformatics resource for the bioscience community to perform protein structure prediction using our Phyre2 system. Based on current demand and anticipated growth we envisage supporting at least 200,000 different users over the course of the five years. We will now identify those groups that will benefit from this research and in what way they will benefit.

Academic - Many of the users of the Phyre2 resource will be academic groups and the results of their use of Phyre2 will advance their research leading to economic and social benefit. Our current user-base is international and spans diverse researchers in bioscience who require information about protein structure and function. Feedback from users has shown that a Phyre2 prediction can have a transformative effect on their research moving their conceptualisation into detailed consideration of the molecule at the three-dimensional atomic level. There are numerous application areas. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies.

Public sector - Agencies involved in public health and food security are expected to continue to use the Phyre2 portal. For example the location of a mutation on the surface of a human, animal or plant pathogen could be mapped to provide insight into structure/function relationships. This will impact on health and well-being.

Policy makers and the general public - Via open days at Imperial College, members of the general public will see demonstrations of Phyre2. This will highlight an area of research - bioinformatics - of which they may not have been aware. Furthermore this will demonstrate the collaborative nature of scientific research with its implications of value for money.
In particular, the Imperial Festival is an annual event which attracted over 10,000 visitors in 2013. From the policy side, Imperial invites to the Festival representatives from professional membership bodies, local and central government, higher education bodies including other university senior staff, and research funders. We will continue to give invited lectures to groups other than researchers. Previously, Prof Sternberg has addressed the Prince's Trust and a meeting at Brighton linking Art and Science.

Schools - In talks to schools by the PI, the Phyre2 server is described as a web-based resource for use by the community. This always has a major impact on the audience. Students are impressed by the Phyre2 usage figure - over 1.5 million hits - which is placed in the perspective of popular YouTube clips shown on TV programmes that often have fewer hits. The opportunity to develop a resource used by so many other scientists excites students as a highly worthwhile activity.

Publications

 
Description Genome projects are determining the sequences of proteins from many species including plants, animals and microbes. Experimental methods reveal the 3D structure of a protein, and this information is central to biological understanding and the exploitation of this knowledge has implications for improvements in agriculture, animal welfare, health, and biotechnology. But often this information is not available from experiment. Biologists then require computational methods to predict protein structure. Probably the most effective method to predict protein structure from sequence is to build the model based on the experimentally-determined structure of a known protein.
The Imperial group had developed a web-based server Phyre2 to undertake this task. The user pasted in a sequence and in a few hours the server returned a model. At the start of the grant (2015) there were about 45,000 distinct users per month of Phyre.
Under this grant, new functionality was added to Phyre2. A feature to scan a sequence against many genomes was developed. Substantial work was undertaken to enhance the modelling algorithm, and a batch process was developed. In 2020, there were over 100,000 distinct users. In recognition of the wide use of Phyre, it has been included since November 2016 in ELIXIR, a pan-European network of resources for bioinformatics. As a testimony to the impact of Phyre2 on world-leading research, we note that over 70 papers in Nature, Science, PNAS and Cell cited our 2015 paper.
Exploitation Route The major beneficiaries who take this forward are bioscience researchers wishing to analyse the proteome of sequenced genomes. Ever more genomes will be sequenced and variations in sequences will need to be studied. Structural annotations will empower a wide range of biological studies by the broad biological community. Phyre has been applied to predicting function, target selection for structure determination, SAXS studies, EM reconstruction, determining domain organization, construct design, molecular replacement, understanding the phenotypic. The Phyre server can also be used for research directly resulting in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies.
Sectors Education; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

URL http://www.sbg.bio.ic.ac.uk/~phyre2
 
Description Phyre2 has been open to commercial users since October 2018. As of Jan 2020, there have been 1,616 commercial jobs corresponding to 627 unique commercial users. Feedback from users demonstrates that the Phyre server has been used for research that directly results in economic and social benefit. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides both the design of small molecules and bio-therapeutics, such as monoclonal antibodies. The consequence of the design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways and information about the structure and function of genes can inform these studies.
First Year Of Impact 2015
Sector Agriculture, Food and Drink; Education; Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description 18-BBSRC-NSF/BIO - Structural modeling of interactome to assess phenotypic effects of genetic variation
Amount £499,841 (GBP)
Funding ID BB/T010487/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2019 
End 08/2022
 
Description Biomedical Resource Development Fund
Amount £830,000 (GBP)
Funding ID WT104955MA 
Organisation Wellcome Trust 
Department Wellcome Trust Institutional Strategic Support Fund
Sector Charity/Non Profit
Country United Kingdom
Start 01/2015 
End 12/2020
 
Description Enhancement, dissemination and application of the PhyreRisk/Phyre resource for modelling protein structures and the effects of genetic variants
Amount £925,051 (GBP)
Funding ID 218242/Z/19/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 11/2019 
End 10/2024
 
Description Modeling protein interactions to interpret genetic variation
Amount £458,127 (GBP)
Funding ID BB/P011705/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2016 
End 09/2019
 
Title EzMol - a web-based wizard-driven program for the display of protein and nucleic acid structures 
Description A very simple-to-use web-based molecular visualisation tool developed by our group. The program works with most common browsers, so there is no need for any installation or a licence. It is driven by a wizard that directs the user through a focussed set of options with no need for any commands to be typed. It supports cartoon, stick and space-filling visualisation options for the protein chains, and the user selects residues for display, colouring and labelling from a presented list, similar to date selection tools on many web sites. The final visualisation model can be downloaded for publication or saved for subsequent use. The interface is particularly designed for the occasional user who does not want to remember command syntax or have to cope with a large number of menu options. Available at http://www.sbg.bio.ic.ac.uk/~ezmol/ 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact None yet as just launched 
URL http://www.sbg.bio.ic.ac.uk/~ezmol/
 
Title GWYRE 
Description GWYRE - The GWYRE resource provides a download of structures of protein complexes based on docking both experimental and Phyre-predicted structures. The project is a collaboration between the Vakser Lab, The University of Kansas, and the Sternberg Lab, Imperial College London. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Impact The community can access this resource. 
 
Title GWYRE 
Description This web site provides predicted models for protein domains and binary complexes for the human proteome. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact We do not track users. 
URL http://www.gwyre.org/
 
Title Missense3D 
Description Missense3D is a web-based algorithm to provide a structural interpretation for the effect of a missense variant in an experimental or predicted protein structure. Access is open to all. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact Used in the CASP13 evaluation of predicting the effect of variants 
URL http://www.sbg.bio.ic.ac.uk/~missense3d/
 
Title Missense3D-DB 
Description Missense3D-DB is a database resource which contains pre-computed atom-based calculations of the impact of amino acid substitution on protein structure, obtained using the Missense3D algorithm. The current version of the database contains ~4 million missense variants from the following resources: Humsavar, ClinVar and gnomAD. Currently Missense3D-DB hosts variant predictions based on what we consider the best representative 3D coordinates for the query protein. Additional 3D coordinates representing the query protein in different conformational states or in complex with ligands or other proteins may be available. If you want to make predictions using different 3D coordinates, please use our Missense3D variant prediction software. Missense3D-DB is freely available to academic and commercial users. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Impact This has just been launched. A notable impact is that the DECIPHER database at the Sanger Institute (https://decipher.sanger.ac.uk/), which is widely used by the clinical and biomedical communities to understand the impact of genetic variants, links directly to data provided by Missense3D-DB. 
 
Title Phyre2 - A portal for protein modelling 
Description Phyre2 is the second generation of Phyre, in which a user pastes a protein sequence and the server returns a predicted 3D structure and provides additional protein modelling. 
Type Of Technology Webtool/Application 
Year Produced 2011 
Impact During 2013, Phyre2 had over 40,00 unique visitors and since 2012, over 80,000 distinct users. 
URL http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
 
Description Imperial Festival 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact We presented protein modelling. In 2016 and 2017 we also presented the protein docking game BioBlox.
Year(s) Of Engagement Activity 2014,2016,2017
 
Description Imperial Festival & Fringe (open to public) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Imperial Festival is attended by over 10,000 visitors, ranging from policy makers to the general public, including children of all ages. We demonstrated the implications of understanding protein structure. At our stand we had over 100 visitors.
Year(s) Of Engagement Activity 2014,2016
URL https://www.imperial.ac.uk/be-inspired/festival/
 
Description New Scientist Live - stand presentation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Over 300 visitors to the stand saw protein modelling.
Year(s) Of Engagement Activity 2016,2017
URL https://live.newscientist.com/
 
Description Presentation of protein modelling resources at the Bett Show - an educational show at the Excel Centre 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Prof Sternberg's group, together with members of Goldsmiths College, exhibited at the Bett Show at the Excel Centre (23-26 January 2019), presenting molecular resources for educational use in schools. Two resources were presented: i) EzMol: a web-based program to display protein and nucleic acid structures linked to a teaching portal and ii) BioBlox2D: a mobile game based on docking shapes linked to an A-level syllabus quiz. Imperial issued a press release about these resources.

As a result, discussions have started with educational tool providers.
Year(s) Of Engagement Activity 2019
 
Description Work experience for 16-18 year old students in laboratory 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Experience of bioinformatics
Year(s) Of Engagement Activity 2016