Modeling protein interactions to interpret genetic variation

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

A grand challenge for biology is to maximize the fundamental insights from high-throughput sequencing, which has become rapid and inexpensive. Structural information on proteins and their interactions is essential for understanding the effects of genetic variation. A vast amount of information on single amino acid variants (SAV) will be available from eukaryotic and prokaryotic organisms. We will develop an integrated approach for large-scale prediction of protein structures and their association. A database of predicted structures and complexes for model organisms will be established upon which genetic variants will be mapped and their phenotypic effect assessed.

The Objectives of the proposed research are: (1) to develop high-throughput structure-based methods to predict interactions of experimentally determined and modeled proteins; (2) to develop advanced methodology for high-throughput modeling of individual proteins; (3) to generate a genome-wide database of protein structures and protein-protein complexes for model organisms; and (4) to assess phenotypic effects of genetic variation. Approaches will be developed to discriminate non-interacting from interacting proteins, and to model the structures of protein-protein complexes, based on similarity to experimentally determined protein-protein complexes and on properties of the intermolecular energy landscape. A novel approach for fold detection will extend the number of proteins that can be modeled. A pipeline will be developed to integrate protein structure prediction with the prediction of protein-protein complexes, and to use structure-based approaches to predict the effects of SAVs. This collaborative proposal combines highly complementary areas of expertise of the US team, on high-throughput modeling of protein-protein interactions, and the UK team, on protein structure prediction and SAV effects.

Technical Summary

This project is a collaboration between Prof Vakser's group in Kansas (USA) and Prof Sternberg's group at Imperial College with the following research goals.

*DEVELOP HIGH-THROUGHPUT STRUCTURE-BASED METHODS TO PREDICT INTERACTIONS OF EXPERIMENTALLY DETERMINED AND MODELED PROTEINS. We will develop approaches for modeling the structures of protein-protein complexes, based on the similarity to experimentally determined protein-protein complexes (templates). Comprehensive benchmark sets of interacting and non-interacting modeled proteins will be generated based on the datasets of experimentally determined protein complexes.
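
As a purely illustrative sketch of the template idea described above, the snippet below picks, from a small library of known complexes, the template whose two chains best match a query pair of proteins. It is not the project's method: sequence identity over pre-aligned sequences is used here only as a simple stand-in for whatever similarity measure the real approach applies, and the names `identity`, `best_template`, the 0.3 threshold and the toy sequences are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical, non-gap positions between two pre-aligned sequences.

    Toy simplification (assumption): sequences of different length, or empty
    sequences, are treated as having no similarity at all.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y and x != "-" for x, y in zip(a, b)) / len(a)


def best_template(
    query: Tuple[str, str],
    templates: Dict[str, Tuple[str, str]],
    min_identity: float = 0.3,  # hypothetical cutoff, not taken from the project
) -> Optional[str]:
    """Return the ID of the template complex that best covers both query chains."""
    best_id, best_score = None, min_identity
    for template_id, (chain_1, chain_2) in templates.items():
        # Try both chain orderings; a template is scored by its worse-matching chain.
        score = max(
            min(identity(query[0], chain_1), identity(query[1], chain_2)),
            min(identity(query[0], chain_2), identity(query[1], chain_1)),
        )
        if score >= best_score:
            best_id, best_score = template_id, score
    return best_id


# Toy template library: complex ID -> sequences of its two chains.
toy_templates = {"T1": ("ACDE", "FGHI"), "T2": ("ACDF", "FGHK")}
chosen = best_template(("ACDE", "FGHI"), toy_templates)
```

Scoring each template by its worse-matching chain reflects the intuition that a complex template is only useful if both partners resemble the query proteins; a query pair with no sufficiently similar template returns `None`.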

*DEVELOP ADVANCED METHODOLOGY FOR HIGH-THROUGHPUT MODELING OF INDIVIDUAL PROTEINS. We will model the structures of predicted interacting proteins. The Phyre2 structure prediction server (> 70,000 users worldwide) implements the state-of-the-art approach to protein modeling using sequence-based fold detection of remote homology. Recently a major breakthrough occurred in prediction of residue contacts from sequence. We will develop this novel approach extending the number of proteins that can be predicted.

*GENERATE GENOME-WIDE DATABASE OF PROTEIN STRUCTURES AND PROTEIN-PROTEIN COMPLEXES FOR MODEL ORGANISMS. We will develop a pipeline to integrate protein structure prediction with the prediction of protein-protein complexes and link servers in the two groups for use by the community. We will generate a database of structurally refined protein complexes for model eukaryotic and prokaryotic organisms. The database will be made available to the research community.

*ASSESS PHENOTYPIC EFFECTS OF GENETIC VARIATION. A pipeline will be developed for users to map amino acid variants onto the structures and complexes and use structure-based approaches to predict phenotypic effects. The predicted effect on structure will involve modeling acceptable conformations for the variant side-chain and evaluating the change in interactions.
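
To make the idea of mapping a variant onto a complex concrete, here is a minimal sketch that checks whether a variant position lies at the interface between two chains. It is an assumption-laden toy rather than the pipeline described above: the coordinate representation, the function names `interface_residues` and `locate_variant`, and the 5.0 angstrom distance cutoff are all hypothetical, and no side-chain remodeling is attempted.

```python
from math import dist
from typing import Dict, List, Set, Tuple

# Hypothetical toy representation: residue number -> atom coordinates in angstroms.
Chain = Dict[int, List[Tuple[float, float, float]]]


def interface_residues(chain_a: Chain, chain_b: Chain, cutoff: float = 5.0) -> Set[int]:
    """Residues of chain_a with at least one atom within `cutoff` of any atom of chain_b."""
    partner_atoms = [atom for atoms in chain_b.values() for atom in atoms]
    return {
        resnum
        for resnum, atoms in chain_a.items()
        if any(dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def locate_variant(position: int, chain_a: Chain, chain_b: Chain, cutoff: float = 5.0) -> str:
    """Label a variant position in chain_a as interface, non-interface, or not modelled."""
    if position not in chain_a:
        return "not modelled"
    if position in interface_residues(chain_a, chain_b, cutoff):
        return "interface"
    return "non-interface"


# Toy complex: residue 10 of chain A sits 3 angstroms from chain B, residue 11 is far away.
toy_a: Chain = {10: [(0.0, 0.0, 0.0)], 11: [(20.0, 0.0, 0.0)]}
toy_b: Chain = {5: [(3.0, 0.0, 0.0)]}
labels = {pos: locate_variant(pos, toy_a, toy_b) for pos in (10, 11, 99)}
```

A variant flagged as "interface" would be a natural candidate for the more detailed evaluation of changed interactions that the text describes; the all-against-all distance check is quadratic and only suitable for small illustrative inputs.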

Planned Impact

A grand challenge for biology is to maximise the fundamental insights from high-throughput sequencing, which has become rapid and inexpensive. Structural information on proteins and their interactions is essential for obtaining insights on genetic variation. A vast amount of information on single amino acid variants (SAV) will be available from eukaryotic and prokaryotic organisms. We will develop an integrated approach for large-scale prediction of protein structures and their association. This project is a collaboration between Prof Vakser's group in Kansas (USA) and Prof Sternberg's group at Imperial College. A database of predicted structures (from Phyre2 via Imperial) and complexes for model organisms (from GWIDD via Kansas) will be established upon which genetic variants will be mapped and their phenotypic effect assessed (from SuSPect2 via Imperial). We will now identify those groups that will benefit from this research and in what way they will benefit.

ACADEMIC - Many of the users of the resources developed under this grant will be academic groups, and the results of their use of these resources will advance their research, leading to economic and social benefit. The current users of the present resources at Imperial and at Kansas are international and span diverse researchers in bioscience who require information about protein structure, interactions, function and the effect of genetic variation. Feedback from users has shown that these predictions can have a transformative effect on their research, moving their conceptualisation into detailed consideration of the molecule at the three-dimensional atomic level. There are numerous application areas. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides the design of both small molecules and bio-therapeutics, such as monoclonal antibodies. The design of novel pharmaceuticals has clear health and commercial benefit. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways, and information about the structure and function of genes can inform these studies.

PUBLIC SECTOR - Agencies involved in public health and food security are expected to use the resources. For example, the location of a mutation on the surface of a human, animal or plant pathogen could be mapped to provide insight into structure/function relationships. This will impact on health and well-being.

POLICY MAKERS AND THE PUBLIC - Via open days at Imperial College, members of the general public will see demonstrations of protein modelling. This will highlight an area of research - bioinformatics - of which they may not have been aware. In particular, the Imperial Festival is an annual event which attracted over 15,000 visitors in 2016. On the policy side, Imperial invites representatives from professional membership bodies, local and central government, higher education bodies (including senior staff from other universities) and research funders to the Festival. We will continue to give invited lectures to groups other than researchers. Previously, Prof Sternberg has addressed the Prince's Trust and a meeting at Brighton linking Art and Science.

SCHOOLS - In talks to schools, the Phyre2 server is described as a web-based resource for use by the community. This always has a major impact on the audience. Students are impressed by the Phyre2 usage figure - over 1.5 million hits. The opportunity to develop a resource used by so many other scientists excites students as a highly worthwhile activity.
 
Description This project was a collaboration between the group at Imperial and a group in the University of Kansas to model the effect of missense variants on protein complexes via structural information. The Imperial group enhanced their algorithm, Phyre, to predict the 3D structure of a protein from its sequence. The project started with the human proteome. Predicted 3D models were then sent to the group at Kansas, who specialise in taking component structures and docking them together. They generated models for complexes. In parallel, Imperial developed a resource, Missense3D, that provides a structure-based explanation of the effect of a missense variant. The algorithm was developed so that it would be effective on both experimental and predicted 3D structures.
Exploitation Route Models for the predicted structures and complexes can be used to guide numerous studies on human proteins. Fundamental insight can be obtained relating sequence, structure and function. This can guide experiments where a residue is altered and its effect on function identified. There are several applications in healthcare. Structures of proteins can be used as targets for computer-aided drug discovery. 3D structures can assist in the design of biopharmaceuticals. Importantly, mapping missense variants can help decide whether a change is likely to affect the structure or function of a protein and be disease associated. This is increasingly important as 100,000s of human genomes are sequenced and associated with clinical conditions.
Sectors Agriculture, Food and Drink, Healthcare, Pharmaceuticals and Medical Biotechnology

URL http://www.gwyre.org/
 
Description Phyre2, whose development was also supported by this grant, has been open to commercial users since October 2018. As of Jan 2020, there have been 1,616 commercial jobs corresponding to 627 unique commercial users. We do not know the details of their enquiries, but it is envisaged they could be for the following applications. One could be to derive fundamental insight relating sequence, structure and function. This can guide experiments where a residue is altered and its effect on function identified. There are several applications in healthcare. Structures of proteins can be used as targets for computer-aided drug discovery. 3D structures can assist in the design of biopharmaceuticals. Importantly, mapping missense variants can help decide whether a change is likely to affect the structure or function of a protein and be disease associated. This is increasingly important as 100,000s of human genomes are sequenced and associated with clinical conditions.
First Year Of Impact 2018
Sector Healthcare, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description 18-BBSRC-NSF/BIO - Structural modeling of interactome to assess phenotypic effects of genetic variation
Amount £499,841 (GBP)
Funding ID BB/T010487/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2019 
End 08/2022
 
Description Enhancement, dissemination and application of the PhyreRisk/Phyre resource for modelling protein structures and the effects of genetic variants
Amount £925,051 (GBP)
Funding ID 218242/Z/19/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 11/2019 
End 10/2024
 
Description Modeling protein interactions to interpret genetic variation
Amount £458,127 (GBP)
Funding ID BB/P011705/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2016 
End 09/2019
 
Description NSF/ BBSRC Bilateral partnership Imperial College London / University of Kansas USA 
Organisation University of Kansas
Country United States 
Sector Academic/University 
PI Contribution We provided data and advice about protein tertiary structure prediction, the evaluation of missense variants and the design of user-friendly web sites.
Collaborator Contribution The partners focussed on delivering data and advice about predicting protein/protein complexes and the design of the web site.
Impact The GWYRE protein resource
Start Year 2016
 
Title GWYRE 
Description GWYRE - The GWYRE resource provides a download of structures of protein complexes based on docking both experimental and Phyre predicted structures. The project is a collaboration of Vakser Lab, The University of Kansas and Sternberg Lab, Imperial College London 
Type Of Technology Webtool/Application 
Year Produced 2019 
Impact The community can access this resource. 
 
Title GWYRE 
Description This web site provides predicted models for protein domains and binary complexes for the human proteome. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact We do not track users. 
URL http://www.gwyre.org/
 
Title Missense3D 
Description Missense3D is a web-based algorithm to provide a structural interpretation for the effect of a missense variant in an experimental or predicted protein structure. Access is open to all. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact Used in the CASP13 evaluation of predicting the effect of variants 
URL http://www.sbg.bio.ic.ac.uk/~missense3d/
 
Description Imperial Festival 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact We presented protein modelling. In 2016 and 2017 we also presented the protein docking game BioBlox.
Year(s) Of Engagement Activity 2014,2016,2017
 
Description Imperial Festival & Fringe (open to public) 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Imperial Festival is attended by over 10,000 visitors, ranging from policy makers to the general public, including children of all ages. We demonstrated the implications of understanding protein structure. At our stand we had over 100 visitors.
Year(s) Of Engagement Activity 2014,2016
URL https://www.imperial.ac.uk/be-inspired/festival/
 
Description New Scientist Live - stand presentation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Over 300 visitors to the stand saw protein modelling.
Year(s) Of Engagement Activity 2016,2017
URL https://live.newscientist.com/