18-BBSRC-NSF/BIO - Structural modeling of interactome to assess phenotypic effects of genetic variation

Lead Research Organisation: Imperial College London
Department Name: Life Sciences


This grant supports further collaboration between the groups of Prof Sternberg at Imperial College London and Prof Vakser at the University of Kansas, US.

Today the sequences of genomes can be determined rapidly, and from the gene sequences one obtains the sequences of proteins, which are central to biological function. Accordingly, a grand challenge for biology is to maximize the fundamental biological insights that can be obtained from these protein sequences. In addition, genetic mutation often leads to minor differences in the gene sequence which result in a change at a single position in the protein sequence. These are known as single amino acid variants (SAVs). A vast amount of information on SAVs is available for a wide range of organisms, from humans to bacteria. SAVs can either result in altered biological activity or have no discernible effect. Understanding and predicting the effect of SAVs is central to many areas of biological research. Knowledge of the three-dimensional structure of proteins and of their interactions with partner proteins (known as complexes) is therefore essential for understanding the mode of action of proteins and for interpreting the effects of genetic variation.

Under our previous collaboration the two groups developed the first version of a web-based resource called GWYRE (www.gwyre.org). This is a database of predicted 3D structures and complexes for biologically important organisms.

The aim of the proposed research is to enhance the development of the GWYRE resource. To do this we will:
(i) develop advanced methodology for high-throughput modelling of individual proteins
(ii) develop high-throughput structure-based methods to predict interactions of experimentally determined and modelled proteins
(iii) integrate (i) and (ii) to generate models for complexes
(iv) develop enhanced computer algorithms to assess the effect of SAVs on structures and complexes
(v) disseminate the GWYRE portal via publications, conference presentations, and the provision of training workshops
(vi) engage in public outreach.

Technical Summary

This project is a collaboration between the groups of Prof Sternberg at Imperial College London and Prof Vakser at the University of Kansas, US. We will continue development of an integrated approach to high-throughput modelling of proteins and their complexes and the mapping of SAVs as currently available in our version 1 GWYRE web resource (www.gwyre.org). Specifically we will:

*DEVELOP ADVANCED METHODOLOGY FOR HIGH-THROUGHPUT MODELLING OF INDIVIDUAL PROTEINS. The Phyre structure prediction server (> 90,000 users worldwide p.a.) predicts protein structure using sequence-based fold detection of remote homology. We will enhance Phyre by considering specific protein family templates for more accurate identification of optimum templates.

*DEVELOP HIGH-THROUGHPUT STRUCTURE-BASED METHODS TO PREDICT INTERACTIONS OF EXPERIMENTALLY DETERMINED AND MODELLED PROTEINS. We will further develop approaches for modelling the structures of protein-protein complexes, based on the similarity to experimentally determined protein-protein complexes (templates). Comprehensive benchmark sets of interacting and non-interacting modelled proteins will be generated based on the datasets of experimentally determined protein complexes.

*GENERATE A GENOME-WIDE DATABASE OF PROTEIN STRUCTURES AND PROTEIN-PROTEIN COMPLEXES FOR MODEL ORGANISMS. We will further develop our pipeline to integrate protein structure prediction with the prediction of protein-protein complexes and link servers in the two groups for use by the community. We will generate a database of structurally refined protein complexes for model eukaryotic and prokaryotic organisms. The results will be disseminated via the GWYRE database.

*ASSESS THE PHENOTYPIC EFFECTS OF GENETIC VARIATION. Our pipeline (Missense3D) will be further developed for users to map amino acid variants onto the structures and complexes in GWYRE and to use structure-based approaches to predict phenotypic effects.
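
As an illustration of what mapping a variant onto a modelled complex can involve, the following minimal Python sketch flags whether a variant position lies at a protein-protein interface, using a simple distance cutoff between atoms of two chains. This is not the Missense3D method or any GWYRE code; the function names, the toy coordinates and the 5.0 Å cutoff are assumptions chosen for illustration only.

```python
import math

# Each chain is a list of (residue_number, (x, y, z)) atom records.
# This input layout is a hypothetical simplification of parsed coordinates.

def interface_residues(chain_a, chain_b, cutoff=5.0):
    """Return residue numbers in chain_a with an atom within `cutoff` Å of chain_b."""
    hits = set()
    for res_a, xyz_a in chain_a:
        for _, xyz_b in chain_b:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                hits.add(res_a)
                break  # one contact is enough to mark this residue
    return hits

def classify_variant(position, chain_a, chain_b, cutoff=5.0):
    """Label a variant position in chain_a as 'interface' or 'non-interface'."""
    if position in interface_residues(chain_a, chain_b, cutoff):
        return "interface"
    return "non-interface"

# Toy coordinates (assumed, not taken from any real structure).
chain_a = [(10, (0.0, 0.0, 0.0)), (11, (3.8, 0.0, 0.0)), (50, (30.0, 0.0, 0.0))]
chain_b = [(5, (0.0, 4.0, 0.0)), (6, (3.8, 4.0, 0.0))]

print(classify_variant(10, chain_a, chain_b))
print(classify_variant(50, chain_a, chain_b))
```

A variant flagged as lying at the interface would be a candidate for closer structure-based analysis; a real assessment would consider many further structural features beyond simple proximity.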

Planned Impact

This project is a collaboration between Prof Sternberg's group at Imperial College and Prof Vakser's group in Kansas (USA). We will continue development of the GWYRE resource (www.gwyre.org), which integrates predicted tertiary structures (from Imperial) and complexes for model organisms (from Kansas). In addition, a structure-based framework will be established (at Imperial) onto which genetic variants will be mapped and their phenotypic effects assessed. Below we identify the groups that will benefit from this research and how they will benefit.

COMMERCIAL USERS - The commercial users of the resources developed under this grant will span diverse bioscience researchers who require information about protein structure, interactions, function and the effect of genetic variation. Feedback from users has shown that these predictions can have a transformative effect on their research, moving their conceptualisation into detailed consideration of the molecule at the three-dimensional atomic level. There are numerous application areas. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides the design of both small molecules and bio-therapeutics, such as monoclonal antibodies. The design of novel pharmaceuticals has clear health and commercial benefits. A second application area is the agricultural sector. Similar considerations apply to animal health as to the pharmaceutical industry. In addition, genome information can be helpful in the selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways, and information about the structure and function of genes can inform these studies.

PUBLIC SECTOR - Agencies involved in public health and food security are expected to use the resources. For example, the location of a mutation on the surface of a human, animal or plant pathogen could be mapped to provide insight into structure/function relationships. This will impact on health and well-being.

TRAINING - Phyre is widely used in undergraduate and postgraduate teaching and thus extends the training of the next generation of bioscientists in data-driven biology.

POLICY MAKERS AND THE GENERAL PUBLIC - Via open days at Imperial College, members of the public will see demonstrations of protein modelling. This will highlight an area of research - bioinformatics - of which they may not have been aware. Furthermore, this will demonstrate the collaborative nature of scientific research with its implications of value for money. In particular, the Imperial Festival is an annual event that attracted over 20,000 visitors in 2018. From the policy side, Imperial invites representatives from professional membership bodies, local and central government, higher education bodies (including senior staff from other universities) and research funders to the Festival. We will continue to give invited lectures to groups other than researchers.

SCHOOLS - In talks to schools by the PI, the Phyre server is described as a web-based resource for use by the community. This always has a major impact on the audience. Students are impressed by the Phyre usage figure - over 3 million hits - which is placed in the perspective of popular YouTube clips shown on TV programmes that often have fewer hits.


Description Structural characterization of the protein interactome [1] is essential for interpretation of genetic variation. A vast amount of information on human genetic variation, including numerous missense variants (i.e. single amino acid changes), is available from high-throughput sequencing. Despite significant progress in experimental techniques for protein structure determination, which fuels remarkable expansion of the Protein Data Bank (PDB), structures of most proteins must be determined by modeling. The number of protein-protein interactions (PPI) is significantly larger than the number of individual proteins. Moreover, structures of protein assemblies are more difficult to determine experimentally than those of individual proteins, which makes the role of modeling in structural characterization of the interactome even more important.
Computational approaches to structure determination of individual proteins and protein-protein complexes have been rapidly progressing.
There are several databases that report human protein-protein interactions. UniProt [16] provides a single resource reporting human genetic variation, combining data from 100K genomes. The interpretation of how these genetic variants impact protein interactions greatly benefits from structural models that can be examined and analyzed.

This project is a collaboration between the Imperial College group (UK) and the University of Kansas group (US). Jointly we have developed the GWYRE resource, which integrates knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by GRAMM. The predictions are incorporated in a comprehensive web-based public resource for structural characterization of interactomes and mapping of missense variants obtained from UniProt. The resource, available at http://www.gwyre.org, facilitates better understanding of the principles of protein interaction and structure/function relationships. Coordinates of complexes can be downloaded for inspection and further analysis.
Exploitation Route Knowledge of the coordinates of a protein complex provides insight into designing novel pharmaceuticals or modifying proteins to alter function. The location of missense variants that are associated with disease can guide clinical studies to explain the genetic basis of disease.
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology

URL http://www.gwyre.org
Description NSF/ BBSRC Bilateral partnership Imperial College London / University of Kansas USA 
Organisation University of Kansas
Country United States 
Sector Academic/University 
PI Contribution We provided data and advice about protein tertiary structure prediction, the evaluation of missense variants and the design of user-friendly web sites.
Collaborator Contribution The partners focussed on delivering data and advice about predicting protein/protein complexes and the design of the web site.
Impact The GWYRE protein resource
Start Year 2016
Title GWYRE (updated 2022) 
Description The GWYRE (Genome Wide PhYRE) project capitalizes on these developments by advancing and applying new powerful modeling methodologies to structural modeling of protein-protein interactions and genetic variation. The methods integrate knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by a full-structure alignment protocol to generate models for binary complexes. The predictions are incorporated in a comprehensive public resource for structural characterization of the human interactome and the location of human genetic variants. The GWYRE resource facilitates better understanding of the principles of protein interaction and structure/function relationships. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact Users can obtain predicted protein complexes and can identify the location of missense variants. 
URL http://www.gwyre.org