18-BBSRC-NSF/BIO - Structural modeling of interactome to assess phenotypic effects of genetic variation
Lead Research Organisation:
Imperial College London
Department Name: Life Sciences
Abstract
This grant supports further collaboration between the groups of Prof Sternberg at Imperial College London and Prof Vakser at the University of Kansas, US.
Today the sequences of genomes can be determined rapidly and from the gene sequences one obtains the sequence of proteins, which are central to biological function. Accordingly, a grand challenge for biology is to maximize the fundamental biological insights that can be obtained from determination of these protein sequences. In addition, genetic mutation often leads to minor differences in the gene sequence which result in the change of a single part of the protein sequence. These are known as single amino acid variants (SAVs). A vast amount of information on SAVs is available from a wide range of organisms, from human through to bacteria. SAVs can either result in altered biological activity or may have no discernible effect. Understanding and predicting the effect of SAVs is central to many areas of biological research. Knowledge of the three-dimensional structure of proteins and their interactions with partner proteins (known as complexes) is therefore essential for understanding the mode of action of proteins and the interpretation of the effects of genetic variation.
Under our previous collaboration the two groups developed the first version of a web-based resource called GWYRE (www.gwyre.org). This is a database of predicted 3D structures and complexes for biologically important organisms.
The aim of the proposed research is to enhance the development of the GWYRE resource. To do this we will:
(i) develop advanced methodology for high-throughput modelling of individual proteins
(ii) develop high-throughput structure-based methods to predict interactions of experimentally determined and modelled proteins
(iii) integrate (i) and (ii) to generate models for complexes
(iv) develop enhanced computer algorithms to assess the effect of SAVs on structures and complexes
(v) disseminate the GWYRE portal via publications, conference presentations, and the provision of training workshops
(vi) engage in public outreach.
Technical Summary
This project is a collaboration between the groups of Prof Sternberg at Imperial College London and Prof Vakser at the University of Kansas, US. We will continue development of an integrated approach to high-throughput modelling of proteins and their complexes and the mapping of SAVs as currently available in our version 1 GWYRE web resource (www.gwyre.org). Specifically we will:
*DEVELOP ADVANCED METHODOLOGY FOR HIGH-THROUGHPUT MODELLING OF INDIVIDUAL PROTEINS. The Phyre structure prediction server (> 90,000 users worldwide p.a.) predicts protein structure using sequence-based fold detection of remote homology. We will enhance Phyre by considering specific protein family templates for more accurate identification of optimum templates.
*DEVELOP HIGH-THROUGHPUT STRUCTURE-BASED METHODS TO PREDICT INTERACTIONS OF EXPERIMENTALLY DETERMINED AND MODELLED PROTEINS. We will further develop approaches for modelling the structures of protein-protein complexes, based on the similarity to experimentally determined protein-protein complexes (templates). Comprehensive benchmark sets of interacting and non-interacting modelled proteins will be generated based on the datasets of experimentally determined protein complexes.
*GENERATE A GENOME-WIDE DATABASE OF PROTEIN STRUCTURES AND PROTEIN-PROTEIN COMPLEXES FOR MODEL ORGANISMS. We will further develop our pipeline to integrate protein structure prediction with the prediction of protein-protein complexes and link servers in the two groups for use by the community. We will generate a database of structurally refined protein complexes for model eukaryotic and prokaryotic organisms. The results will be disseminated via the GWYRE database.
*ASSESS THE PHENOTYPIC EFFECTS OF GENETIC VARIATION. Our pipeline (Missense3D) will be further developed for users to map amino acid variants onto the structures and complexes in GWYRE and to use structure-based approaches to predict phenotypic effects.
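As an illustration of what mapping a variant onto a modelled complex involves, the following is a minimal sketch, not the Missense3D or GWYRE implementation. It flags whether a variant position lies at a protein-protein interface using a simple distance test. The function names, the toy coordinates and the 5 Å cutoff are all assumptions made for this example.

```python
from math import dist

# Hypothetical toy representation: residue number -> list of (x, y, z)
# atom positions in angstroms for one chain of a modelled complex.
Residues = dict[int, list[tuple[float, float, float]]]


def interface_residues(chain: Residues, partner: Residues, cutoff: float = 5.0) -> set[int]:
    """Return residues of `chain` with any atom within `cutoff` of any atom in `partner`.

    The 5.0 angstrom default is an assumed, commonly used contact threshold.
    """
    partner_atoms = [atom for atoms in partner.values() for atom in atoms]
    return {
        resnum
        for resnum, atoms in chain.items()
        if any(dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def locate_variant(position: int, chain: Residues, partner: Residues, cutoff: float = 5.0) -> str:
    """Classify a variant position as 'interface', 'non-interface' or 'not modelled'."""
    if position not in chain:
        return "not modelled"
    at_interface = position in interface_residues(chain, partner, cutoff)
    return "interface" if at_interface else "non-interface"


# Toy two-chain complex (invented coordinates, one atom per residue).
chain_a: Residues = {10: [(0.0, 0.0, 0.0)], 11: [(3.8, 0.0, 0.0)], 50: [(30.0, 0.0, 0.0)]}
chain_b: Residues = {7: [(0.0, 4.0, 0.0)]}

print(locate_variant(10, chain_a, chain_b))  # interface
```

A real pipeline would read coordinates from a structure file and would consider further structural features (for example burial or steric clashes) rather than distance alone; the sketch only shows the interface-location step.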
Planned Impact
This project is a collaboration between Prof Sternberg's group at Imperial College and Prof Vakser's group in Kansas (USA). We will continue development of the GWYRE resource (www.gwyre.org), which integrates predicted tertiary structures (from Imperial) and complexes for model organisms (from Kansas). In addition, a structure-based assessment of the effect of genetic variation will be established (from Imperial), in which genetic variants are mapped onto these structures and their phenotypic effect assessed. Below we identify the groups that will benefit from this research and how they will benefit.
COMMERCIAL USERS - Commercial users of the resources developed under this grant will span diverse researchers in bioscience who require information about protein structure, interactions, function and the effect of genetic variation. Feedback from users has shown that these predictions can have a transformative effect on their research, moving their conceptualisation into detailed consideration of the molecule at the three-dimensional atomic level. There are numerous application areas. One major application area is the identification of novel targets for pharmaceutical intervention. Structure guides the design of both small molecules and bio-therapeutics, such as monoclonal antibodies. The design of novel pharmaceuticals has clear health and commercial benefits. A second application area is the agricultural sector. Similar considerations apply to animal health as for the pharmaceutical industry. In addition, genome information can be helpful in selective breeding of crops and fruit. The biotechnology and bio-energy sectors can focus on the modification of biological pathways, and information about the structure and function of genes can inform these studies.
PUBLIC SECTOR - Agencies involved in public health and food security are expected to use the resources. For example, the location of a mutation on the surface of a human, animal or plant pathogen could be mapped to provide insight into structure/function relationships. This will impact on health and well-being.
TRAINING - Phyre is widely used in undergraduate and postgraduate teaching and thus extends the training of the next generation of bioscientists in data-driven biology.
POLICY MAKERS AND THE GENERAL PUBLIC - Via open days at Imperial College, members of the public will see demonstrations of protein modelling. This will highlight an area of research - bioinformatics - of which they may not have been aware. Furthermore, this will demonstrate the collaborative nature of scientific research with its implications of value for money. In particular the Imperial Festival is an annual event that attracted over 20,000 visitors in 2018. From the policy side, Imperial invites to the Festival representatives from professional membership bodies, local and central government, higher education bodies including other university senior staff, and research funders. We will continue to give invited lectures to groups other than researchers.
SCHOOLS - In talks to schools by the PI, the Phyre server is described as a web-based resource for use by the community. This always has a major impact on the audience. Students are impressed by the Phyre usage figure - over 3 million hits - which is placed in the perspective of popular YouTube clips shown on TV programmes that often have fewer hits.
People |
ORCID iD |
Michael Sternberg (Principal Investigator) |
Publications
Casadio R (2021) Computational Resources for Molecular Biology 2021. in Journal of molecular biology
Casadio R (2022) Computational Resources for Molecular Biology 2022. in Journal of molecular biology
David A (2023) Protein structure-based evaluation of missense variants: Resources, challenges and future directions. in Current opinion in structural biology
Khanna T (2021) Missense3D-DB web catalogue: an atom-based analysis and repository of 4M human protein-coding genetic variants. in Human genetics
Malladi S (2022) GWYRE: A Resource for Mapping Variants onto Experimental and Modeled Structures of Human Protein Complexes. in Journal of molecular biology
Mathews DH (2023) Computational Resources for Molecular Biology 2023. in Journal of molecular biology
Pennica C (2023) Missense3D-PPI: A Web Resource to Predict the Impact of Missense Variants at Protein Interfaces Using 3D Structural Data. in Journal of molecular biology
Description | Structural characterization of the protein interactome is essential for interpretation of genetic variation. A vast amount of information on human genetic variation, including numerous missense variants (i.e. single amino acid changes), is available from high-throughput sequencing. Despite significant progress in experimental techniques for protein structure determination, which fuels remarkable expansion of the Protein Data Bank (PDB), structures of most proteins must be determined by modeling. The number of protein-protein interactions (PPI) is significantly larger than the number of individual proteins. Moreover, structures of protein assemblies are more difficult to determine experimentally than those of individual proteins, which makes the role of modeling in structural characterization of the interactome even more important. Computational approaches to structure determination of individual proteins and protein-protein complexes have been rapidly progressing. There are several databases that report human protein-protein interactions. UniProt provides a single resource reporting human genetic variation, combining data from 100K genomes. The interpretation of how these genetic variants impact protein interactions greatly benefits from structural models that can be examined and analyzed. This project is a collaboration between the UK Imperial College Group and the US University of Kansas. Jointly we have developed the GWYRE resource, which integrates knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by GRAMM. The predictions are incorporated in a comprehensive web-based public resource for structural characterization of interactomes and mapping of missense variants obtained from UniProt. The resource, available at http://www.gwyre.org, facilitates better understanding of principles of protein interaction and structure/function relationships.
Coordinates of complexes can be downloaded for inspection and further analysis. |
Exploitation Route | Knowledge of the coordinates of a protein complex provides insight into designing novel pharmaceuticals or modifying proteins to alter function. The location of missense variants that are associated with disease can guide clinical studies to explain the genetic basis of disease. |
Sectors | Healthcare,Pharmaceuticals and Medical Biotechnology |
URL | http://www.gwyre.org |
Description | NSF/ BBSRC Bilateral partnership Imperial College London / University of Kansas USA |
Organisation | University of Kansas |
Country | United States |
Sector | Academic/University |
PI Contribution | We provided data and advice about protein tertiary structure prediction, the evaluation of missense variants and the design of user-friendly web sites. |
Collaborator Contribution | The partners focussed on delivering data and advice about predicting protein/protein complexes and the design of the web site. |
Impact | The GWYRE protein resource |
Start Year | 2016 |
Title | GWYRE (updated 2022) |
Description | The GWYRE (Genome Wide PhYRE) project capitalizes on these developments by advancing and applying new powerful modeling methodologies to structural modeling of protein-protein interactions and genetic variation. The methods integrate knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by a full-structure alignment protocol to generate models for binary complexes. The predictions are incorporated in a comprehensive public resource for structural characterization of the human interactome and the location of human genetic variants. The GWYRE resource facilitates better understanding of principles of protein interaction and structure/function relationships. |
Type Of Technology | Webtool/Application |
Year Produced | 2022 |
Impact | Users can obtain predicted protein complexes and can identify the location of missense variants. |
URL | http://www.gwyre.org |