18-BBSRC-NSF/BIO - Structural modeling of interactome to assess phenotypic effects of genetic variation

Lead Research Organisation: Imperial College London

Department Name: Life Sciences

Abstract

This grant supports further collaboration between the groups of Prof Sternberg at Imperial College London and Prof Vakser at the University of Kansas, US.

Today genome sequences can be determined rapidly, and from the gene sequences one obtains the sequences of proteins, which are central to biological function. Accordingly, a grand challenge for biology is to maximize the fundamental biological insight that can be obtained from these protein sequences. In addition, genetic mutation often leads to minor differences in the gene sequence that change a single amino acid in the protein sequence. These are known as single amino acid variants (SAVs). A vast amount of information on SAVs is available for a wide range of organisms, from humans to bacteria. SAVs can either alter biological activity or have no discernible effect. Understanding and predicting the effect of SAVs is central to many areas of biological research. Knowledge of the three-dimensional structure of proteins and of their interactions with partner proteins (known as complexes) is therefore essential for understanding how proteins act and for interpreting the effects of genetic variation.

Under our previous collaboration the two groups developed the first version of a web-based resource called GWYRE (www.gwyre.org). This is a database of predicted 3D structures and complexes for biologically important organisms.

The aim of the proposed research is to enhance the development of the GWYRE resource. To do this we will:
(i) develop advanced methodology for high-throughput modelling of individual proteins
(ii) develop high-throughput structure-based methods to predict interactions of experimentally determined and modelled proteins
(iii) integrate (i) and (ii) to generate models for complexes
(iv) develop enhanced computer algorithms to assess the effect of SAVs on structures and complexes
(v) disseminate the GWYRE portal via publications, conference presentations, and the provision of training workshops
(vi) engage in public outreach.

Technical Summary

This project is a collaboration between the groups of Prof Sternberg at Imperial College London and Prof Vakser at the University of Kansas, US. We will continue development of an integrated approach to high-throughput modelling of proteins and their complexes and the mapping of SAVs as currently available in our version 1 GWYRE web resource (www.gwyre.org). Specifically we will:

*DEVELOP ADVANCED METHODOLOGY FOR HIGH-THROUGHPUT MODELLING OF INDIVIDUAL PROTEINS. The Phyre structure prediction server (> 90,000 users worldwide p.a.) predicts protein structure using sequence-based fold detection of remote homology. We will enhance Phyre by considering templates from specific protein families, so that the optimum templates are identified more accurately.

*DEVELOP HIGH-THROUGHPUT STRUCTURE-BASED METHODS TO PREDICT INTERACTIONS OF EXPERIMENTALLY DETERMINED AND MODELLED PROTEINS. We will further develop approaches for modelling the structures of protein-protein complexes, based on the similarity to experimentally determined protein-protein complexes (templates). Comprehensive benchmark sets of interacting and non-interacting modelled proteins will be generated based on the datasets of experimentally determined protein complexes.

*GENERATE A GENOME-WIDE DATABASE OF PROTEIN STRUCTURES AND PROTEIN-PROTEIN COMPLEXES FOR MODEL ORGANISMS. We will further develop our pipeline to integrate protein structure prediction with the prediction of protein-protein complexes and link servers in the two groups for use by the community. We will generate a database of structurally refined protein complexes for model eukaryotic and prokaryotic organisms. The results will be disseminated via the GWYRE database.

*ASSESS THE PHENOTYPIC EFFECTS OF GENETIC VARIATION. Our pipeline (Missense3D) will be further developed for users to map amino acid variants onto the structures and complexes in GWYRE and to use structure-based approaches to predict phenotypic effects.
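To illustrate what mapping a variant onto a complex involves, the following is a minimal Python sketch. It is not the GWYRE or Missense3D implementation, and Missense3D's actual structural criteria are not reproduced here. It assumes each residue is represented by a single point (hypothetical toy coordinates; a real pipeline would parse the model's PDB or mmCIF file and typically use all atoms), defines interface residues as those within an assumed 5.0 Å cutoff of any residue on the partner chain, and reports whether a variant written in the form `R10H` falls at the interface. The names `CHAIN_A`, `CHAIN_B`, `interface_residues` and `map_variant` are hypothetical.

```python
import math
import re
from typing import Dict, Set, Tuple

Coord = Tuple[float, float, float]

# Hypothetical toy data: one representative point per residue,
# keyed by (residue number, one-letter amino acid code).
CHAIN_A: Dict[Tuple[int, str], Coord] = {
    (10, "R"): (0.0, 0.0, 0.0),
    (11, "L"): (3.8, 0.0, 0.0),
    (12, "E"): (20.0, 0.0, 0.0),
}
CHAIN_B: Dict[Tuple[int, str], Coord] = {
    (5, "D"): (0.0, 4.0, 0.0),
    (6, "K"): (3.8, 4.5, 0.0),
}

# Variant notation assumed here: wild-type residue, position, mutant residue.
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def interface_residues(chain, partner, cutoff: float = 5.0) -> Set[int]:
    """Residue numbers in `chain` lying within `cutoff` (Å) of any `partner` residue."""
    return {
        num
        for (num, _aa), xyz in chain.items()
        if any(math.dist(xyz, other) <= cutoff for other in partner.values())
    }


def map_variant(variant: str, chain, partner, cutoff: float = 5.0) -> dict:
    """Parse a variant such as 'R10H' and report whether it sits at the interface."""
    match = VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"unrecognised variant notation: {variant!r}")
    wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
    residues = {num: aa for (num, aa) in chain}
    if pos not in residues:
        # The position is not covered by the (modelled) structure.
        return {"variant": variant, "mapped": False}
    if residues[pos] != wt:
        raise ValueError(f"wild-type mismatch at {pos}: model has {residues[pos]}")
    return {
        "variant": variant,
        "mapped": True,
        "wild_type": wt,
        "mutant": mut,
        "at_interface": pos in interface_residues(chain, partner, cutoff),
    }
```

With the toy data, `map_variant("R10H", CHAIN_A, CHAIN_B)` reports the variant at the interface, `E12K` maps to the structure but away from the interface, and `R99H` is reported as unmapped because residue 99 is not in the model. Checking the wild-type residue first guards against numbering mismatches between the variant source and the structure, a common pitfall when mapping variants onto models.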

Planned Impact

This project is a collaboration between Prof Sternberg's group at Imperial College and Prof Vakser's group in Kansas (USA). We will continue development of the GWYRE resource (www.gwyre.org), which integrates predicted tertiary structures (from Imperial) and complexes for model organisms (from Kansas). In addition, a structure-based framework for assessing the effect of genetic variation will be established, onto which genetic variants will be mapped and their phenotypic effects assessed (from Imperial). Below we identify the groups that will benefit from this research and how they will benefit.

COMMERCIAL USERS - The commercial users of the resources developed under this grant will span diverse researchers in bioscience who require information about protein structure, interactions, function and the effect of genetic variation. Feedback from users has shown that these predictions can have a transformative effect on their research, moving it from conceptual models to detailed consideration of the molecule at the three-dimensional atomic level. There are numerous application areas. One major area is the identification of novel targets for pharmaceutical intervention. Structure guides the design of both small molecules and biotherapeutics such as monoclonal antibodies, and the resulting novel pharmaceuticals have clear health and commercial benefits. A second application area is the agricultural sector. Similar considerations apply to animal health as to the pharmaceutical industry. In addition, genome information can be helpful in the selective breeding of crops and fruit. The biotechnology and bioenergy sectors focus on the modification of biological pathways, and information about the structure and function of genes and their products can inform these studies.

PUBLIC SECTOR - Agencies involved in public health and food security are expected to use the resources. For example, the location of a mutation on the surface of a protein from a human, animal or plant pathogen could be mapped to provide insight into structure/function relationships. This will have an impact on health and well-being.

TRAINING - Phyre is widely used in undergraduate and postgraduate teaching and thus contributes to training the next generation of bioscientists in data-driven biology.

POLICY MAKERS AND THE GENERAL PUBLIC - Via open days at Imperial College, members of the public will see demonstrations of protein modelling. This will highlight an area of research, bioinformatics, of which they may not have been aware. Furthermore, this will demonstrate the collaborative nature of scientific research and its value for money. In particular, the Imperial Festival is an annual event that attracted over 20,000 visitors in 2018. On the policy side, Imperial invites representatives from professional membership bodies, local and central government, higher education bodies (including senior staff from other universities) and research funders to the Festival. We will also continue to give invited lectures to non-research audiences.

SCHOOLS - In talks to schools, the PI describes the Phyre server as a web-based resource for use by the community. This always has a major impact on the audience. Students are impressed by Phyre's usage figure of over 3 million hits, which the PI puts in perspective by comparing it with popular YouTube clips shown on TV programmes that often have fewer hits.

Funded Value:

£499,841

Funded Period:

Mar 20 - Feb 23

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/T010487/1

Principal Investigator:

Michael Sternberg

Research Subject:

Biomolecules & biochemistry (48%)

Omic sciences & technologies (24%)

Tools, technologies & methods (24%)

Research Topic:

Bioinformatics (24%)

Functional genomics (24%)

Multiprotein complexes (24%)

Protein expression (24%)

Organisations

People	ORCID iD
Michael Sternberg (Principal Investigator)

Publications

Casadio R (2021) Computational Resources for Molecular Biology 2021 in Journal of Molecular Biology

Casadio R (2022) Computational Resources for Molecular Biology 2022 in Journal of Molecular Biology

David A (2023) Protein structure-based evaluation of missense variants: Resources, challenges and future directions in Current Opinion in Structural Biology

Hanna G (2024) Missense3D-TM: Predicting the Effect of Missense Variants in Helical Transmembrane Protein Regions Using 3D Protein Structures in Journal of Molecular Biology

Khanna T (2021) Missense3D-DB web catalogue: an atom-based analysis and repository of 4M human protein-coding genetic variants in Human Genetics

Malladi S (2022) GWYRE: A Resource for Mapping Variants onto Experimental and Modeled Structures of Human Protein Complexes in Journal of Molecular Biology

Mathews D (2023) Computational Resources for Molecular Biology 2023 in Journal of Molecular Biology

Pennica C (2023) Missense3D-PPI: A Web Resource to Predict the Impact of Missense Variants at Protein Interfaces Using 3D Structural Data in Journal of Molecular Biology

Key Findings
Further Funding
Collaboration
Software and Technical Products


Description	Structural characterization of the protein interactome is essential for the interpretation of genetic variation. A vast amount of information on human genetic variation, including numerous missense variants (i.e. single amino acid changes), is available from high-throughput sequencing. Despite significant progress in experimental techniques for protein structure determination, which fuels a remarkable expansion of the Protein Data Bank (PDB), the structures of most proteins must be determined by modeling. The number of protein-protein interactions (PPIs) is significantly larger than the number of individual proteins. Moreover, structures of protein assemblies are more difficult to determine experimentally than those of individual proteins, which makes the role of modeling in the structural characterization of the interactome even more important. Computational approaches to structure determination of individual proteins and protein-protein complexes have been progressing rapidly. Several databases report human protein-protein interactions, and UniProt provides a single resource reporting human genetic variation, combining data from 100K genomes. Interpreting how these genetic variants affect protein interactions benefits greatly from structural models that can be examined and analyzed. This project is a collaboration between the Imperial College London group (UK) and the University of Kansas group (US). Jointly, we have developed the GWYRE resource, which integrates knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by GRAMM. The predictions are incorporated in a comprehensive web-based public resource for the structural characterization of interactomes and the mapping of missense variants obtained from UniProt. The resource, available at http://www.gwyre.org, facilitates a better understanding of the principles of protein interaction and of structure/function relationships.
Coordinates of complexes can be downloaded for inspection and further analysis.
Exploitation Route	Knowledge of the coordinates of a protein complex provides insight for designing novel pharmaceuticals or modifying proteins to alter their function. The location of missense variants that are associated with disease can guide clinical studies that aim to explain the genetic basis of disease.
Sectors	Healthcare, Pharmaceuticals and Medical Biotechnology
URL	http://www.gwyre.org


Description	21-BBSRC/NSF-BIO: Modeling of protein interactions to predict phenotypic effects of genetic mutations
Amount	£551,394 (GBP)
Funding ID	BB/X01830X/1
Organisation	Biotechnology and Biological Sciences Research Council (BBSRC)
Sector	Public
Country	United Kingdom
Start	02/2023
End	01/2026


Description	Enhancement, dissemination and application of the PhyreRisk/Phyre resource for modelling protein structures and the effects of genetic variants
Amount	£925,051 (GBP)
Funding ID	218242/Z/19/Z
Organisation	Wellcome Trust
Sector	Charity/Non Profit
Country	United Kingdom
Start	04/2021
End	10/2026


Description	Enhancing the Phyre protein modelling resource: prediction of ligand binding and the impact of missense variants
Amount	£499,727 (GBP)
Funding ID	BB/V018558/1
Organisation	Biotechnology and Biological Sciences Research Council (BBSRC)
Sector	Public
Country	United Kingdom
Start	09/2022
End	09/2025


Description	NSF/BBSRC Bilateral partnership Imperial College London / University of Kansas USA
Organisation	University of Kansas
Country	United States
Sector	Academic/University
PI Contribution	We provided data and advice on protein tertiary structure prediction, the evaluation of missense variants and the design of user-friendly web sites.
Collaborator Contribution	The partners focussed on delivering data and advice on predicting protein-protein complexes and on the design of the web site.
Impact	The GWYRE protein resource
Start Year	2016


Title	GWYRE (updated 2022)
Description	The GWYRE (Genome Wide PhYRE) project capitalizes on recent advances in the computational modeling of individual proteins and protein-protein complexes by developing and applying powerful new methodologies for the structural modeling of protein-protein interactions and genetic variation. The methods integrate knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by a full-structure alignment protocol to generate models for binary complexes. The predictions are incorporated in a comprehensive public resource for the structural characterization of the human interactome and the location of human genetic variants. The GWYRE resource facilitates a better understanding of the principles of protein interaction and of structure/function relationships.
Type Of Technology	Webtool/Application
Year Produced	2022
Impact	Users can obtain predicted protein complexes and can identify the location of missense variants.
URL	http://www.gwyre.org


Title	Missense3D-PPI
Description	A web server to predict the structural impact of missense amino-acid variants at protein-protein interfaces
Type Of Technology	Webtool/Application
Year Produced	2023
Impact	None to date
URL	http://missense3d.bc.ic.ac.uk/missense3d/indexppi.html


Title	Missense3D-TM
Description	A web server to predict the impact of missense amino-acid variants on transmembrane regions in protein structure
Type Of Technology	Webtool/Application
Year Produced	2023
Impact	Too early
URL	http://missense3d.bc.ic.ac.uk:8888/ms3dtm/