Enhancing the Phyre protein modelling resource: prediction of ligand binding and the impact of missense variants

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Proteins are the machinery of life. A protein is a long chain of components (amino acid residues) - its sequence. Genome projects are determining the sequences of proteins from many species including plants, animals and microbes. Experimental methods can reveal the 3D structure of a protein, and this information is central to biological understanding. Exploiting this knowledge has implications for improvements in agriculture, animal welfare, health and biotechnology. But often this information is not available from experiment. Biologists then require computational methods to predict protein structure.
The Imperial group has developed a powerful and user-friendly resource for predicting the 3D structure of a protein from its sequence - Phyre. Phyre is disseminated by a web server - a user pastes a sequence of interest into a box and the server returns the predicted 3D structure. Phyre has proved exceptionally popular with over 4M sequence submissions and over 11,000 literature citations. Recently we obtained over 2,000 letters of demand, including over 150 from the UK. In 2018, we purchased a licence that enabled us to open Phyre for commercial use.

The genomes of all organisms, including humans, are subject to small changes in the genetic code, and some of these changes result in a change of one amino acid into another, known as a missense variant. Sometimes a missense variant can impair the biological function of the protein, and this can lead to disease in humans and animals, affect crops and alter the pathogenicity of bacteria and viruses. In addition, identification of missense variants can be used in fundamental research to unravel biological function. We have recently launched a resource, Missense3D, that provides a report of the effect of a missense variant on protein structure and has been shown to yield comparable accuracy on both high-quality experimental structures and Phyre-predicted models. Often proteins bind a small molecule known as a ligand. Knowledge of where in the protein a ligand binds is of major benefit in terms of suggesting protein function, identifying where in a protein a drug might bind and, central to this application, assessing the effect of a missense variant.

In this grant Phyre will be extended to predict ligand binding sites. The approach will be based on identifying a suitable model within experimentally determined structures (the PDB). A key advance over current programs that predict ligand binding sites is to use a resource, ChEBI, that links information in the database of protein sequences (UniProt) about the actual biological ligand to ligands reported in the PDB. Thus, instead of returning a set of possible ligands in different sites, the true ligand will be identified. The modelled position of the ligand in the Phyre-predicted structure will be refined using a widely-used program for docking small molecules into proteins (AutoDock Vina). The next step of the grant is to use the predicted ligand binding site to enhance the prediction of the impact of a missense variant via a second generation program, Missense3D-v2. The approach will consider whether the residue affected by the missense variant is highly conserved in different sequences, which indicates functional or structural importance, together with whether the residue is close to the ligand binding site. The ligand binding site predictor and Missense3D-v2 will be integrated into Phyre both in the web server and in the batch mode.

In addition, in this grant we will continue to support users via an e-mail help desk and training workshops. The code will be disseminated on GitHub for the community to contribute to its development.

Technical Summary

Phyre2 is a widely-used web portal for protein structure prediction with substantial additional functionality distinguishing it from similar resources. Phyre and its predecessors have had 11,000 citations. In 2019 there were 100,000 unique users. We have obtained 2,000 letters of demand, including 150 from the UK. In 2016, Phyre2 was included within Elixir. Since 2018, Phyre has also been freely available to commercial users.

The grant will enhance Phyre by building upon two related resources from our group. 3DLigandsite identifies the location of a possible ligand starting from either an experimental or Phyre-predicted structure. The prediction inherits the location of ligand sites in homologous PDB structures. But 3DLigandsite and other related resources do not include the available knowledge in UniProt about the known biological ligand. The ChEBI resource enables one to link the biologically-defined ligand in UniProt to a PDB entry with the same (or a closely-related) ligand. The aim here is to employ this link to provide enhanced and specific ligand binding site predictions. This will aid bioscience users who are not structural experts and will facilitate batch processing of proteomes. The model for the protein/ligand interaction will be refined using the world-leading program AutoDock Vina.
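
The idea of using the ChEBI link to narrow down candidate sites can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method implemented in Phyre or 3DLigandsite: the `TemplateSite` record, the `select_sites` function, the `related` mapping and the placeholder identifiers are all hypothetical, and the sketch assumes that candidate sites from homologous PDB structures have already been mapped onto the query sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateSite:
    """A ligand site inherited from a homologous PDB structure (hypothetical record)."""
    pdb_id: str
    ligand_chebi: str   # ChEBI identifier of the ligand bound in the template
    residues: tuple     # query residue numbers aligned to the template site


def select_sites(known_chebi, sites, related=None):
    """Keep only template sites whose ligand matches the biological ligand.

    known_chebi: ChEBI identifier annotated for the query protein (e.g. in UniProt).
    related: optional mapping from a ChEBI identifier to closely-related identifiers
             that should also be accepted (an assumption of this sketch).
    """
    related = related or {}
    accepted = {known_chebi} | set(related.get(known_chebi, ()))
    return [site for site in sites if site.ligand_chebi in accepted]


# Placeholder identifiers, not real ChEBI or PDB entries.
candidate_sites = [
    TemplateSite("tmplA", "CHEBI:X1", (45, 48, 102)),
    TemplateSite("tmplB", "CHEBI:X2", (45, 47, 102)),
    TemplateSite("tmplC", "CHEBI:X9", (200, 203)),
]
selected = select_sites("CHEBI:X1", candidate_sites, related={"CHEBI:X1": ["CHEBI:X2"]})
```

In this toy input, the site carrying the unrelated ligand is discarded, so a single biologically relevant site remains instead of a list of alternatives.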

The second resource is Missense3D, which provides a structure-based interpretation of the stereochemical impact of a missense variant. Missense3D does not model protein/ligand interactions, and this grant will build upon the ligand binding site prediction to include this feature in Missense3D-v2. The approach will consider whether the residue affected by the missense variant is highly conserved in different sequences, which indicates functional or structural importance, together with whether the residue is close to the ligand binding site. In addition, the use of AutoDock Vina to model the impact of missense variants on ligand binding will be explored.
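
A minimal sketch of combining the two signals described above, residue conservation and proximity to the ligand, is given below. It is not the Missense3D-v2 algorithm: the function names, the conservation scale (0 to 1) and both cutoffs are assumptions chosen purely for illustration.

```python
import math


def min_distance(residue_atoms, ligand_atoms):
    """Smallest distance between any residue atom and any ligand atom.

    Coordinates are (x, y, z) tuples in angstroms.
    """
    return min(math.dist(a, b) for a in residue_atoms for b in ligand_atoms)


def assess_variant(conservation, residue_atoms, ligand_atoms,
                   cons_cutoff=0.8, dist_cutoff=5.0):
    """Flag a variant whose residue is both conserved and near the ligand.

    conservation: score in [0, 1], higher means more conserved (assumed scale).
    cons_cutoff, dist_cutoff: illustrative thresholds, not published values.
    """
    conserved = conservation >= cons_cutoff
    near_ligand = min_distance(residue_atoms, ligand_atoms) <= dist_cutoff
    return {
        "conserved": conserved,
        "near_ligand": near_ligand,
        "flag": conserved and near_ligand,
    }


# Toy coordinates: one residue atom 3 angstroms from one ligand atom.
report = assess_variant(0.9, [(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)])
```

Returning the two signals separately, rather than only the combined flag, lets a user see which criterion a variant met.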

Publications

Mathews D (2023) Computational Resources for Molecular Biology 2023 in Journal of Molecular Biology