FunPDBe - Community driven enrichment of PDB data with structural and functional annotations

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Macromolecular structure data provides valuable information for the wider biomedical user community, as demonstrated by the Nobel prizes awarded to 22 scientists between 1946 and 2016 for studies related to the field of structural biology. To achieve even greater impact, the coordinate information available in the Protein Data Bank (PDB) has to be supplemented by information providing biological context and enriched by value-added annotations. The challenges in deriving the biological context from the limited annotations available in the PDB have led to the development of many specialist data resources and structure analysis tools that enrich annotations. When combined with the coordinate data from the PDB, these provide mechanistic information on biological processes. The structural bioinformatics community in the UK has been at the forefront of implementing tools and developing data resources to enrich structural data. The FunPDBe project will establish an integrated and easily accessible resource of structural and functional annotations for data available in the PDB. The collaboration between the Protein Data Bank in Europe (PDBe) and world-leading structural bioinformatics data resources will promote interoperability, comparative analysis and exchange of structural and functional annotations by implementing common data standards and infrastructure and by bringing together currently fragmented enhanced annotations in a central repository. The project will implement a uniform data access mechanism and re-usable web components for distribution and display of these functional and structural annotations. Easy access to structural data and enhanced annotations will support insights into the effects of genetic variations, the development of new tools to aid synthetic biology, and richer annotation of agriculturally important macromolecules, and will contribute to human health by aiding the interpretation of nsSNPs.

Planned Impact

FunPDBe is likely to have an impact over a very wide range of applications in the bioscience and biomedical areas. The key aspect of FunPDBe is that it adds value to the data already in the PDB through functional annotations and descriptions of the probable structural effects of sequence variants. Currently there are over 500 million downloads of the PDB and over 500K distinct users of PDBe. We therefore expect that there is already a large user base who will benefit from FunPDBe.

There will be three routes by which this impact will be realised. The first is through direct use of the resources by the non-academic sector. The pharmaceutical sector makes extensive use of the PDB data in structure-based drug discovery, diagnostics and similar work. These industries usually have home-built pipelines for target identification and for analysis of large or small molecules that can potentially bind these targets. It is anticipated that rich functional annotations and predictions (e.g., identification of binding sites, effects of mutations), together with a uniform data access mechanism, will make data discovery easier and can lead to more efficient analysis pipelines. The structural and functional information will also facilitate the design of modified proteins with specific properties, such as altered substrate specificity and enhanced enzyme efficiency, in the emerging area of synthetic biology.

With the rapid decrease in the cost of genome sequencing, vast information about genetic variation in humans and many other species is being obtained. FunPDBe will provide annotations that will assist in interpreting the effect of these variants, for example by identifying mutations that are likely to disrupt the tertiary or quaternary structure or disrupt protein function, and hence be associated with human or animal disease. In particular, Genomics England is sequencing 100K individuals to identify disease-associated variants, and data from FunPDBe will be of enormous value in analyses of these data. There are more than 20 consortia of biomedical researchers, established as part of the Genomics England activity, researching a range of different cancers and rare diseases, who will therefore benefit from the integrated data in FunPDBe.

The second route to derive impact is via the integration of this information into other bioinformatics resources that are used by the sectors described. We will work with resources such as UniProt, InterPro and Ensembl to facilitate integration of enriched annotations in those resources.

The third major route for realising impact is via the increasing number of academic groups that make use of PDB information and will have access to the enhanced annotations in FunPDBe. Their research has impact across all areas of commercial and societal advancement. Thus, via the academic and industrial pathways, the FunPDBe project will contribute to advances in human health, food security, animal health and related areas.

The availability of functional and structural impacts data from more than 10 UK groups from a single site, FunPDBe, will be very beneficial in ensuring that these data are easily accessed and contrasted. This in turn will ensure that the data has a much more significant impact.

Members of wider society often find structural biology too specialised a field. A key aspect of FunPDBe is to place the individual results of structural biology studies in a wider biological context, helping an interested individual to appreciate the importance of the field more readily. For example, FunPDBe will have information on the effects of mutations, some of which may lead to disease. Being able to more easily create a coherent story linking health and disease to the effect that mutations have on structures will be a useful tool in outreach to the public, for instance via science festivals targeting school-aged children, their teachers and parents.
 
Description This is a collaborative project with the Protein Data Bank in Europe (PDBe) to enhance the information in their Knowledge Base (PDBe-KB), which collates functional annotations and predictions for structural data in the PDB. PDBe-KB is a collaborative effort across many groups to input diverse bioinformatics data. Our grant supported the inclusion of modelled effects of missense variants (i.e. a change of an amino acid residue due to a change in the DNA sequence) on protein structure. From the available databases (ClinVar, Humsavar and gnomAD), we have compiled a set of missense variants and associated PDB data entries together with the known effect of each variant, i.e. whether it is disease-associated or benign. This data set has been sent to PDBe-KB for distribution to other groups, who will also provide their annotations. We have run our program Missense3D on this set of variants; it provides a structure-based explanation (e.g. the loss of a type of bond) of why a missense variant should affect the structure.
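
As an illustration only, the compilation step described above (pooling labelled missense variants from several source databases and pairing each with a PDB entry) could be sketched as follows. This is not the project's actual pipeline or data format: the field names, the `merge_variants` helper and the rule of discarding variants whose sources disagree on the label are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissenseVariant:
    """One labelled variant record (hypothetical field names)."""
    uniprot_acc: str  # protein accession
    position: int     # residue number in the protein sequence
    wild_type: str    # one-letter code of the original residue
    mutant: str       # one-letter code of the substituted residue
    pdb_id: str       # PDB entry associated with the variant
    label: str        # "disease" or "benign"
    source: str       # database the record came from


def merge_variants(records):
    """Collapse records that describe the same substitution.

    Assumption for this sketch: a variant is kept only when all of its
    sources agree on the label; conflicting variants are dropped.
    """
    grouped = {}
    for rec in records:
        key = (rec.uniprot_acc, rec.position, rec.wild_type, rec.mutant)
        grouped.setdefault(key, []).append(rec)

    merged = {}
    for key, recs in grouped.items():
        labels = {r.label for r in recs}
        if len(labels) != 1:
            continue  # sources disagree, so skip this variant
        merged[key] = {
            "pdb_id": recs[0].pdb_id,
            "label": recs[0].label,
            "sources": sorted({r.source for r in recs}),
        }
    return merged
```

Calling `merge_variants` on a list of `MissenseVariant` records returns a dictionary keyed by (accession, position, wild type, mutant), so each substitution appears once with the list of databases that reported it.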

In addition, the technology has been used in house to provide a structure-based annotation of over 2M missense variants in a freely available database Missense3D-DB.
Exploitation Route The data will be accessible to the community via the PDBe-KB web resource. This information will be central to the interpretation of observed changes in protein sequence that are being identified from genome sequencing. There are several projects, such as that under the aegis of Genomics England, where DNA sequences of people with a medical condition are obtained. A major challenge is to decide whether an observed missense variant is likely to disrupt the folding of the protein and hence be disease-associated. This can assist in genetic counselling and prioritisation of medical screening. Identification of a variant associated with disease could lead to the development of biomarker kits. In addition, the information can identify novel targets for drug discovery.
Sectors Healthcare,Pharmaceuticals and Medical Biotechnology

URL https://www.ebi.ac.uk/pdbe/pdbe-kb
 
Description FunPDBe is part of PDBe, the European site of the Protein Data Bank. A vast number of users, including numerous life science and pharma companies, access the PDB, and this site will assist their non-academic research.
First Year Of Impact 2001
Sector Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology
 
Title Missense3D-DB 
Description Missense3D-DB is a database resource that contains pre-computed atom-based calculations of the impact of amino acid substitution on protein structure, obtained using the Missense3D algorithm. The current version of the database contains ~4 million missense variants from the following resources: Humsavar, ClinVar and gnomAD. Currently Missense3D-DB hosts variant predictions based on what we consider the best representative 3D coordinates for the query protein. Additional 3D coordinates representing the query protein in different conformational states or in complex with ligands or other proteins may be available. If you want to make predictions using different 3D coordinates, please visit our variant prediction software, Missense3D. Missense3D-DB is freely available to academic and commercial users.
Type Of Technology Webtool/Application 
Year Produced 2021 
Impact This has just been launched. A notable impact is that the DECIPHER database at the Sanger (https://decipher.sanger.ac.uk/), which is widely used by the clinical and biomedical communities to understand the impact of genetic variants, links directly to data provided by Missense3D-DB.
 
Title PDBe-KB 
Description PDBe-KB provides additional structural and functional annotation to protein coordinates in the Protein Data Bank in Europe (PDBe).
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Too early 
URL https://www.ebi.ac.uk/pdbe/pdbe-kb