FunPDBe - Community driven enrichment of PDB data with structural and functional annotations

Lead Research Organisation: University College London
Department Name: Structural Molecular Biology

Technical Summary

Macromolecular structure data provides valuable information for the wider biomedical user community, as demonstrated by the Nobel prizes awarded to 22 scientists between 1946 and 2016 for studies related to structural biology. To achieve even greater impact, the coordinate information available in the Protein Data Bank (PDB) has to be supplemented by information providing biological context and enriched by value-added annotations. The challenges in deriving biological context from the limited annotations available in the PDB have led to the development of many specialist data resources and structure analysis tools that enrich annotations. When combined with the coordinate data from the PDB, these provide mechanistic information on biological processes. The structural bioinformatics community in the UK has been at the forefront of implementing tools and developing data resources to enrich structural data. The FunPDBe project will establish an integrated and easily accessible resource of structural and functional annotations for data available in the PDB. The collaboration between the Protein Data Bank in Europe (PDBe) and world-leading structural bioinformatics data resources will promote interoperability, comparative analysis and exchange of structural and functional annotations by implementing common data standards and infrastructure and by bringing together currently fragmented enhanced annotations in a central repository. The project will implement a uniform data access mechanism and re-usable web components for distribution and display of these functional and structural annotations. Easy access to structural data and enhanced annotations will support insights into the effects of genetic variations, the development of new tools to aid synthetic biology, and richer annotations for agriculturally important macromolecules, and will contribute to human health by aiding interpretation of nsSNPs.

Planned Impact

FunPDBe is likely to have an impact over a very wide range of applications in the bioscience and biomedical areas. The key aspect of FunPDBe is that it adds value to the data already in the PDB through function annotations and descriptions of the probable structural effects of sequence variants. Currently there are over 500 million downloads of the PDB and over 500K distinct users of PDBe. We therefore expect that there is already a large user base that will benefit from FunPDBe.

There will be three routes by which this impact will be realised. The first is through direct use of the resources by the non-academic sector. The pharmaceutical sector makes extensive use of the PDB data in structure-based drug discovery, diagnostics and similar work. These industries usually have home-built pipelines for target identification and for analysis of large or small molecules that can potentially bind these targets, etc. It is anticipated that rich functional annotations (e.g., identifications of binding sites, effects of mutations) and predictions and the availability of a uniform data access mechanism will make data discovery easier and can lead to more efficient analysis pipelines. The structural and functional information will also facilitate the design of modified proteins with specific properties such as altered substrate specificity and enhanced enzyme efficiency, in the emerging area of synthetic biology.

With the rapid decrease in the cost of genome sequencing, vast information about genetic variation in humans and many other species is being obtained. FunPDBe will provide annotations that will assist in interpreting the effect of these variants, for example identifying mutations which are likely to disrupt the tertiary or the quaternary structure or disrupt protein function and hence be associated with human or animal disease. In particular, Genomics England is undertaking sequencing of 100K individuals to identify disease-associated variants, and data from FunPDBe will be of enormous value in analyses of these data. There are more than 20 consortia of biomedical researchers, established as part of the Genomics England activity, researching a range of different cancers and rare diseases, who will therefore benefit from the integrated data in FunPDBe.

The second route to derive impact is via the integration of this information into other bioinformatics resources that are used by the sectors described. We will work with resources such as UniProt, InterPro and Ensembl to facilitate integration of enriched annotations in those resources.

The third major route for realising impact is via the increasing number of academic groups that make use of PDB information and will have access to the enhanced annotations in FunPDBe. Their research has an impact across all areas of commercial and societal advancement. Thus, via the academic and industrial pathways, the FunPDBe project will contribute to advances in human health, food security, animal health and related areas.

The availability of functional and structural impact data from more than 10 UK groups at a single site, FunPDBe, will be very beneficial in ensuring that these data are easily accessed, compared and contrasted. This in turn will ensure that the data have a much more significant impact.

Members of the wider society often find structural biology too specialised a field. A key aim of FunPDBe is to place the individual results of structural biology studies in a wider biological context, helping an interested individual to appreciate the importance of the field more readily. For example, FunPDBe will have information on the effects of mutations, some of which may lead to disease. Being able to create a coherent story linking health and disease to the effects mutations have on structures will be a useful tool in outreach to the public, for instance via science festivals targeting school-aged children, their teachers and parents.

Publications

Das S (2021) CATH functional families predict functional sites in proteins. in Bioinformatics (Oxford, England)

PDBe-KB Consortium (2020) PDBe-KB: a community-driven resource for structural and functional annotations. in Nucleic acids research

 
Description Function annotation of proteins is incomplete without characterisation of their functional sites. Knowledge of such functionally important residues in proteins can guide targeted site-directed mutagenesis experiments, drug design and protein engineering. This project aims to establish an integrated and easily accessible resource of structural and functional annotations for macromolecular structure data in the Protein Data Bank (PDB) and to develop the necessary processes and standards for achieving this. The project has been running for 19 months, during which three workshops were organised that brought together representatives of many FunPDBe partner resources. Various aspects of the project were discussed in the workshops, including the timeline, infrastructure and standards for community-driven functional site annotations. The structural and functional annotations generated by the collaborating research groups were compared and exchanged, which led to important decisions on common data standards and on the infrastructure for collecting these enhanced annotations.

This project has now been adopted as an activity by the 3D-BioInfo ELIXIR Community in Structural Bioinformatics. This will enable other European partners to submit predicted functional site data for integration in the PDBe knowledge base (PDBe-KB). In this context we will be applying for ELIXIR support from the 3D-BioInfo Community award to facilitate workshops to discuss ontologies and strategies for benchmarking.
Exploitation Route By bringing together functional annotations from collaborating partner resources in a single repository, the FunPDBe project will support the wider research community in its efforts to exploit macromolecular structure data to derive knowledge. The tools and data developed as a part of this project will be important for analysing impacts of variations of human proteins, in drug design and diagnostic development.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare, Manufacturing (including Industrial Biotechnology), Pharmaceuticals and Medical Biotechnology

URL https://www.ebi.ac.uk/pdbe/pdbe-kb/
 
Description During this project, three workshops were organised which brought together representatives of many partner resources of FunPDBe. In the workshops, the group developed a protocol that allows all collaborating research groups to provide structural and functional annotations in a common way. This has led to further community discussions and decisions regarding common data standards, for example the 3D-Beacon project, which provides a framework for groups around the world to share structural data and metadata. Discussions with members of the biological community made it clear that integrating functional annotations from a variety of individual resources in a single repository benefits research. Our team has generated a dataset of conserved functional sites based on CATH Functional Families (FunFams). This data was deposited in the FunPDBe resource via its web API. The tools and data developed as a part of this project will be important for analysing impacts of variations of human proteins, in drug design and diagnostic development.
First Year Of Impact 2019
Sector Digital/Communication/Information Technologies (including Software), Manufacturing (including Industrial Biotechnology), Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title Pipeline to identify Functional Sites (FunSites) and add to FunPDBe 
Description FunPDBe is a project of the Protein Data Bank in Europe - Knowledge Base (PDBe-KB) whose goal is to create an integrated and accessible resource of structural and functional annotations for macromolecular structure data in the Protein Data Bank (PDB). It is a collaboration between the PDBe-KB and world-leading providers of structural bioinformatics data. This pipeline uses sequence and structural information from Functional Families (FunFams) in CATH to predict the location of functionally important sites within protein structures (FunSites). This information is then transformed into a JSON data structure (according to the FunPDBe schema), exported to the PDBe-KB resource and made available on its web pages.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Making functional site predictions available to the wider community, in a way that allows results from different resources and algorithms to be compared and contrasted, is an important part of improving the accuracy of these predictions and developing how they can be applied further. Since the PDBe-KB provides a highly trusted and well-used web resource, this also helps to add visibility for the predicting resources themselves. 
URL https://www.ebi.ac.uk/pdbe/funpdbe/deposition/
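
The export step described above (predicted sites transformed into a JSON data structure) can be illustrated with a minimal Python sketch. It is not the project's pipeline: the function name, the key names, the PDB identifier, the residues and the scores are all hypothetical placeholders chosen for illustration, and the authoritative field names are those defined by the FunPDBe schema itself.

```python
import json


def build_annotation(pdb_id, chain_label, site_label, residues):
    """Assemble one JSON-ready record for a single predicted site.

    `residues` is a list of (residue_number, amino_acid, score) tuples.
    All key names are illustrative assumptions, not the FunPDBe schema.
    """
    return {
        "data_resource": "example_resource",  # hypothetical resource name
        "pdb_id": pdb_id.lower(),
        "sites": [{"site_id": 1, "label": site_label}],
        "chains": [
            {
                "chain_label": chain_label,
                "residues": [
                    {
                        "pdb_res_label": str(number),
                        "aa_type": aa,
                        # Link each residue back to the site it belongs to.
                        "site_data": [{"site_id_ref": 1, "raw_score": score}],
                    }
                    for number, aa, score in sorted(residues)
                ],
            }
        ],
    }


# Made-up input: two residues of a hypothetical entry "1ABC", chain A.
record = build_annotation(
    "1ABC", "A", "predicted_functional_site",
    [(102, "ASP", 0.88), (57, "HIS", 0.91)],
)
print(json.dumps(record, indent=2))
```

Keeping the site definitions separate from the per-residue entries, linked by an identifier, lets one residue belong to several sites without repeating the site description; whether the real schema does this in the same way should be checked against the schema.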
 
Description ELIXIR 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We are part of the 3D-BioInfo ELIXIR Community in Structural Bioinformatics, which was established in January 2019 and is being coordinated by Christine Orengo. CATH-Gene3D contributes to two of the four major activities in 3D-BioInfo. Activity I relates to integration of functional sites in the PDBe Knowledge Base (PDBe-KB). CATH Functional Families (FunFams) are being used to identify functional sites for domain families and this data is being integrated in PDBe-KB. Activity II relates to integration of tools and data associated with protein structure prediction. CATH functional families are being used to identify templates for homology modelling of structurally uncharacterised proteins. 3D models have been generated for 14 model organisms including human, mouse, rat, Arabidopsis, fly, yeast and E. coli. These 3D models are then integrated in the Genome3D resource, managed by Orengo. 3D-BioInfo Activity II involves integration of 3D models from Genome3D in PDBe-KB with links to UniProt. CATH-Gene3D recently received ELIXIR implementation study funding to collaborate with the SWISS-MODEL team in Switzerland to use the SWISS-MODEL pipeline together with template data from CATH functional families to build more accurate 3D models. We are planning to extend this activity to include more European partners through collaborations facilitated by 3D-BioInfo workshops. We are also part of an ELIXIR UK consortium of 17 research groups developing training material in structural bioinformatics. This work is being co-ordinated by the Genome3D consortium managed by Orengo. CATH-Gene3D training material was developed for an ECCB workshop on protein structure to function held in July 2013, organised by Christine Orengo, Nicholas Furnham and Romain Studer. This material has been adapted for the ELIXIR training workflows.
Christine Orengo is also deputy lead of the Functional Effects Domain in Structural Bioinformatics, which is integrating tools and resources from the 17 structural bioinformatics research groups mentioned above. The Domain is part of Genomics England and is headed by Ewan Birney. The aim is to establish an integrated resource that will be used for the interpretation of genetic variations related to health and disease. Training material is also being developed in this context. ELIXIR UK funding was allocated in March 2017 to develop training workflows for predicting the impacts of genetic variations. These workflows have now been developed and are accessible via the ELIXIR TeSS training website.
Collaborator Contribution Research groups from 15 European countries are involved in the ELIXIR 3D-BioInfo collaboration. More than 10 groups from 7 countries, including the UK, are involved in the Activities that CATH-Gene3D contributes to. All are contributing predicted functional site data to PDBe-KB. We all participate in regular workshops held at the EBI to discuss ontologies, export/import mechanisms and APIs. For the ELIXIR UK training workflows, each group within the consortium is developing its own training material relating to its particular research area.
Impact All predicted functional site data will be made available via the PDBe-KB. Predicted domain structure data will be made available through Genome3D and also through PDBe-KB once the exchange mechanisms for that have been completed. All training material will be integrated via on-line workflows which are being developed as a part of the TeSS platform - an on-line training catalogue and training facility being organised by the ELIXIR UK node.
Start Year 2013
 
Description FunPDBe - Community driven enrichment of PDB data with structural and functional annotations 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution My group have generated structural and functional annotations for more than 95 million protein domains from UniProt. This data, which is disseminated via CATH-Gene3D, will also be exported to PDBe for selected model organisms. As part of this collaboration we are also developing training workflows for biologists wishing to access and extract this information from CATH and from FunPDBe. This work is being done in collaboration with five other UK research groups, who are also generating structural and functional annotations using diverse methods. By combining our annotations in PDBe we will increase the coverage of our annotations in the model organisms, and the consensus information helps to provide a weighting on accuracy, i.e. the more independent methods that agree on a prediction, the more likely it is to be correct.
Collaborator Contribution This is a BBSRC funded project involving the PDBe group at EBI and 10 other research groups, which has the aim of increasing the structural and functional annotations in PDBe and exploiting this data to investigate the impacts of genetic variation in proteins. There are 3 work packages: 1) functional site data, 2) curated functional information, 3) prediction of variant impacts. Each group is contributing derived data or tools to support one or more of these 3 aims.
Impact The project has only been running 6 months. We have built the framework for exporting data from the partner groups to FunPDBe and for importing this data into FunPDBe. We have also built the framework for the training workflows and started populating the workflows with material on functional site annotations and homology modelling.
Start Year 2017
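
The consensus idea mentioned in the PI contribution above (the more independent methods agree on a prediction, the more likely it is to be correct) can be sketched as a simple vote fraction per residue. This is only an illustration under stated assumptions: the method names and residue numbers are invented, and an unweighted vote fraction is a stand-in for whatever weighting the project actually applies.

```python
from collections import Counter


def consensus_support(predictions):
    """Map each residue to the fraction of methods that predicted it.

    `predictions` maps a method name to the residue numbers it flags.
    An unweighted vote is an assumption made for this sketch.
    """
    counts = Counter()
    for residues in predictions.values():
        # set() guards against a method listing the same residue twice.
        counts.update(set(residues))
    n_methods = len(predictions)
    return {res: counts[res] / n_methods for res in sorted(counts)}


# Hypothetical predictions from three made-up methods.
predictions = {
    "method_a": {57, 102, 195},
    "method_b": {57, 195},
    "method_c": {57, 140},
}
support = consensus_support(predictions)
for residue, fraction in support.items():
    print(residue, round(fraction, 2))
```

A vote fraction treats every method as equally reliable and fully independent; methods that share training data or templates would need down-weighting for the score to be meaningful.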
 
Description PDBe 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Our resource CATH provides high quality annotations to improve the quality of the information provided by the PDBe, primarily the location of structural domains and the identification of distant evolutionary relationships between known protein structures. Our Gene3D resource provides structural annotations for genome sequences from ~20,000 species. These annotations are also incorporated in the Genome3D resource for selected model organisms. Collaborations between research groups involved in the Genome3D project have resulted in a high quality mapping between the CATH and SCOP structural classification databases. This is being implemented by the PDBe to improve the clarity and coverage of structural annotations in their resource. We currently have a BBSRC BBR funded collaboration with PDBe and InterPro to provide our CATH-Gene3D structural annotations to these resources, via the Genome3D portal.
Collaborator Contribution Host, maintain and curate the central PDBe resource and website.
Impact Publications; community resources to further scientific research.
Start Year 2006
 
Description Bioinformatics talk in UCL Healthcare Careers Day 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Around 100 school students attended the Bioinformatics STEM talk at the Medicine and Healthcare Careers Day at UCL which was aimed at introducing Bioinformatics to school students along with a hands-on practical session on structural bioinformatics and showcasing the CATH database.
Year(s) Of Engagement Activity 2017
 
Description Computational Biology conference in July 2017 (Prague, Czech Republic) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Intelligent Systems for Molecular Biology (ISMB) is an annual academic conference on the subjects of bioinformatics and computational biology organised by the International Society for Computational Biology (ISCB). In July 2017, ISMB/ECCB was held in Prague. The principal focus of the conference is on the development and application of advanced computational methods for biological problems. Talks and posters were presented during various sessions at this conference. Christine Orengo gave a talk on computational analyses exploiting CATH-Gene3D and Genome3D data.
Year(s) Of Engagement Activity 2017
URL https://www.iscb.org/ismbeccb2017
 
Description FunPDBe WP1 first workshop (EBI, Oct 2017) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The FunPDBe WP1 inaugural workshop was held at EMBL-EBI Hinxton on 5th & 6th October 2017. The workshop brought together representatives of many partner resources of FunPDBe, and various aspects of the project were discussed, including the timeline, infrastructure and standards for community-driven functional site annotations.
Year(s) Of Engagement Activity 2017
 
Description FunPDBe WP1 second workshop (UCL, Nov 2017) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The FunPDBe WP1 second workshop was held at UCL on 29th November, 2017. The workshop was attended by representatives of many partner resources of FunPDBe. There was an update on the FunPDBe infrastructure, the schema for functional site annotations was finalised and there was discussion on training and workflows.
Year(s) Of Engagement Activity 2017
 
Description Talk in Fisher Centre for Computational Biology meeting (8 Nov 2017) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact This was an invited talk at the 12th Fisher Centre meeting on 8 Nov. 2017 at The Francis Crick Institute. Around 100 participants attended the meeting. The talk sparked questions and discussions afterwards.
Year(s) Of Engagement Activity 2017
URL https://www.ucl.ac.uk/ra-fisher-centre