Unified data resource for NMR spectral and PDB data via an enhanced deposition, visualisation and validation AutoDep system development

Lead Research Organisation: European Bioinformatics Institute
Department Name: Protein Data Bank in Europe

Abstract

The goal of this proposal is to ensure that NMR investigators in Europe have access to a convenient local system for depositing NMR structures and all associated data, similar to that developed by the RCSB and BMRB in the United States. We will build on our well-established database systems and infrastructure to support the deposition and retrieval of NMR spectral data. We have developed web-based Java/XML deposition systems, AutoDep4 for the PDB and EmDep for the EMDB, and we will extend these systems to allow the archival and routing of 3D structure data both to the PDB, for coordinate information, and to the BMRB, for NMR spectral data. In addition, as a member of the wwPDB, the MSD group is committed to processing a greater proportion of PDB submissions as part of the global partnership.

NMR data contains a wealth of information about the structure and dynamics of biological macromolecules. It is, however, often difficult to extract this information in a meaningful way. For example, the chemical shift values of certain protein backbone atoms can be used directly to determine the secondary structure of a protein, but because chemical shift values are averaged over all the conformations a molecule adopts, the chemical shifts of more flexible backbone regions or of side chain atoms are much harder to interpret. In addition, other information, such as the width of NMR signals, is seldom used in solution NMR.

The current deposition system presents a number of major difficulties for both depositors and potential users of NMR spectral data. For most depositors, coordinates and spectral data are deposited semi-independently of one another in order to archive a set of experiments. In particular, the time-consuming process of entering metadata related to the experiment must be performed twice, through two different processes, each of which collects a different subset of the relevant experimental data.
For potential users of coordinate data that is linked to NMR spectral data, there is no simple display option to view such data over the web. We will also extend the functionality of our visualisation and analysis software for molecular structures, AstexViewer@MSD-EBI, for the display and analysis of NMR data in relation to 3D structure and chemical properties, to give fast, object-based access to complex analyses. By enhancing these existing tools, tuned to specific NMR applications, we will provide powerful applications for the NMR structural community. The MSD database contains NMR information that is directly linked to the structure coordinates of biological macromolecules, which opens exciting prospects for relating NMR data to the structures in a more meaningful way. From a structural point of view in particular, the restraints are crucial, and visualising the distribution of restraints in relation to structure will be an important deliverable of this proposal. We will carry out a large-scale, structure-related analysis of the spectral data extracted from the BMRB database, combining this with specific MSD data on chemical entities in relation to protein chemical shift data. As more relaxation data, which is directly related to the dynamics of the molecule, becomes available, it is crucial that large-scale analyses are performed that incorporate as much data as possible.

There will be two major results from this grant. First, we will have developed a unified deposition interface for the deposition of all data related to macromolecular structure determination by NMR, for both PDB-specific data and BMRB-specific data. Second, we will have developed an access portal for the visualisation and analysis of this data, suitable for the NMR community to apply to a wide range of problems.

Technical Summary

This proposal has two inter-related aims: (i) Develop an integrated system for the deposition of spectral data and coordinates, based on the CCPN data model for NMR and combined studies, together with a relational database management system. The deposition system will issue both PDB and BMRB database accession codes, in collaboration with the wwPDB partners. (ii) Develop a common access portal to deliver both NMR data and model coordinates with easily accessible visualisation tools, annotations, and validation criteria. The first task will be achieved by extending our existing systems. At the EBI, deposition to the PDB and EMDB is via the AutoDep4 deposition system. These deposition interfaces and servers are metadata-driven systems, which prepare entries for loading into the MSD database. The systems are flexible, extensible and easy to maintain and manage. XML-based dictionaries drive the deposition interface, which stores data in XML format. Having the data in XML format is extremely beneficial, as it is easy to parse and can be transformed into any other required format, such as PDB. To provide the NMR community with an access portal, especially for the visualisation of the data archived in the BMRB database and linked to the coordinate data held in the PDB, we will tune the MSD Java viewing system, AstexViewer@MSD-EBI. This system is the result of our work with Astex International to develop an interactive web-based viewer with the functionality to display the results from database searches for the MSD services. A major emphasis of the viewer has been the presentation of information associated with the display of multiple superposed structures and sequences that result from a database query. The viewer is ideally suited to the analysis of spectral data together with both sequence and coordinates. We will develop modules that will allow macromolecular NMR data to be presented in a number of different views.
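
As a rough illustration of the point above that deposition data held as XML can be transformed into other formats such as PDB, the following minimal Python sketch converts atom elements into fixed-column PDB ATOM records. The XML element and attribute names (`entry`, `atom`, `serial`, `resName` and so on), the function names and the example coordinates are hypothetical and chosen only for this example; they are not the AutoDep4 dictionary or schema. Only the ATOM column layout follows the published PDB file format, and the atom-name alignment rule is simplified.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment for illustration; NOT the actual AutoDep4 schema.
EXAMPLE_XML = """
<entry id="example">
  <atom serial="1" name="N" resName="ALA" chain="A" resSeq="1"
        x="11.104" y="6.134" z="-6.504" occupancy="1.00" bfactor="0.00" element="N"/>
  <atom serial="2" name="CA" resName="ALA" chain="A" resSeq="1"
        x="11.639" y="6.071" z="-5.147" occupancy="1.00" bfactor="0.00" element="C"/>
</entry>
"""


def atom_to_pdb_line(atom):
    """Format one <atom> element as a fixed-column PDB ATOM record."""
    name = atom.get("name")
    element = atom.get("element")
    # Simplified rule: names of one-letter elements start in column 14.
    if len(element) == 1 and len(name) < 4:
        name_field = " " + name.ljust(3)
    else:
        name_field = name.ljust(4)
    return (
        "ATOM  "
        f"{int(atom.get('serial')):>5d} "
        f"{name_field}"
        " "  # alternate location indicator (unused here)
        f"{atom.get('resName'):>3s} "
        f"{atom.get('chain'):1s}"
        f"{int(atom.get('resSeq')):>4d}"
        "    "  # insertion code plus three blank columns
        f"{float(atom.get('x')):8.3f}"
        f"{float(atom.get('y')):8.3f}"
        f"{float(atom.get('z')):8.3f}"
        f"{float(atom.get('occupancy')):6.2f}"
        f"{float(atom.get('bfactor')):6.2f}"
        "          "  # columns 67-76 are left blank
        f"{element:>2s}"
    )


def xml_to_pdb(xml_text):
    """Parse the XML text and return one PDB ATOM line per <atom> element."""
    root = ET.fromstring(xml_text)
    return [atom_to_pdb_line(atom) for atom in root.iter("atom")]


if __name__ == "__main__":
    for line in xml_to_pdb(EXAMPLE_XML):
        print(line)
```

Because the parsed XML tree is independent of any output layout, the same tree could in principle be passed to other writers, which is the benefit the summary attributes to storing deposition data as XML.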

Publications

Doreleijers JF (2009) The NMR restraints grid at BMRB for 5,266 protein and nucleic acid PDB entries. in Journal of biomolecular NMR

Montelione GT (2013) Recommendations of the wwPDB NMR Validation Task Force. in Structure (London, England : 1993)

Penkett CJ (2010) Straightforward and complete deposition of NMR data to the PDBe. in Journal of biomolecular NMR

Velankar S (2011) The Protein Data Bank in Europe (PDBe): bringing structure to biology. in Acta Crystallographica Section D Biological Crystallography

Velankar S (2010) PDBe: Protein Data Bank in Europe. in Nucleic acids research

Velankar S (2012) PDBe: Protein Data Bank in Europe. in Nucleic acids research

Velankar S (2011) PDBe: Protein Data Bank in Europe. in Nucleic acids research

 
Description 1. Deposition of CCPN projects to PDBe and BMRB.

The CCPN Entry Completion Interface (ECI) was developed by NMR staff at PDBe. It is a desktop application, which allows the user to create a deposition entry, and select data from an existing CCPN project for deposition in the PDB and BMRB archives. The user is also able to add or modify other data required by these archives, such as author and publication details, referencing information for chemical shifts, sample conditions, etc. The resulting CCPN project is accepted by the AutoDep server at PDBe. It pre-fills an AutoDep session and the user then only needs to confirm that the details are correct or make additions and modifications as needed. ECI allows export of full NMR-STAR V3.1 files, which can be uploaded to the ADIT-NMR server for BMRB depositions. The back-end functionality of ECI underlies the NMR annotation scripts at PDBe. In particular, the export of full NMR-STAR files allows for a smooth transfer of NMR data from PDBe to BMRB and effectively allows the user to make a single coordinated deposition to both PDBe and BMRB. ECI is available as part of the CCPN distribution from http://www.ccpn.ac.uk. Extensive ECI and AutoDep help is available for depositors through the PDBe NMR pages at http://pdbe.org/nmr. CCPN projects will continue to be accepted for deposition in the common Deposition & Annotation software, which is currently being developed by the wwPDB partners.

2. Visualisation of NMR structures and associated data.
PDBe launched a prototype of Vivaldi (Visualisation and Validation Display; http://pdbe.org/vivaldi) in July 2011. Three types of information are accessible through the current working prototype: information about the homogeneity of the modelled NMR ensemble, validation of experimental NMR data and validation of the modelled conformations against empirical knowledge and databases. OLDERADO, a PDBe service that analyses structure ensembles, provides clustering information and domain boundaries. Vivaldi can visualise individual models, clusters and representative models of each cluster as well as the domain information. Vivaldi displays and analyses the following experimental data: chemical shifts, distance restraints and residual dipolar couplings, which can all be shown in the 3D viewer, in a graph and as tables. Chemical shift outliers are identified by the PDBe VASCO service. Vivaldi analyses experimental distance and residual dipolar coupling (RDC) restraints, which are obtained from the NMR Restraints Grid (NRG) at BMRB. Both satisfied and violated restraints can be visualised. Post-processing of RDCs by Vivaldi includes the calculation of the molecule's alignment tensor, generally not deposited together with the RDC data. Fitted and experimental RDCs are presented as bar and scatter plots. Finally, per-residue scores are extracted from the external NRG-CING service, including an overall quality score that colours residues red, orange or green. PROCHECK and WHATIF scores, which express the overall quality of a structure, can also be shown by Vivaldi. Per-residue scores can be displayed as a graph, in tables or interactively in the 3D viewer.
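
The alignment tensor calculation mentioned above can be illustrated with a minimal Python sketch. It assumes the common linear formulation in which each RDC is D = Sxx(x² - z²) + Syy(y² - z²) + 2·Sxy·x·y + 2·Sxz·x·z + 2·Syz·y·z for a unit internuclear vector (x, y, z), with the dipolar coupling constant absorbed into the five tensor components, and it solves for those components by ordinary least squares. The function names and the example vectors are hypothetical, and this is not a description of how Vivaldi itself performs the fit.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def design_row(v):
    """Coefficients of (Sxx, Syy, Sxy, Sxz, Syz) for one unit vector."""
    x, y, z = v
    return [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]


def solve_linear(a, b):
    """Solve the square system a.s = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("vectors do not determine the tensor uniquely")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * s[c] for c in range(r + 1, n))
        s[r] = (m[r][n] - tail) / m[r][r]
    return s


def fit_alignment_tensor(vectors, rdcs):
    """Least-squares fit of the five tensor components via normal equations.

    Assumes all couplings are of the same type, so that a single scaling
    constant can be absorbed into the fitted components (units of Hz).
    """
    rows = [design_row(v) for v in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * d for r, d in zip(rows, rdcs)) for i in range(5)]
    return solve_linear(ata, atb)


def back_calculate(vectors, tensor):
    """Predict the RDC for each unit vector from the tensor components."""
    return [sum(c * t for c, t in zip(design_row(v), tensor)) for v in vectors]


if __name__ == "__main__":
    # Made-up vectors and tensor, used only to show the round trip.
    vectors = [unit(v) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1),
                                 (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]]
    experimental = back_calculate(vectors, [3.0, -1.5, 0.5, 2.0, -0.7])
    tensor = fit_alignment_tensor(vectors, experimental)
    for exp, fit in zip(experimental, back_calculate(vectors, tensor)):
        print(f"experimental {exp:8.3f}  fitted {fit:8.3f}")
```

Pairs of experimental and back-calculated values such as those printed here are the kind of data that can be shown as bar or scatter plots. In practice the internuclear vectors would be taken from the model coordinates, and a singular value decomposition is often preferred to normal equations for numerical robustness.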
Exploitation Route The ECI software is designed to simplify the deposition of CCPN projects - the emerging standard in NMR data representation - into the PDB and BMRB archives. It ensures internal consistency of the deposited data and is thus able to improve the overall quality of the archives. In turn, reliable archives are of higher value to their end users. While the usage of the CCPN data model in the NMR community has not yet become widespread, PDBe has already released more than 25 CCPN-based entries.

The prototype Vivaldi service at PDBe (http://pdbe.org/vivaldi) is a simple-to-use visualisation tool, aimed at both expert and non-expert users of the PDB. It allows them to visually inspect NMR entries in the PDB archive and assess their quality. It provides important validation information, which can help users decide which structure is most suitable for their purposes (e.g., for drug development, molecular function studies, mutant design, etc.). Vivaldi is useful for both academic and industrial researchers (e.g., pharma) as well as a teaching aid in structural biology or NMR courses.
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology

URL http://pdbe.org/nmr
 
Description The outcomes of this research are primarily academic in nature. The first goal of this project (ECI software and updates to the annotation process at PDBe) concerns the provision of more complete and better annotated data to the PDB and BMRB archives, which are both publicly available and are extensively used (over 30 million monthly downloads worldwide) by academic and industrial users. While the usage of the CCPN software in the NMR community has not yet become widespread, PDBe has been receiving, annotating and releasing CCPN entries (more than 25) since 2011, thus contributing to the improved quality of data in the PDB. The second goal of the project resulted in a prototype Vivaldi service at PDBe (http://pdbe.org/vivaldi), which is a simple-to-use visualisation tool, aimed at both expert and non-expert users of the PDB. It allows them to visually inspect NMR entries in the PDB archive and assess their quality. It provides important validation information, which can help users decide which structure is most suitable for their purposes (e.g., for drug development, molecular function studies, mutant design, etc.). Vivaldi is useful for both academic and industrial researchers (e.g., pharma) and has been used as a teaching aid in advanced structural biology and NMR courses since 2012.
First Year Of Impact 2011
Sector Education, Pharmaceuticals and Medical Biotechnology
 
Description BBSRC - responsive mode May 2011
Amount £307,760 (GBP)
Funding ID BB/J007471/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2012 
End 03/2015
 
Title Entry Completion Interface (ECI) 
Description The ECI software is part of the CCPN suite and is developed to prepare CCPN projects for deposition in the PDB and BMRB archives. It is used under the CCPN licence. 
Type Of Technology Software 
Year Produced 2010 
Open Source License? Yes  
Impact More than 25 depositions have been processed and released by PDBe.
URL http://www.ebi.ac.uk/pdbe/nmr/deposition/eci.getting_started.html
 
Title Vivaldi - Visualisation and validation display 
Description Vivaldi is a PDBe web service that allows both expert and non-expert users to visualise NMR structures deposited in the PDB archive, together with the available validation reports.
Type Of Technology Webtool/Application 
Year Produced 2011 
Impact To date, the Vivaldi tool has received 4 citations in peer-reviewed papers. The underlying infrastructure has been used to continue our group's work on the validation of NMR entries in the PDB. There are more than 250 unique visitors to the Vivaldi page each month.
URL http://pdbe.org/vivaldi