Validation of biomacromolecular structures determined by NMR spectroscopy and deposited in the Protein Data Bank

Lead Research Organisation: University of Leicester
Department Name: Biochemistry

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

NMR structures in the Protein Data Bank (PDB) contain errors that often go undetected. The errors originate from limited data quality and quantity and complex computational procedures, as well as inherent dynamics of the biomacromolecules. There is currently no extensive mandatory validation of NMR structural data before deposition to the PDB and BMRB, nor is there a complete understanding of the limitations of current validation servers (CING and PSVS).

The clear need for improved NMR-derived biomacromolecular structures forms the basis of the four objectives for this proposal:

1. To implement the recommendations of the wwPDB NMR Validation Task Force (VTF) into an integrated software pipeline. This work will build upon the X-ray validation pipeline currently under development at PDBe. The pipeline will allow for validation of structures prior to deposition to the PDB and BMRB and will be used by all wwPDB partners. In addition, the pipeline will be used to assess the quality of all NMR structures in the PDB and the results will be made freely available.
2. To critically assess the utility, scope and limitations of current NMR validation tools. This work will build on the results of Objective 1 as well as prior research in the Vuister group using the CING validation software.
3. To develop new algorithms, procedures and tools for validation of high-resolution NMR structures, addressing issues such as dynamics and sparse data.
4. To disseminate validation-related information as well as newly developed validation methods for use by the wider scientific community. This will include development of new visualisation services at PDBe to help expert and non-expert users assess the quality of any NMR structure in the PDB. It will also include development of new publicly-available validation tools at Leicester.

PDBe will be the scientific lead for work on Objectives 1 and 4, and Leicester for work on Objectives 2 and 3.

Planned Impact

Structure determination and the study of interactions using biomacromolecular NMR techniques is still a rapidly growing field, with many new applications being developed all the time. Both automation and integrated pipelines have gradually changed the structure determination process from a highly expert undertaking to a more routine tool for biological research, albeit one that still requires dedicated technical expertise.

The current proposal addresses an important aspect of the structure determination process: validation of the resulting models in relation to both prior knowledge (chemical, physical and biological) and specific experimental data collected on a sample containing the molecule(s) of interest. Careful validation often allows detection and remediation of potential problems commonly encountered for NMR-derived structures. As a result, more reliable NMR structures will be obtained. We expect that the present project will contribute significantly to (a) quality improvement of all biomacromolecular NMR structures to be deposited in the PDB in the future, and (b) awareness of the quality of all existing NMR structures in the PDB (13% of the archive). These structures will then form a better starting point for understanding their biology, for protein engineering, homology modeling and drug design. The results of the project will strengthen the UK scientific innovative capacity and fit the BBSRC research priority "Technology development for bioscience". The users will include researchers at academic and government institutions, industrial laboratories as well as students and teachers with an interest in structural biology.

The tools, information and resources produced in this project will become widely available, as both partners have extensive experience in the development of web-based services. It is the mission of PDBe to curate newly deposited structures and to provide structural data and advanced services and resources to the worldwide scientific and industrial community. Their web servers, part of the EBI data centres, process millions of requests every month. The NMR group at the University of Leicester has been at the forefront in the development of validation tools. Their dedicated iCing validation server will be expanded to accommodate the new tools. The server already performs over 1000 validation runs on a yearly basis with requests originating from all continents, except Antarctica, and demand is still increasing. User input will provide important feedback regarding the general applicability and usability of our tools.

Both applicants participate in many international collaborations. PDBe is a partner in the Worldwide PDB consortium and EMDataBank and has participated in many EU-funded projects. While previously located in Nijmegen, the NMR-validation research program now at the University of Leicester was and is a participant in many EU-funded NMR-oriented projects, such as WeNMR. The latter effort is aimed at the development of a virtual research community (VRC) that will strengthen biomacromolecular NMR as a tool in biological research. The proposed research will both benefit from and strengthen this EU effort, as the WeNMR VRC will be an excellent platform for the dissemination of some of the results of the project.

The post-doctoral researchers for whom support is requested here will work in an international and scientifically excellent research environment. The project will require them to utilise and develop their technical, scientific and personal skills. It is expected that the interactions within the respective groups in Hinxton, Leicester and Nijmegen will also raise awareness of the project among other post-doctoral researchers and graduate students active in the structural biology field. In fact, these researchers comprise an important target group of users of the tools that will be developed in the project.
 
Description Knowledge and understanding of the 3D structure and dynamics of biomacromolecules is crucial in many areas of scientific endeavour. Nuclear Magnetic Resonance (NMR) spectroscopy is one of the three major techniques for experimental determination of biomacromolecular structures and accounts for 13% of all entries in the Protein Data Bank (PDB), the single, global archive of biomacromolecular structure data. The PDB is managed by the Worldwide Protein Data Bank consortium (wwPDB; wwpdb.org), consisting of RCSB and BMRB in the USA, PDBe in Europe/UK and PDBj in Japan. NMR structures are deposited and curated at RCSB, PDBe and PDBj, whereas BMRB parses, annotates and stores the experimental data. wwPDB has appointed an international NMR Validation Task Force (VTF) to recommend what validation criteria should be used to assess the quality of NMR structures when these are deposited into the archive. One of the PIs of this joint proposal is a member of the VTF, while the other is a wwPDB PI and Head of PDBe.
Recent surveys show that commonly used protocols for validation of NMR-derived structural ensembles are insufficient and do not always detect serious problems. As a consequence, too many NMR structures contain errors and they are often regarded as inferior compared to X-ray crystal structures. As the focus of structural biology is shifting from individual domains to molecular machines and complexes, this poses new challenges for validation.
The present project aims to substantially improve validation of NMR-derived biomacromolecular structures. To this end, we have defined four main objectives:
1. To implement the recommendations of the wwPDB NMR VTF in an integrated software pipeline.
2. To critically assess the utility, scope and limitations of current NMR validation tools.
3. To develop new algorithms, procedures and tools for validation of high-resolution NMR structures.
4. To disseminate validation-related information as well as newly developed validation methods for use by the wider scientific community.
The NMR VTF is expected to provide its recommendations at the end of 2011. Recommendations from the X-ray VTF are currently being implemented in a software pipeline at PDBe, and this will become an integral part of the new joint wwPDB Deposition and Annotation system coming on-line in 2012. Time-tested and VTF-recommended validation programs and routines have been shared by their authors. Some of this software will be re-used in the NMR validation pipeline and both pipelines will be used to validate all current and newly deposited structures in the PDB (objective 1).
Validation is much less developed for NMR than for X-ray and there are many unanswered fundamental questions, e.g., the NMR VTF is discussing whether a measure comparable to resolution can be defined for NMR structures. It is also unclear how structures based purely on chemical shifts or other sparse data and existing protein structures should be validated. We will use the results of an archive-wide quality analysis using current validation tools to assess the strengths and limitations of current quality statistics (objective 2). New validation approaches will be developed as needed (objective 3). It is expected that this work will inform the future work of the wwPDB NMR VTF.
Finally (objective 4), we will disseminate the validation information, tools and services produced at PDBe and the University of Leicester, as widely as possible, to both expert and non-expert user communities. The results of running the validation pipeline on all NMR entries in the PDB will be made available in machine-readable form. PDBe will also develop a visualisation and analysis resource to present validation information in the context of the structure to help users judge a structure's reliability. Newly developed software will be made available at Leicester and may become part of the wwPDB NMR validation pipeline in the future.
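To make the idea of checking a structural ensemble against experimental restraints concrete, the following is a minimal sketch in Python. It is an illustration only and not the method used by the wwPDB pipeline or by CING: the function name `count_violations`, the atom identifiers, the coordinate layout and the 0.5 Å tolerance are all assumptions chosen for the example.

```python
import math

def count_violations(ensemble, restraints, tolerance=0.5):
    """Count upper-bound distance restraint violations per model.

    ensemble   : list of models; each model maps an atom id to (x, y, z) in angstrom
    restraints : list of (atom_i, atom_j, upper_bound) tuples, bounds in angstrom
    tolerance  : hypothetical cut-off; a restraint counts as violated only when
                 the model distance exceeds the upper bound by more than this
    """
    counts = []
    for model in ensemble:
        violated = 0
        for atom_i, atom_j, upper_bound in restraints:
            # Euclidean distance between the two restrained atoms in this model
            d = math.dist(model[atom_i], model[atom_j])
            if d - upper_bound > tolerance:
                violated += 1
        counts.append(violated)
    return counts

# Two-model toy ensemble with made-up coordinates and one restraint
ensemble = [
    {"A.1.HA": (0.0, 0.0, 0.0), "A.5.H": (3.0, 4.0, 0.0)},  # distance 5.0
    {"A.1.HA": (0.0, 0.0, 0.0), "A.5.H": (3.0, 0.0, 0.0)},  # distance 3.0
]
restraints = [("A.1.HA", "A.5.H", 4.0)]
print(count_violations(ensemble, restraints))
```

A per-model count like this is only one simple statistic; which statistics are informative for sparse-data structures is among the open questions the project addresses.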
Exploitation Route The NMR-VTF has identified three phases in the development of their recommendations, each with increasing complexity and need for fundamental research to answer the questions posed. Phase one has concluded with the adoption of the NEF 1.0 data exchange standard and the formal release of the official wwPDB validation pipeline in 2017. Phases two and three are expected to require another 3-5 years for development and full implementation, as current state-of-the-art knowledge cannot yet address all the needs identified by the NMR-VTF.
Sectors Digital/Communication/Information Technologies (including Software)

Healthcare

Manufacturing, including Industrial Biotechnology

Pharmaceuticals and Medical Biotechnology

URL http://www.wwpdb.org/validation/validation-reports
 
Description The iCing validation server provides for ~1000 assessments of biomolecular NMR structures annually. These runs are initiated by individual researchers across the globe, providing them with crucial information for their research projects. The wwPDB validation pipeline is now operational and formally approved, providing standardised reports on all entries.
First Year Of Impact 2013
Sector Digital/Communication/Information Technologies (including Software); Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
Impact Types Societal

Economic

 
Title CASD-NMR-2013 database 
Description Results of the analysis of the CASD-NMR-2013 effort, available as an SQL database. The database is available upon request only, owing to security concerns of the UoL IT department. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact Used for further studies into automated NMR structure determination. The full CASD-NMR-2013 resource underpinned a number of further scientific papers unrelated to the initial effort. 
 
Title NRG-cing database 
Description NRG-cing repository of validated NMR structures 
Type Of Material Database/Collection of data 
Year Produced 2013 
Provided To Others? Yes  
Impact Resource for all scientists using NMR-derived biological structures in their research. 
URL http://nmr.le.ac.uk
 
Title CcpNmr Analysis release 2.4 
Description CcpNmr Analysis version 2.4 with enhanced assignment tools, new restraint calibration tool, new summary tool and new CYANA integration tool. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Positive user feedback. 
URL http://www.ccpn.ac.uk/software/analysis
 
Title The NMR Exchange Format (NEF) 
Description The NMR Exchange Format (NEF) has been developed in a collaboration between the CCPN, the BioMagResBank, the RCSB, and the main developers of macromolecular NMR software (Peter Guntert (CYANA), Charles Schwieters (XPLOR-NIH), Michael Nilges (ARIA), Torsten Herrmann (UNIO), David Wishart, David Case (AMBER), Guy Montelione (AutoAssign, ASDP)). It covers sequence, chemical shifts, spectra, peak lists, and restraints. The format specification is controlled by consensus of the partners, and all developers have committed to supporting the format as an input/output exchange format. Version 1.0 of the format specification is now stable and fully supported by CCPN, and will be supported by the upcoming release of NMR-STAR (version 3.2.0.1). 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact For over 20 years, efforts to establish seamless NMR data exchange between different programs have failed, relying on conversion between a variety of formats instead with a concomitant risk of information loss or misinterpretation. Efforts to develop universal NMR data converters have been challenged because some formats omit information required by other formats, and full parsing of each software-specific format has proven to be impossible. The current situation hampers the proper archiving and use of biomolecular NMR data, and prevents the routine inclusion of NMR restraint validation in the wwPDB NMR validation pipeline. The new NMR exchange format was developed in close consultation and with support of developers of key software packages used for NMR structure determination and refinement, with the aim of attaining a unified approach to represent NMR restraints and associated data. Together, they agreed on and successfully implemented and tested an NMR data representation and devised a governance structure for its maintenance and further development. The authors of fourteen different packages already committed during the initial discussions, with new ones joining the efforts since. 
URL https://github.com/NMRExchangeFormat/NEF/
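
As an illustration of how a shared tabular representation lets different programs read the same chemical shift data, the sketch below parses a simplified STAR-style loop in Python. It is not a NEF parser: the tag names are modelled loosely on NEF-style naming and should be treated as assumptions, the values are made up, and real files need handling of quoting, multi-line values and saveframes, for which the specification at the URL above is the authority. The function name `parse_loop` is hypothetical.

```python
def parse_loop(text):
    """Parse one simplified STAR-style loop into a list of row dictionaries.

    Assumes whitespace-separated values with no quoting, which real
    STAR/NEF files do not guarantee.
    """
    tags, rows = [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "loop_":
            in_loop = True
        elif line == "stop_":
            break
        elif in_loop and line.startswith("_"):
            # Keep only the part after the category prefix, e.g. "atom_name"
            tags.append(line.split(".", 1)[1])
        elif in_loop:
            rows.append(dict(zip(tags, line.split())))
    return rows

# Illustrative input; tag names and shift values are assumptions
EXAMPLE = """
loop_
   _nef_chemical_shift.chain_code
   _nef_chemical_shift.sequence_code
   _nef_chemical_shift.residue_name
   _nef_chemical_shift.atom_name
   _nef_chemical_shift.value
   A 1 MET CA 55.4
   A 1 MET HA 4.21
stop_
"""

rows = parse_loop(EXAMPLE)
for row in rows:
    print(row["residue_name"], row["atom_name"], float(row["value"]))
```

Because every column is named explicitly, a reading program can detect a missing field instead of silently misinterpreting it, which is the kind of information loss that format conversion risks.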
 
Title iCing version 1.0 
Description iCing webserver for NMR structure validation, release version 1.0; CCPN 2.4 compatible 
Type Of Technology Webtool/Application 
Year Produced 2014 
Impact Positive user feedback 
URL http://nmr.le.ac.uk
 
Description Open-Science day 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact For over three hours, our scientists welcomed students from places such as Groby, Leicester, Northampton, Corby, Wellingborough, Rothwell and Coalville to the Henry Wellcome Building. To showcase the work carried out in the Department there were a series of short talks delivered by leading researchers, hands-on activities, displays, competitions and tours of our research labs.
Over a hundred students from nine schools had the chance to hear about and see our research and talk to our scientists, while having some fun and enjoying the refreshments.

The team showcased the CCPN program and provided live demonstrations at the NMR spectrometer.
Year(s) Of Engagement Activity 2016
URL http://www2.le.ac.uk/departments/molcellbiol/file-store/science-open-day-2016