Supporting archival and dissemination of small-angle scattering data for atomistic structures in the PDB

Lead Research Organisation: European Bioinformatics Institute
Department Name: Protein Data Bank in Europe

Abstract

This proposal is concerned with preserving research results and experimental data in the field of structural biology, which aims to determine three-dimensional (3D) structures of important biological molecules, such as proteins and nucleic acids. Knowledge of these structures can aid areas such as discovery of new drugs, development of diagnostics of diseases, understanding the biology of health, disease and ageing, and optimisation of industrial processes through engineering of better enzymes. Structural data is deposited into a single global archive (the Protein Data Bank or PDB) by researchers from academic, government and industrial laboratories from all continents except Antarctica. This data is then annotated and made publicly and freely available. After starting with only 7 protein structures in 1971, the PDB experienced an almost exponential growth and in May 2014 passed the 100,000-structure milestone. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB) consortium, with partners in the UK, USA and Japan.

X-ray crystallography and Nuclear Magnetic Resonance (NMR) are the two main experimental methods in structural biology, and they have contributed 99% of the structures in the PDB. There is wide consensus in the structural biology community that the archived structures of biological molecules should be accompanied by the experimental data that supports them. Not only does this enable other scientists to reproduce, verify or reinterpret the original findings, it also stimulates and facilitates the development of new methods for structure determination and validation. Since 2008, deposition of experimental data has been mandatory for all structures determined by X-ray and NMR methods.

However, other techniques, usually in combination with X-ray or NMR, are increasingly used nowadays to investigate the structure of large, complex and thus challenging biological systems. A popular technique is electron microscopy (EM), and experimental EM data can be deposited in a separate archive (called EMDB). Another, increasingly popular technique is small-angle X-ray (or neutron) scattering (SAXS/SANS). It can provide information on the overall size and shape of the studied molecules. SANS experiments in addition can provide information on the relative positions of various molecules when they interact to form large complexes. However, the wwPDB does not currently have a mechanism to collect the underlying experimental SAXS/SANS data. The wwPDB partners have therefore convened a task force of experts in small-angle scattering techniques (SAS TF) to get advice on archiving of SAXS/SANS data. The SAS TF strongly recommended that this data be collected and disseminated by wwPDB if it was used in conjunction with or in support of another technique.

This proposal aims to implement this recommendation by developing additions to the wwPDB archival software, and then disseminating the collected SAXS/SANS data to the scientific community. This project will improve our description and understanding of structures of important biological molecules and complexes.

The Protein Data Bank in Europe (PDBe) is a founding member of wwPDB. PDBe is part of the European Bioinformatics Institute (EMBL-EBI), the UK-based outstation of the European Molecular Biology Laboratory (EMBL). PDBe has strong expertise in X-ray crystallography, NMR, cryo-electron microscopy and software development. PDBe will pursue the proposed project in close collaboration with its international wwPDB partners and in consultation with world-leading experts in the SAXS/SANS techniques.

Technical Summary

The Protein Data Bank (PDB) is the single global repository of three-dimensional (3D) structures of proteins, nucleic acids and their complexes. The PDB is managed by the Worldwide Protein Data Bank (wwPDB), an international consortium of four organisations, including the Protein Data Bank in Europe (PDBe) at EMBL-EBI in Cambridge. wwPDB is implementing a new deposition and annotation (D&A) system, to facilitate the deposition, curation and distribution of structures and experimental data resulting from X-ray crystallography, Nuclear Magnetic Resonance (NMR), and electron cryo-microscopy. wwPDB has convened various task forces (TFs), made up of community experts, to advise it on archival policy, validation, etc. The TF for small-angle scattering (SAS) strongly recommended that wwPDB should collect and disseminate experimental SAS data, if it was used in conjunction with or in support of other methods, such as X-ray or NMR.

We aim to implement this community recommendation through 3 specific objectives:

(1) Extend the D&A tool to allow deposition of experimental SAS data to the PDB. We will consult and work with SAS community experts, our wwPDB partners and other stakeholders and will update the sasCIF and mmCIF dictionaries to fully accommodate the SAS data. We will design, code, test and release the extended deposition software.

(2) Make the deposited SAS data publicly available. With our wwPDB partners, we will implement the format and mechanism through which this data will be publicly released in the PDB archive.

(3) Extend the PDBe website to present SAS data to both specialist and non-specialist users. We will load the SAS data into the PDBe Oracle database and make it available via our API. We will design, code and test web pages and tools to present the SAS data and derived parameters relating to the overall shape and size of the studied system and goodness of fit between the deposited model and the data.
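As an illustration only, the kind of derived parameters mentioned in objective (3) can be sketched as follows. The example estimates the radius of gyration (a measure of overall size) from the standard Guinier approximation, I(q) ~ I(0) exp(-q^2 Rg^2 / 3), and computes a reduced chi-square as one possible goodness-of-fit measure between a model curve and the data. The function names and the synthetic data are hypothetical and are not taken from the PDBe, wwPDB or SASBDB software.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I against q^2 by least squares; return (Rg, I0).

    Assumes all points lie in the Guinier region (roughly q * Rg < 1.3)
    and that every intensity is positive.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In the Guinier approximation the slope equals -Rg^2 / 3.
    rg = math.sqrt(-3.0 * slope)
    return rg, math.exp(intercept)


def reduced_chi_square(i_exp, sigma, i_calc):
    """Reduced chi-square between data and a model curve.

    The model is first multiplied by the scale factor c that minimises
    chi-square, then the sum is divided by (N - 1).
    """
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    c = num / den
    chi2 = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return chi2 / (len(i_exp) - 1)


# Synthetic, noise-free example data (hypothetical values, not real measurements).
TRUE_RG = 20.0   # in angstroms
TRUE_I0 = 100.0  # arbitrary units
q_values = [0.005 * k for k in range(1, 13)]  # q up to 0.06 per angstrom
intensities = [TRUE_I0 * math.exp(-(qi * TRUE_RG) ** 2 / 3.0) for qi in q_values]
errors = [0.01 * ii for ii in intensities]

rg_est, i0_est = guinier_fit(q_values, intensities)
print(f"Estimated Rg = {rg_est:.3f}, I(0) = {i0_est:.3f}")

# A model curve that differs from the data only by a constant scale factor.
model = [2.0 * ii for ii in intensities]
print(f"Reduced chi-square = {reduced_chi_square(intensities, errors, model):.3e}")
```

Real data would carry noise and would need a check that the fitted points actually lie in the Guinier region; this sketch omits both.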

Planned Impact

The overarching goal motivating this funding application is to enable the Protein Data Bank (PDB) - the single global, freely and publicly accessible archive of macromolecular structure data - to collect and disseminate experimental data and associated meta-data (e.g., experimental setup and sample information) from small-angle X-ray (or neutron) scattering (SAXS/SANS) techniques, thus making the overall PDB archive more accurate and more complete. For the first time, this will enable the SAXS/SANS data supporting PDB structures to be collected, annotated to a high standard, and archived in a consistent fashion. This goal will be achieved by including the curated SAXS/SANS data in the public PDB archive, an objective shared by all the wwPDB partners. To facilitate programmatic access to the data, it will be incorporated into the PDBe API, which has been developed as part of the TRDF-funded CRESTANO project (BB/K016970/1). SAXS/SANS data and derived parameters will also be disseminated through the PDBe website.

SAXS data can provide information on the overall shape and size of macromolecules as well as their state (e.g., scattering curves obtained for intrinsically disordered proteins have distinct features immediately signalling the presence of disorder). SANS data can provide information about relative positions of macromolecules in larger complexes.

The users of the PDB, both academic and from other sectors, will naturally be the ones immediately benefiting from this project. The PDB archive now contains more than 100,000 structures and users worldwide download 30 million PDB entries every month via the wwPDB partner websites and FTP distributions. A significant fraction of this user base is employed in industrial laboratories with an interest in structural biology, in the pharmaceutical, diagnostic, agricultural and other sectors. The entire PDB user base will therefore for the first time have access to the SAXS/SANS data, and will be able to critically evaluate any conclusions drawn from the structures associated with it.

Secondary school and university teachers in life sciences will also benefit from more accurate and richer data in the PDB.

If the experience with other techniques (e.g., X-ray crystallography) is any guide, then, in a longer perspective, beyond the end of the project, the availability of experimental SAXS/SANS data is expected to lead to new developments in the field, consensus on appropriate validation criteria and ultimately higher confidence in the correctness and reliability of the structures and a realistic appraisal of their limitations.

Publications

Armstrong DR (2020) PDBe: improved findability of macromolecular structure data in the PDB. in Nucleic acids research

Berman HM (2016) The archiving and dissemination of biological structure data. in Current opinion in structural biology

Velankar S (2021) The Protein Data Bank Archive. in Methods in molecular biology (Clifton, N.J.)

Young JY (2018) Worldwide Protein Data Bank biocuration supporting open access to high-quality 3D structural biology data. in Database : the journal of biological databases and curation

 
Description The wwPDB deposition system supporting the PDB archive has been enhanced to allow simultaneous deposition of structures to the PDB and small-angle scattering data to the SASBDB archive. The data thus deposited is linked and becomes more discoverable. The PDBe website has been enhanced to show this linked data.
Exploitation Route The linked deposition system will serve as a prototype for possible future interactions between the wwPDB and other specialist archives in structural biology.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Healthcare; Pharmaceuticals and Medical Biotechnology

URL https://deposit.wwpdb.org
 
Title Enhancement of OneDep system to support deposition of SAS data 
Description The OneDep system for validation, deposition and biocuration of PDB data was enhanced to support deposition of small-angle scattering (SAS) data to the Small-Angle Scattering Biological DataBank (SASBDB), when such data were used to solve the PDB structures and deposited through OneDep.
Type Of Technology Webtool/Application 
Year Produced 2017 
Impact Between Aug 2017, when the enhancement was first made available, and the time of writing (Feb 2018), 10 PDB entries where SAS data was used to solve the structure or to support its interpretation have been deposited and released by the PDB, and a further 19 are awaiting release pending publication of the relevant peer-reviewed papers.
URL http://deposit.wwpdb.org
 
Title Exposing linked SAS data via PDBe entry pages 
Description The PDBe website (PDB entry pages) was enhanced to display salient information pertaining to the small-angle scattering (SAS) data where it was used to solve the PDB structure and is deposited in SASBDB.
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact The SASBDB API is used to deliver this information to PDBe pages. The information is wrapped in re-usable webcomponents and can be easily included by other resources. 
URL http://pdbe.org
 
Description Invited lecture at Pasteur Institute, Paris, July 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited lecture at the Integrative Structural Biology advanced course in 2017 and in 2018. The purpose was to present wwPDB and PDBe work on supporting integrative/hybrid methods for data deposition and data access. I also presented the work done on validation of structures and in 2018 I conducted a short tutorial on NMR validation. The lecture and tutorial were well received, and our team already received an invitation to participate in the upcoming course in 2019 with a somewhat more generous time allocation for our contribution.
Year(s) Of Engagement Activity 2017,2018
URL https://www.pasteur.fr/en/integrative-structural-biology
 
Description Invited speaker talk on Structural Biology Data Archiving at FEBS Congress 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact "Structural biology data archiving - where are we and what lies ahead?", invited lecture, FEBS Congress, Prague, Czech Republic, July 2018.
Year(s) Of Engagement Activity 2018
 
Description Lecture on Introduction to Structural Biology Data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact A lecture introducing structural biology data was presented as part of the Computational Structural Biology course at EMBL-EBI in Cambridge, UK.
Year(s) Of Engagement Activity 2017
 
Description Lecture on Structural Biology Data Archiving 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact "Structural biology data archiving - where are we and what lies ahead?", seminar, EMBL, Heidelberg, Germany, September 2018.
Year(s) Of Engagement Activity 2018
 
Description Lecture titled Structural Biology Data Archiving 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact A lecture titled Structural biology data archiving was presented during a course on X-ray Methods in Structural Biology at the Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
Year(s) Of Engagement Activity 2017
 
Description Seminar in Uppsala 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Invited lecture and follow-up discussion with local scientists and students.
Year(s) Of Engagement Activity 2016
 
Description Seminar in Vienna 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Invited lecture on "The wonderful world of structure archiving - what's happening and what's next?" as well as discussions with various local scientists and students.
Year(s) Of Engagement Activity 2016
 
Description Structural bioinformatics training course 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Co-organiser of and lecturer in this practical course, interacting with students and fellow instructors.
Year(s) Of Engagement Activity 2016,2017,2018
URL http://www.ebi.ac.uk/training/events/2016/structural-bioinformatics-2016
 
Description Univ of Copenhagen PhD Day (keynote speaker) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Keynote speaker at the PhD Day of the Dept of Biology of the Univ of Copenhagen. Also interacted with students who presented posters as well as the PhD students who organised the PhD Day.
Year(s) Of Engagement Activity 2016
URL http://phdday.wixsite.com/2016
 
Description Update on the wwPDB collaboration 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presentation at the NMR DG meeting.
Year(s) Of Engagement Activity 2019