Structural data-mining of high-resolution 3DEM maps in EMDB

Lead Research Organisation: European Bioinformatics Institute
Department Name: Protein Data Bank in Europe

Abstract

Almost all biological processes in living organisms are carried out by biomacromolecules such as proteins and nucleic acids. The study of three-dimensional structures of these macromolecules is an essential step in understanding fundamental biological processes at the atomic level. Availability of these structures also helps in understanding disease processes and facilitates designing new drugs to fight them. There are three main techniques to determine the three-dimensional structures of biomacromolecules: X-ray crystallography, Nuclear Magnetic Resonance spectroscopy (NMR) and single-particle cryo-electron microscopy (cryo-EM). Recent technical advances in cryo-EM have led to a large increase in the use of this technique, and it may well become the method of choice for structural biologists in academia and possibly the pharmaceutical industry. Moreover, large macromolecular complexes can only be studied using cryo-EM techniques. Unfortunately, methods for validation and reliability analysis of derived atomic models and maps are lagging behind. Our research proposal aims to tackle some aspects of validation, namely which structural details can be reliably interpreted in a given map. Apart from helping to understand the structural details that can be derived from a given map, the results of this research will also help to improve derived atomic models.

Technical Summary

The proposed research aims to develop methods and corresponding software to select and validate a set of reliable, high-resolution cryo-EM maps and corresponding atomic models, to segment maps with the help of atomic coordinates, and to identify sections of the maps corresponding to small fragments. Segmented map fragments will be aligned and resolution-dependent classification will be carried out. Developed methods and derived data will be implemented in user-friendly software tools. The main methods to be used are advanced tools from modern statistics as well as from image processing and analysis. These methods include multivariate data analysis (multi-dimensional scaling and principal component analysis), alignment in the rotation group (the special orthogonal group in three dimensions, SO(3)), Procrustes analysis and classification techniques.
The objectives of the proposed project are:
1) To develop methods for selecting EMDB entries with fitted atomic models, segmenting the EM density using the fitted model as a guide and aligning, classifying and averaging the segmented densities to obtain 3D density-motif libraries.
2) To explore variations in motifs as a function of resolution and environment to better understand the information content of EM structures.
3) To exploit the motif libraries to develop validation metrics based on the comparison of atomic models and motif densities.
4) To develop a production process to periodically update motif libraries as new entries are added to EMDB, and to provide versioned, open, public access to the motif libraries.
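
As an illustration of the rotational alignment and Procrustes analysis named in the technical summary, the following is a minimal sketch of rigid-body superposition of two 3D point sets using the Kabsch algorithm in NumPy. It is not the project's implementation: the project aligns segmented density fragments, whereas this example works on matched point coordinates, and the function names kabsch_rotation and superpose are hypothetical.

```python
import numpy as np


def kabsch_rotation(moving, reference):
    """Return the proper rotation (an element of SO(3)) that best maps the
    centred `moving` points onto the centred `reference` points in the
    least-squares sense. Both inputs are (N, 3) arrays of matched points."""
    moving_c = moving - moving.mean(axis=0)
    reference_c = reference - reference.mean(axis=0)
    # 3x3 cross-covariance matrix between the two centred point sets.
    covariance = moving_c.T @ reference_c
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    correction = np.diag([1.0, 1.0, sign])
    return vt.T @ correction @ u.T


def superpose(moving, reference):
    """Superpose `moving` onto `reference`; return aligned points and RMSD."""
    rotation = kabsch_rotation(moving, reference)
    aligned = (moving - moving.mean(axis=0)) @ rotation.T + reference.mean(axis=0)
    rmsd = float(np.sqrt(((aligned - reference) ** 2).sum(axis=1).mean()))
    return aligned, rmsd


# Synthetic example (assumed data): rotate and translate a random point set,
# then recover the inverse rotation by superposition.
rng = np.random.default_rng(0)
reference = rng.normal(size=(8, 3))
angle = np.deg2rad(40.0)
true_rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moving = reference @ true_rotation.T + np.array([1.0, -2.0, 0.5])

estimated = kabsch_rotation(moving, reference)
aligned, rmsd = superpose(moving, reference)
print(f"RMSD after superposition: {rmsd:.2e}")
```

The determinant check restricts the solution to proper rotations, which is what distinguishes alignment in SO(3) from a general orthogonal Procrustes fit that would also allow mirror images.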

Planned Impact

Academic impact
1) The developed resolution metrics based on the density motif libraries will help the EM community better assess the quality of EM structures they produce and download from the EMDB.
2) The developed resolution metrics will make it easier for the wider biological community to interpret EM structures.
3) The developed side-chain density libraries will lead to a better understanding of the density in EM maps and the variation with local environment. This will be important for efforts aimed at driving the field to higher resolutions, for developers of software for building atomic models into EM maps, and for physicists and chemists interested in improved understanding of electron - soft matter interaction at the atomic level.
4) The model-guided segmentation, alignment and classification software that will be developed in the project has a broader applicability beyond this project, for instance, for developing density-motif libraries at the domain level and in other 3D image processing applications.
5) Training of a scientific programmer (SP) at EBI and a post-doc (PDRA) at LMB.
6) Training of PhD students and post-docs in the structural biology community.
7) Imagery based on the density-motif libraries could be exploited in textbooks and resources aimed at a more general audience such as undergraduate students and bioinformaticians (e.g., Arthur Lesk's "Introduction to Protein Science: Architecture, Function and Genomics", 3rd Edition, Oxford University Press 2016 has been very instructive in conveying to a broader audience what X-ray density looks like at different resolutions).

Economic and societal impacts
1) Users from public, private, and third sector organisations will accrue the same benefits of the results of this project as academic users by virtue of the open access to software, motif libraries and publications, but further interactions will be needed to better understand specific issues relating to area-specific applications.
2) The colour-coded imagery of structures based on resolution metrics can be aesthetically pleasing and can be exploited for the presentation of structural data to a broader audience.

Publications

Bárcena M (2021) Structural biology in the fight against COVID-19. in Nature structural & molecular biology

Wang Z (2022) Validation analysis of EMDB entries. in Acta crystallographica. Section D, Structural biology

 
Description 1) We have developed methods and production processes for structural data mining of EMDB and have applied them to derive canonical 3D map motifs for amino acid residues.
2) We have developed methods to validate EM maps and fitted/built models using the canonical map motifs.
Exploitation Route We have used internal funding from EMBL-EBI to continue the project and achieve the full outcomes of the original project. This includes:
a) public distribution of software;
b) publication describing the method developed;
c) running the method against EMDB entries and presenting the results on the EMDB validation analysis pages;
d) developing a stand-alone server using the canonical map motif libraries;
e) public distribution of the map-motif libraries.
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology

 
Description EMDB: dealing with the cryo-EM data deluge
Amount £1,318,839 (GBP)
Funding ID 212977/Z/18/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 06/2019 
End 09/2023
 
Description Collaboration on EM validation with Garib Murshudov, MRC-LMB 
Organisation Medical Research Council (MRC)
Department MRC Laboratory of Molecular Biology (LMB)
Country United Kingdom 
Sector Academic/University 
PI Contribution We are testing, evaluating and incorporating methods that our partner is developing on our behalf for validation. We have provided feedback about our partner's software, including bug reports, and also reported on the results obtained when running their software on a sizeable fraction of the EM structures in the PDB and EMDB.
Collaborator Contribution Our partner is developing and improving software that we use in our efforts to develop a novel validation method for the EM community.
Impact Several bug fixes in RefMac. Helpful discussions and guidance in terms of our approach to developing the motif libraries.
Start Year 2017
 
Description Collaboration with Dr Garib Murshudov, MRC-LMB 
Organisation Medical Research Council (MRC)
Department MRC Laboratory of Molecular Biology (LMB)
Country United Kingdom 
Sector Academic/University 
PI Contribution Dr Murshudov was a Co-I on the BBSRC TRDF grant, but after that grant ended he continued to collaborate with us, providing advice and software.
Collaborator Contribution Software. Scientific advice. Co-authored the paper describing the work.
Impact 3D-Strudel software package. Libraries of 3D map motifs mined from EMDB. Publication (currently as preprint).
Start Year 2018
 
Description Collaboration with Kevin Cowtan, University of York 
Organisation University of York
Country United Kingdom 
Sector Academic/University 
PI Contribution We are working with Prof Cowtan to investigate the use of 3DStrudel map-motif libraries for model building in cryo-EM maps. The plan is to integrate the libraries with Buccaneer, a model-building package developed by Prof Cowtan; so far we have conducted a pilot study to test feasibility.
Collaborator Contribution Prof Cowtan's lab will advise us on integration and make changes to software to facilitate integration.
Impact Pilot study that demonstrates the value of moving forward with the integration.
Start Year 2020
 
Title StrudelScore plugin 
Description ChimeraX plugin for visualising results of 3DStrudel validation. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact There have been 1772 downloads of the software and people are using the software for visualising the results of 3DStrudel validation. 
URL https://github.com/emdb-empiar/strudel_score
 
Title threed_strudel 
Description This is a pip-installable Python package containing core functionality for the map and model manipulation required for map-motif mining and motif-based validation. The description on GitHub has information on how to use it with several provided scripts.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Software is being used in EMDB validation reports as well as on the validation analysis EMDB pages. 
URL https://github.com/emdb-empiar/3dstrudel
 
Description "Structural data mining of high-resolution 3DEM maps in EMDB", oral talk at Validation Grant developers meeting, Birkbeck University, London, UK, 2019 (by Andrei Istrate)
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Gave an overview of the developed methods and tools for amino-acid motif mining in EM maps and their application for validation.
Year(s) Of Engagement Activity 2019
 
Description "3D-STRUDEL - A new validation method for cryo-EM maps and models", oral talk, EMBL Structural Biology Retreat, Hamburg, Germany, 2020 (by Andrei Istrate) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Gave an overview of the developed methods and tools for amino-acid motif mining in EM maps and their application for validation.
Year(s) Of Engagement Activity 2020
 
Description "EM Validation by wwPDB & EMDB", presentation at the EMBO Course "Cryo-Electron Microscopy and 3D Image Processing", EMBL Heidelberg, virtual, September 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact "EM Validation by wwPDB & EMDB", presentation at the EMBO Course "Cryo-Electron Microscopy and 3D Image Processing", EMBL Heidelberg, virtual, September 2020
Year(s) Of Engagement Activity 2020
 
Description "EM data archiving", lecture by Gerard Kleywegt and lectures and practical by Osman Salih in the EM EMBO Course, Birkbeck, London, UK, September 2019
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Contributed to this course for young EM practitioners.
Year(s) Of Engagement Activity 2019
 
Description "EMDB and EMPIAR" by Ardan Patwardhan at EMBO Practical Course: Cryo-Electron Microscopy and 3D Image Processing (virtual), April 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact Course presentation on EMDB and EMPIAR including 3DStrudel
Year(s) Of Engagement Activity 2020
 
Description "EMDB and EMPIAR"; S2C2 CryoEM CCP-EM Modeling Workshop presentation, November 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop presentation on EMDB and EMPIAR where 3DStrudel was presented and generated interest and questions from the audience as well as potential collaborations.
Year(s) Of Engagement Activity 2020
 
Description "Structural data mining of high-resolution 3DEM maps in EMDB", oral talk at CCP4 WG2 meeting, Crick institute, London, UK, 2019 (by Andrei Istrate) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Gave an overview of the developed methods and tools for amino-acid motif mining in EM maps and their application for validation and model building.
Year(s) Of Engagement Activity 2019
 
Description "Strudel - model-dependent map-feature validation and application to EM SARS-CoV-2 structures" by Andrei Istrate; symposium on Cryo-EM Validation in the Age of SARS-CoV-2 (virtual); November 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation of Strudel function and application.
Year(s) Of Engagement Activity 2020
URL https://www.ccpem.ac.uk/training/validation_symposium_2020/Cryo-EM_Validation_in_the_Age_of_SARS-CoV...
 
Description "Validation by EMDB and wwPDB", symposium on "Cryo-EM Validation in the Age of SARS-CoV-2: Methods, Tools and Applications", virtual, November 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact "Validation by EMDB and wwPDB", symposium on "Cryo-EM Validation in the Age of SARS-CoV-2: Methods, Tools and Applications", virtual, November 2020
Year(s) Of Engagement Activity 2020
 
Description "What's new at EMDB and EMPIAR?", lecture by Gerard Kleywegt at the CCP-EM Spring Symposium, Nottingham, UK, April 2019
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Gave an overview of new developments and plans concerning cryo-EM data archiving in PDB, EMDB and EMPIAR.
Year(s) Of Engagement Activity 2019
 
Description "What's new in EMDB and EMPIAR" by Ardan Patwardhan at the CCP-EM Spring Symposium (virtual), April 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation of new developments in EMDB and EMPIAR including use cases of 3DStrudel applied to SARS-CoV-2
Year(s) Of Engagement Activity 2020
 
Description 2018 International Conference of the Korean Society for Structural Biology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation on EMDB and EMPIAR and discussions on collaborations with the South Korean community
Year(s) Of Engagement Activity 2018
 
Description 220821-29 EMBO Practical Course on Cryo-Electron Microscopy and 3D Image Processing, EMBL Heidelberg Talk on EMDB and EMPIAR by Ardan Patwardhan 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact 220821-29 EMBO Practical Course on Cryo-Electron Microscopy and 3D Image Processing, EMBL Heidelberg Talk on EMDB and EMPIAR by Ardan Patwardhan
Year(s) Of Engagement Activity 2022
 
Description 221213 MicroED course at UCLA Jack Turner (EMDB) gave a (remote) presentation about EM archiving in EMDB and EMPIAR 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact 221213 MicroED course at UCLA Jack Turner (EMDB) gave a (remote) presentation about EM archiving in EMDB and EMPIAR
Year(s) Of Engagement Activity 2022
 
Description 3DEM Gordon Research Conference, Rhode Island USA 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster presentation on EMDB and EMPIAR activities
Year(s) Of Engagement Activity 2018
URL https://www.grc.org/three-dimensional-electron-microscopy-conference/2018/
 
Description CCP-EM Icknield Model Building Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Talk focusing on validation for cryo EM but also highlighting other resources for EMDB and EMPIAR
Year(s) Of Engagement Activity 2018
 
Description CCP-EM Spring Symposium, Keele University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Talk and poster presentations on various aspects of the activities and plans relating to EMDB and EMPIAR
Year(s) Of Engagement Activity 2018
URL http://www.ccpem.ac.uk/training/spring_symposium_2018/spring_symposium.php
 
Description Cryo EM and 3D Image Processing (CEM3DIP) course New Delhi 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation on EMDB and EMPIAR to make audience aware of activities and resources available, plans for the future and how they could enhance their research by interacting with us.
Year(s) Of Engagement Activity 2018
URL http://meetings.embo.org/event/18-cem3dip
 
Description G.J. Kleywegt, "Community recommendations on validating cryo-EM models and data", invited talk at IUCr XXV Congress (session "Validation of cryo-EM structures and maps"), Prague, CZ, virtual, August 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact "Community recommendations on validating cryo-EM models and data", invited talk at IUCr XXV Congress (session "Validation of cryo-EM structures and maps"), Prague, CZ, virtual, August 2021
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=V4jkU1THo8o
 
Description Invited talk at 2022 CCP-EM Spring Symposium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact "What's new in public EM data archiving at EMBL-EBI?", invited talk at the CCP-EM Spring Symposium, Nottingham, UK, May 2022.
Year(s) Of Engagement Activity 2022
 
Description Kuo Symposium on Cryo EM 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Poster presentation and participation in workshop
Year(s) Of Engagement Activity 2018
URL http://kuo2018.csp.escience.cn/dct/page/1
 
Description Workshop on Computational Methods in Bio-imaging Sciences, National University of Singapore, Institute of Mathematical Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Introduction to EM and EM databases to a wider scientific post-grad audience
Year(s) Of Engagement Activity 2018
URL https://ims.nus.edu.sg/events/2017/data/wk3.php