Collaborative Computational Project for Electron cryo-Microscopy (CCP-EM): 2021 - 2026

Lead Research Organisation: STFC - Laboratories
Department Name: Scientific Computing Department

Abstract

The behaviour of living systems ultimately comes down to the interactions of biological molecules inside cells, and understanding these is vital to numerous human efforts, including controlling disease and improving food production. While experimental techniques such as macromolecular crystallography have for many years given detailed information on important molecules in the cell, many classes of molecules are not amenable to this technique. Moreover, as our understanding of pathways in the cell grows, there is increasing interest in the context in which these molecules operate. In other words, where in the cell do these molecules do their job, and which other cellular components are necessary for their function? Electron cryo-Microscopy (cryoEM) provides very useful information here, and bridges the gap between individual molecules and the whole cell. In the most favourable cases, detailed images of assemblies of molecules can be obtained, while at lower resolutions electron tomograms can show internal molecular details from within intact parts of cells or tissues.

Advances in instrumentation and data processing led to a significant increase in the quality of cryoEM data, which was characterised in 2014 as the "Resolution Revolution", and recognised by the 2017 Nobel Prize in Chemistry. Consequently there has been a surge in interest in the technique from structural and cellular biologists trying to understand a wide range of biological systems. There has been significant investment in the research infrastructure supporting cryoEM, most notably the establishment of several electron microscope facilities around the country. In the last few years, pharmaceutical companies and biotechnology companies have recognised the importance of cryoEM to their discovery pipelines, and have also begun investing in the area. A key component of this research infrastructure is the computational support to manage the data, process the micrographs, and interpret the data in terms of molecular volumes and/or atomic structures. The Collaborative Computational Project for Electron cryo-Microscopy (CCP-EM) was established during the period 2012 - 2016 to provide this part of the research infrastructure.

The proposed project is intended to provide continued support to the cryoEM community. One of the major products of the CCP-EM partnership is a software suite for processing cryoEM data collected at microscope facilities. Individual computer programs in this suite are developed independently, either by members of CCP-EM or collaborators. The role of CCP-EM is to collate these programs into a single suite, develop workflows through the suite, and distribute the suite to practising scientists. When done well, this is a win-win arrangement in which scientists get access to a comprehensive set of software in one place, and methods developers get access to a large user base. It is well known, however, that software rapidly becomes unusable if not actively maintained and it is the responsibility of the core team of CCP-EM to ensure the longevity of software in the suite.

We will also expand the scope of the suite. We will improve the tools for validating the structural information obtained, and facilitate the deposition of data in international archives. We will help to drive FAIR principles (findable, accessible, interoperable, reusable), so that data from cryoEM experiments are accessible and usable by the wider community. We will increase our support for sub-tomogram averaging, a technique for obtaining in situ structural information on molecules. Finally, we will make more use of machine learning, i.e. algorithms that can learn from the data.

All these advances will be tightly coupled with our on-going user training programme, and support for individual methods developers. We will also continue our very popular annual Spring Symposium, which now provides a forum for 300 researchers to share experiences and to develop the cryoEM community.

Technical Summary

The CCP-EM partnership exists to further strengthen and expand the cryo-EM community. Activities can be broadly divided into three areas: (1) development of the CCP-EM software suite, (2) training programme and community events, and (3) strategic initiatives and outreach. We request funding for a core team of computational scientists to coordinate and develop these activities. Since software development, training and outreach are closely linked, we expect the team to work together in all areas.

The software suite is designed to integrate individual programs from diverse sources into a convenient package for scientists. It provides a framework for collaborating with other developers and helping to make novel methods available to the community. Collaboration is required, and the suite is not intended to be a container for all available software. The core team is developing a set of Python libraries covering a data model, job control, scheduling and workflow definition, which will merge the RELION pipeliner and the existing CCP-EM project manager. We propose to continue this development, integrate it into the production version of RELION, and expand its use to the rest of CCP-EM, facilitating closer integration between reconstruction, model building and data validation.

The extended framework will be used to integrate subtomogram averaging functionality, and develop the link to tomographic reconstruction. We will also develop on-the-fly processing for eBIC (and other facilities), tools for validation and simplified EMDB deposition, automation of data processing workflows and a new combined GUI for CCP-EM and RELION, as well as allowing third-party plugins to be easily incorporated into the pipeline. Machine-learning methods for data-driven analysis will be embedded at all levels.

We will continue to organise the annual Spring Symposium, expand the training programme, contribute to international meetings, and organise hackathons for developers.

Planned Impact

There is widespread interest in the international research community in using cryoEM to tackle big scientific challenges, such as understanding molecular machines in action and membrane complexes in situ. The technological advances enabling this were recognised in the award of the 2017 Nobel Prize in Chemistry. The field has strong links to atomic structure methods, especially crystallography, while electron tomography has strong connections to cell biology. The spatial dimension is needed to understand biological networks and machines, and complements traditional systems biology approaches. By supporting cryoEM research groups in the UK and encouraging a collaborative effort, CCP-EM advances the usage of cryoEM, and has an impact on many structural biology projects.

CCP-EM has an impact on individual researchers through its training and knowledge exchange aspects. The expansion of cryoEM as an important component in the toolkit for understanding cellular and sub-cellular biology relies on the availability of researchers competent in the computational techniques required to interpret the data. Existing researchers will of course benefit through improved software tools and environment. However, we specifically wish to lower the barriers to entry into the field of cryoEM. This applies not only to students and young postdocs, but also to researchers moving from other fields or wishing to use cryoEM as an additional technique.

The problems being addressed by cryoEM are of major importance in biomedical science, e.g. dynamic systems involving protein folding/refolding/misfolding, important in neurodegeneration and other misfolding diseases, virus-host interactions, and drug binding studies. Although our primary focus in the Partnership is on the fundamental science, advances here ultimately have an impact on translational and medical research. We are focussing on a technique rather than a particular scientific area, so the impact is likely to be very broad, leading eventually to improved medicine and health for the nation.

The insight that cryoEM provides to disease mechanisms is attracting the attention of a number of pharmaceutical companies. Around a dozen companies have purchased licences to use the CCP-EM software suite, and this number is growing. Company R&D staff also regularly attend our events. As well as helping to elucidate the underlying mechanisms of disease, cryoEM allows scientists to visualise the effect of drug molecules on proteins and complexes, in a near-native environment. As the technique matures, it is likely to become part of the drug development pipeline. Other industries (biotechnology, agribusiness) will also potentially benefit from the extra insight into biological processes provided by cryoEM.

By uniting the UK cryoEM community, the Partnership will have an impact on the strategy for future developments. By acting together, the UK cryoEM community will have a stronger voice. This will be used to influence software developers, instrument manufacturers and data standards development. The community will also be able to provide input for national and international policy makers and funders. Finally, CCP-EM will take part in public engagement events to help strengthen the public understanding and appreciation of biomedical research.

Publications

Joseph A (2022) Atomic model validation using the CCP-EM software suite. in Acta crystallographica. Section D, Structural biology

Palmer CM (2022) Real space in cryo-EM: the future is local. in Acta crystallographica. Section D, Structural biology

Ploscariu N (2021) Improving sampling of crystallographic disorder in ensemble refinement. in Acta crystallographica. Section D, Structural biology

Simpkin AJ (2021) Redeployment of automated MrBUMP search-model identification for map fitting in cryo-EM. in Acta crystallographica. Section D, Structural biology

Yamashita K (2021) Cryo-EM single-particle structure refinement and map calculation using Servalcat. in Acta crystallographica. Section D, Structural biology

 
Description Knowledge Assets Grant Fund: Enhanced UI for CCP-EM Software Suite
Amount £100,000 (GBP)
Organisation Department for Business, Energy & Industrial Strategy 
Sector Public
Country United Kingdom
Start 12/2021 
End 03/2022
 
Description Molecular structure from images under physical constraints
Amount £1,000,000 (GBP)
Organisation Alan Turing Institute 
Sector Academic/University
Country United Kingdom
Start 03/2021 
End 10/2023
 
Description AIMLAC CDT - Aberystwyth, Bangor, Cardiff, Swansea, Bristol 
Organisation Swansea University
Country United Kingdom 
Sector Academic/University 
PI Contribution We provide placements for students on this CDT programme. Specifically for the 2020/2021 cohort, we provided 2 placements. The students worked on projects concerning denoising of electron micrographs for cryoEM, and modelling of neutron reflectometry data. Each student has completed a 2-week initial placement, followed by the main 6-month placement. For the 2021/2022 cohort, we are providing a further 2 placements. One will continue to refine the cascade machine learning model for segmentation of molecular volumes from cryoEM. The other will work on the CoVal server for linking SARS-CoV-2 variant data with experimental structures.
Collaborator Contribution The CDT administers the programme, and matches us up with specific students. The students themselves contribute to our on-going research programme. Typically, they deliver a small piece of coding which can be included in our larger software packages.
Impact No outputs yet. The collaboration is multi-disciplinary in the sense that the students come from a background of AI in physical sciences, and contribute to projects in the biosciences when with us.
Start Year 2020
 
Description Coronavirus Structural Task Force 
Organisation University of Hamburg
Country Germany 
Sector Academic/University 
PI Contribution The Coronavirus Structural Task Force aims to provide structural information on proteins from the SARS-CoV-2 virus. As well as structures from the international repositories PDB and EMDB, the Task Force provides quality assessment, and in some cases improved structural modelling. Joseph Agnel from the CCP-EM team has provided many of the validation tools used by the Task Force, and has improved some of the viral structures and deposited them on their web site.
Collaborator Contribution Our partners provide the dissemination site for our efforts to improve the structural modelling for SARS-CoV-2.
Impact The collaboration is between structural biologists from different international groups.
Start Year 2020
 
Description Global Phasing Ltd collaboration 
Organisation Global Phasing
Country United Kingdom 
Sector Private 
PI Contribution We bring prior work on development of a software framework for single particle analysis by Relion, which the partnership is building on. We also bring our established relationship with the Relion development team, with the eBIC facility, and with the wider community.
Collaborator Contribution Global Phasing Ltd bring expertise on developing automated data processing pipelines in crystallography, as well as specific software such as Buster. Their contribution is in terms of background IP and current staff effort.
Impact The output will be improved software for processing single particle cryoEM datasets, which will be deployed at the eBIC facility and distributed as part of the CCP-EM software suite. The project is supported by internal funds of Global Phasing and CCP-EM, as well as a Proof of Concept award from STFC. Prototype software has been developed, and a first version will be released soon. The work combines structural biology, data management and software engineering.
Start Year 2019
 
Description Leeds EM facility 
Organisation University of Leeds
Department Astbury Biostructure Laboratory
Country United Kingdom 
Sector Academic/University 
PI Contribution Help to scope computational requirements of the new Electron Microscopy facility. Later, we will help with installation, and writing of custom software pipelines. We have contributed to training events held at Leeds in Sept 2016, July 2017 and Dec 2018.
Collaborator Contribution Advice on software required by Electron Microscopy facilities. This advice was included in a grant application submitted in February 2016, as well as informing on-going plans. Neil Ranson was included as a CoI in the 2020 proposal for the renewal of CCP-EM.
Impact Material for inclusion in the CCP-EM renewal grant application, concerning the software infrastructure needs of cryoEM facilities. Election of Neil Ranson (Leeds) as Deputy Chair of CCP-EM. Leeds hosted a Relion workshop in September 2016, an MD/EM workshop in July 2017 and a cryoEM workshop in Dec 2018.
Start Year 2015
 
Description eBIC collaboration 
Organisation Diamond Light Source
Country United Kingdom 
Sector Private 
PI Contribution eBIC (electron Bio-Imaging Centre) at Diamond Light Source provides scientists with state-of-the-art experimental equipment and expertise in the field of cryo-electron microscopy, for single particle analysis and cryo-tomography. As part of CCP-EM, my group are providing computational support to users of eBIC, in the form of installing software and direct assistance. We benchmark certain key codes, and have optimised their usage on Diamond compute clusters. We plan to co-develop software pipelines to enable users to get rapid feedback on their data collection.
Collaborator Contribution eBIC provides a pool of cryoEM users who can provide feedback on the developing CCP-EM software suite, and provide test datasets. Diamond staff will also help to co-develop software pipelines for rapid feedback.
Impact The collaboration involves hardware development (microscope and computational resources), application software development, and structural biology. We maintain a set of EM software on Diamond systems, available to users of eBIC. We have benchmarked version 2.0 of the Relion software on a GPU platform at Diamond. We have helped with the installation of cryoSPARC at Diamond. We have implemented the Relion-IT pipeline for automatic processing of micrographs during a user visit, and are now trialling the CCP-EM pipeliner.
Start Year 2015
 
Title Buccaneer in CCP-EM 
Description The Buccaneer pipeline is available within the CCP-EM graphical user interface. Given a cryoEM map, obtained for example from single particle reconstruction, and the sequence of the expected protein molecules, the pipeline will build and refine an atomic model. This is a crucial step in the interpretation of experimental data from cryoEM. In comparison to other model building tools, Buccaneer can handle relatively low resolution. Buccaneer is an important part of the annual CCP-EM Icknield training school on model building and refinement. It was also used in our team's submission to the 2019 Model Metrics Challenge (organised by the global EM Data Resource https://www.emdataresource.org/). 
Type Of Technology Software 
Year Produced 2017 
Impact The pipeline has been used in CCP-EM workshops, and has helped several researchers with their structural biology projects. The Buccaneer pipeline remains an important component of the CCP-EM software suite, and is updated periodically by the main author. 
 
Title CCP-EM pipeliner 
Description CCP-EM pipeliner is a software library / framework underpinning workflows in single particle cryoEM. It is based around defined node types which can be used to link together jobs, and to trace provenance of data. It tracks metadata, and collates metadata to accompany deposition of structural data to the PDB / EMDB. It was initially developed to support workflows in the Relion software, and will eventually replace the scheduler framework used currently. It is now being expanded to other CCP-EM software, and being used to develop deposition tools in collaboration with PDBe/EMDB staff at EMBL-EBI. A first hackathon was held in March 2022 to encourage software developers in cryoET to adopt it. We expect to release it later in 2022. 
Type Of Technology Software 
Year Produced 2022 
Impact The hackathon in March 2022 not only encouraged the use of CCP-EM pipeliner, but was also a catalyst to bring together multiple software development groups in cryoET. This led to some coordination of efforts, and should lead to better software provision for end users. 
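
The idea of typed nodes linking jobs, with provenance and metadata traced back through the chain, can be illustrated with a minimal Python sketch. This is not the CCP-EM pipeliner API: the class names (Node, Job, Pipeline), the node type strings, the job names and the metadata keys are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A typed data item passed between jobs (type names are hypothetical)."""
    name: str
    node_type: str


@dataclass
class Job:
    """One processing step with typed inputs, outputs and free-form metadata."""
    name: str
    inputs: list
    outputs: list
    metadata: dict = field(default_factory=dict)


class Pipeline:
    """Records which job produced which node so provenance can be traced."""

    def __init__(self):
        self.jobs = {}        # job name -> Job
        self._producer = {}   # output Node -> Job that created it

    def add_job(self, job):
        for node in job.outputs:
            if node in self._producer:
                raise ValueError(f"{node.name} already has a producer")
            self._producer[node] = job
        self.jobs[job.name] = job

    def provenance(self, node):
        """Return the names of all jobs that led to `node`, upstream first."""
        ordered = []

        def visit(current):
            job = self._producer.get(current)
            if job is None or job.name in ordered:
                return  # raw input data, or job already recorded
            for parent in job.inputs:
                visit(parent)
            ordered.append(job.name)

        visit(node)
        return ordered

    def collate_metadata(self, node):
        """Gather metadata from every upstream job, e.g. to accompany a deposition."""
        return {name: dict(self.jobs[name].metadata) for name in self.provenance(node)}


# Hypothetical single particle workflow
micrographs = Node("raw_micrographs", "MicrographMovies")
corrected = Node("corrected_micrographs", "Micrographs")
particles = Node("particles", "ParticleStack")
density = Node("map", "DensityMap")

pipeline = Pipeline()
pipeline.add_job(Job("motion_correction", [micrographs], [corrected], {"dose_weighting": True}))
pipeline.add_job(Job("particle_picking", [corrected], [particles], {"box_size_px": 256}))
pipeline.add_job(Job("refinement", [particles], [density], {"symmetry": "C1"}))

print(pipeline.provenance(density))
# ['motion_correction', 'particle_picking', 'refinement']
```

Keeping a single producer per node makes the job graph acyclic by construction, so the provenance of any output can be recovered by a simple upstream walk.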
 
Title CCP-EM version 1 
Description The CCP-EM software suite provides a collection of programs for cryoEM single particle reconstruction and building of atomic models. The suite as a whole has an STFC licence, and is licensed free of charge to non-profit users, and for a charge to for-profit users. Nevertheless, many of the component programs are available separately under Open Source licences. Version 1 was released April 2018, with updates 1.1 in July 2018, 1.2 in December 2018, 1.3 in April 2019, 1.4 in November 2019 and 1.5 in October 2020. This first official release of the CCP-EM software suite mainly covered fitting and refinement of atomic models into single particle reconstructions, combining experience gained in CCP4 with high resolution maps with other techniques more appropriate to lower resolution maps. Since that initial release, the suite has expanded to include tools for map analysis and manipulation. Version 1.2 included for the first time pre-compiled binaries for Relion, the most popular software for single particle reconstruction. Besides providing a convenient way of viewing Relion projects on a personal machine, the inclusion of Relion is the basis of on-going efforts to integrate reconstruction with downstream map interpretation. The CCP-EM suite also includes software libraries such as mrcfile, clipper-python and relion-it, which are being used by third-party developers and facility sites for customised workflows. Version 1.6 is in preparation. Meanwhile, stable nightly builds are made available (latest 25/01/22) which contain some useful updates. 
Type Of Technology Software 
Year Produced 2018 
Impact The suite is used by many academic and industrial cryoEM groups worldwide to solve novel macromolecular structures. These are deposited in the Electron Microscopy Data Bank (EMDB) and the Protein Data Bank (PDB), from where they can be employed in wider biomedical applications. We estimated that v1.3 was downloaded around 1500 times, with later versions downloaded at least as many times (unique downloads are difficult to estimate, and this has not been done recently). The two papers describing the suite itself have been cited 216 times to date (as at March 2022), with individual programs from the suite cited many more times. 
URL http://www.ccpem.ac.uk/download.php
 
Title Machine learning toolbox for 3D macromolecular data 
Description We have developed a collaborative software toolbox that includes a number of methods and pre-processing steps specific to applying machine learning to 3D macromolecular data. The aim is to make machine learning techniques more accessible to members of the community and to lower the technical barrier to applying them. The toolbox is being used in several internal machine learning projects. For example, it provides the core data manipulation for our development of a cascaded neural network for identifying low resolution features in cryoEM maps. It has also been released to collaborators in the CCP-EM consortium, for example at Delft, NL. These application projects are in turn driving the further development of the toolbox. The toolbox is expected to be production ready in future releases of the CCP-EM software suite. 
Type Of Technology Software 
Year Produced 2020 
Impact The toolbox has been used in at least 5 different software development projects, and has an impact indirectly through these. 
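
As an illustration of the kind of pre-processing step that is common when applying machine learning to 3D volumes, the sketch below normalises voxel values and extracts a cubic patch. The record does not list the toolbox's actual methods, so the choice of steps and the function names (normalise, extract_patch) are assumptions made purely for illustration, and plain nested lists stand in for a real array type.

```python
from statistics import fmean, pstdev


def normalise(volume):
    """Scale a 3D volume (nested lists indexed z, y, x) to zero mean, unit variance."""
    voxels = [v for plane in volume for row in plane for v in row]
    mean = fmean(voxels)
    std = pstdev(voxels)
    if std == 0:
        raise ValueError("volume is constant and cannot be normalised")
    return [[[(v - mean) / std for v in row] for row in plane] for plane in volume]


def extract_patch(volume, origin, size):
    """Cut a cubic patch of edge `size` starting at `origin` = (z, y, x)."""
    z0, y0, x0 = origin
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if (min(z0, y0, x0) < 0 or z0 + size > depth
            or y0 + size > height or x0 + size > width):
        raise ValueError("patch extends outside the volume")
    return [[row[x0:x0 + size] for row in plane[y0:y0 + size]]
            for plane in volume[z0:z0 + size]]


# Hypothetical 2 x 2 x 2 volume
volume = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]]
normalised = normalise(volume)
patch = extract_patch(volume, (1, 1, 1), 1)
print(patch)
# [[[7.0]]]
```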
 
Description CCP-EM Spring Symposia (virtual) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The 2020 and 2021 CCP-EM Spring Symposia were held online in a virtual format due to the COVID-19 situation. The conference was hosted on Zoom and was free of charge, but with registration required.

The conference aims to provide a forum to highlight state of the art developments in computational cryoEM and related themes as well as showcasing outstanding recent applications. We aim to promote an inclusive, friendly atmosphere welcoming both old and new to the community. Also included is the Diamond Light Source Biological Cryo-imaging User Meeting (eBIC & B24). Topics include instrument technology, sample preparation, image processing, single particle reconstruction, tomography and model building.

The Scientific organisers for 2020 were Helen Saibil (Birkbeck) and Christos Savva (University of Leicester), and for 2021 Giulia Zanetti (Birkbeck) and Christopher Aylett (Imperial). All other organisation was by STFC.

Because of the online format, we reached a much larger audience than normal. In 2020, we had about 3000 registered, with around 1000 logged in to the sessions at any one time. The numbers were slightly lower in 2021, but still much larger than for the original in-person meetings. We believe that by the second year of the pandemic, there was some fatigue with the number of on-line conferences. Based on these experiences, we plan to make future Spring Symposia hybrid events, combining the advantages of in-person meetings with the reach and accessibility of on-line meetings.

Speakers' slides are made available on our website, and recordings of talks are on YouTube (on the STFC channel, also linked from our website). These are recognised as important resources for our community.
Year(s) Of Engagement Activity 2020,2021
URL https://www.ccpem.ac.uk/symposium.php
 
Description CCP-EM website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The CCP-EM website is used for dissemination of the aims of CCP-EM, advertising our meetings and workshops, as well as 3rd party meetings of interest to our community.

The CCP-EM software suite is available via the website as downloadable packages. There is a dedicated tutorials page https://www.ccpem.ac.uk/tutorials.php for users to learn how to use the software.

The website also hosts specialist information, such as a description of the MRC file format.
Year(s) Of Engagement Activity 2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
URL http://www.ccpem.ac.uk
 
Description CCP-EM contribution to EMBO courses on Image Processing for cryoEM 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The EMBO course on Image processing for cryo-electron microscopy is held every other year at Birkbeck, and is very popular. Over 10 days, it teaches all aspects of cryoEM including sample preparation, microscope operation, data processing and structure modelling.
The core team of CCP-EM contributed to the 2017 course in several ways. We gave an invited lecture on CCP-EM, and we supported several hands-on computer tutorials. We also directly sponsored the event, allowing more students to be supported.
We contributed again to the 2019 course. This time we ran two practicals: "Fitting of structures, flexible fitting (Flex-EM), model validation (TEMPy)" and "Local sharpening (LocScale), de novo structure building (CCP-EM, REFMAC)". CoIs from the current CCP-EM grant delivered 7 of the lectures. CCP-EM again sponsored the event.
In 2021, CCP-EM collaborators gave several talks, and CCP-EM core staff ran several computer practicals. We again provided sponsorship to help support students. The event was virtual this year, and STFC provided AV support (mainly via Zoom) which we arranged.
The course trains around 50 students and postdocs in cryoEM each time, and is a major contributor to skills development for cryoEM. It is run by Birkbeck College, London, who are an important partner in CCP-EM. In terms of teaching, it is a community effort. In addition to the specific activities mentioned above, CCP-EM plays an indirect role in supporting this community and coordinating efforts.
Year(s) Of Engagement Activity 2017,2019,2021
URL https://meetings.embo.org/event/21-cryo-em
 
Description Contribution to SWSBC meeting on structural biology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The annual meeting of the South West Structural Biology Consortium is a forum for research groups in the South West of England to come together to present their research and build networks with each other. Founded in 2001, this annual meeting rotates around the Universities of Bath, Bristol, Cardiff, Exeter, Reading, Portsmouth, Southampton, Sussex and UCL. It is particularly aimed at PhD students, postdocs and ECRs.

The meeting was originally for macromolecular crystallography, but there is an increasing component of single particle cryoEM. Colin Palmer from the CCP-EM core team presented a talk on "The CCP-EM software suite for cryoEM" introducing the students to what is available in CCP-EM.
Year(s) Of Engagement Activity 2021
URL https://blogs.exeter.ac.uk/swsbc2021/
 
Description IUCr Satellite workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This remote training event was held as part of a joint workshop on structural biology methods and computational analysis of crystallographic and Cryo-EM data in connection with the XXV Congress and General Assembly of IUCr in Prague, August 2021.

Our computational session on CCP-EM, which followed a similar session for CCP4, covered a general introduction to CCP-EM, single particle reconstruction, model building, refinement and validation. The course was delivered mostly through computer practicals.
Year(s) Of Engagement Activity 2021
URL https://www.ibt.cas.cz/en/core-facilities/centre-of-molecular-structure/courses/2021-iucr-cms-satell...
 
Description Training on structure validation for biomolecular simulation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact As part of the annual CCPBioSim training week, I taught a 3-hour workshop on structure validation. Many biomolecular simulations start with a structure from the Protein Data Bank, which may have been determined by crystallography or by cryoEM. The workshop covered how to validate such structures, and offered advice on picking a good starting point for simulations. The main presentation has been made available on the CCPBioSim website. Computer practicals were also provided via Jupyter notebooks.

The workshop is part of a wider initiative to improve links between the structural biology and biomolecular simulation communities. Both communities need to understand the capabilities and limitations of the other.
Year(s) Of Engagement Activity 2021
URL https://www.ccpbiosim.ac.uk/events/workshop-course-material/eventdetail/135/-/ccpbiosim-training-wee...