Automated de novo building of protein models into electron microscopy maps

Lead Research Organisation: Science and Technology Facilities Council
Department Name: Scientific Computing Department

Technical Summary

The aim of the project is to adapt and optimize X-ray crystallographic model building methods for effective application to the de novo building of models into high resolution cryo-electron microscopy maps, and to distribute the software to a broad user community. We will achieve these results by following the same approach to software development which we have successfully applied to previous projects:

- Firstly, a curated library of test data will be prepared for which the final structures are known. The use of test data in this way has proven critical in our previous work, as it allows every change to the software to be evaluated across a representative selection of datasets, avoiding the problem of software which only works on a single structure. It is envisaged that a full set of tests will be run on a daily to weekly time scale, depending on the time requirements. If necessary more frequent tests will be run on a subset of the data.

- The software will be developed in the CCP4 source code repository, providing a full history of changes to the software. The current version of the software will be recorded against each set of test results to document the effectiveness of each change to the code. The source code repository is publicly visible.

- The methods used will be those developed as part of the BUCCANEER, NAUTILUS and COOT software packages, modified to provide best results against the test data.

- Release versions of the software will be incorporated into the CCP4 and CCP-EM build frameworks to allow the software to be built for Windows, Mac and Linux computer systems.

- The software will be incorporated into automated software pipelines and presented to the user through the standard CCP4 and CCP-EM user interfaces.

- The software will be incorporated into the standard CCP4 and CCP-EM installation and update tools to allow installation by normal users without the assistance of a system manager.
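
The first two steps above (running every change against a curated test library, and recording the software version against each set of results) can be illustrated with a minimal Python sketch. It is not the project's actual test harness: the names `BenchmarkCase`, `BenchmarkRecord`, `run_benchmark` and `regressions`, the completeness score (fraction of reference residues built) and the 0.01 tolerance are all assumptions made for illustration, and the `build` callable is a stand-in for a real model building run.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BenchmarkCase:
    """One curated test dataset for which the final structure is known (hypothetical)."""
    name: str
    reference_residues: int  # residues in the known final structure


@dataclass(frozen=True)
class BenchmarkRecord:
    """Results of one full test run, stored against the software version."""
    software_version: str
    completeness: Dict[str, float]  # case name -> fraction of residues built

    @property
    def mean_completeness(self) -> float:
        return mean(self.completeness.values())


def run_benchmark(
    version: str,
    cases: List[BenchmarkCase],
    build: Callable[[BenchmarkCase], int],
) -> BenchmarkRecord:
    """Run the (stand-in) builder on every case and score it against the reference."""
    scores = {}
    for case in cases:
        built = build(case)  # number of residues built; a placeholder for a real run
        scores[case.name] = min(built, case.reference_residues) / case.reference_residues
    return BenchmarkRecord(version, scores)


def regressions(
    old: BenchmarkRecord, new: BenchmarkRecord, tolerance: float = 0.01
) -> List[str]:
    """List the cases that got worse by more than the tolerance between two versions."""
    return sorted(
        name
        for name, score in new.completeness.items()
        if name in old.completeness and score < old.completeness[name] - tolerance
    )


if __name__ == "__main__":
    cases = [BenchmarkCase("case_a", 100), BenchmarkCase("case_b", 200)]
    # Invented residue counts standing in for two versions of the software.
    old = run_benchmark("v1", cases, lambda c: {"case_a": 80, "case_b": 150}[c.name])
    new = run_benchmark("v2", cases, lambda c: {"case_a": 90, "case_b": 120}[c.name])
    print(new.software_version, new.mean_completeness, regressions(old, new))
```

Comparing per-case scores rather than only the mean reflects the concern stated above: a change that improves one structure while degrading another should be visible, not averaged away.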

Planned Impact

Cryo-electron microscopy has progressed over the last decade from being a niche technique for the low resolution imaging of large complexes, to a comparatively routine technique suitable for solving most medium to large structures. The resolution of the best EM reconstructions is now sufficient for de novo building. The UK now boasts a number of large EM facilities, for example at the MRC, Diamond and Leeds.

While the capital cost of an EM facility is large, the method offers substantial benefits for certain classes of problem. Unlike X-ray crystallography, the sample does not need to be crystallized - a time consuming and sometimes unsuccessful step. The challenge of crystallisation introduces a risk in the crystallographic pathway which carries its own cost. Consequently EM will see increasing use in the biotech and pharmaceutical industries. EM methods have the further benefit of imaging molecules in a state which is undistorted by crystal contacts, and thus in some cases more informative for biological problems.

CCP4 has been very successful in serving the biotech and pharmaceutical industries, as evidenced by over a hundred annual software licenses issued to industrial customers, raising typically £1m per annum in income. CCP-EM seeks to fill the same role for EM users. Dr Cowtan's work has contributed significantly to the success of CCP4, with his contributions to density modification, model building, visualisation and supporting infrastructure attracting over 10,000 citations in the peer-reviewed literature, as well as being cited in patents.

The development of de novo model building software specialised to the interpretation of EM maps, and its contribution to the CCP-EM software suite for EM structure solution, will make the method more effective and make de novo structure solution more accessible to the typical user. The validation and automated rebuilding of existing models will reduce bias and improve the quality of the structures in the EM database.

The direct benefits to industry are expected, by analogy with X-ray crystallography, to be realised through the development of new drugs and biochemical processes, building on the insights arising from the structures determined by these methods. However, as with X-ray crystallography, we expect that linking individual products to software developments will be difficult due to the closed nature of the sector. The primary indicator of impact will remain the license fees which industrial users are willing to pay for the software.

Finally, the theoretical work which underlies this proposal will improve our understanding of the features of EM electron density reconstructions, and provide a basis for other developers to address the same problems in different ways. We therefore expect that our work will provide a catalyst for an expansion of the development of software for the later stages of EM structure solution, in particular model building and refinement. UK leadership in this area will provide a competitive advantage to our users and partners in UK industry.

Publications

Burnley T (2017) Recent developments in the CCP-EM software suite. in Acta crystallographica. Section D, Structural biology

McNicholas S (2018) Automating tasks in protein structure determination with the clipper python module. in Protein science : a publication of the Protein Society

 
Description The experimental technique of electron cryo-microscopy can now image biological macromolecules at high resolution. Reconstruction from the electron microscopy images leads to a 3D volume showing the shape of the molecule. For downstream applications, such as drug design, it is useful to interpret this volume in terms of individual atoms. The software Buccaneer performs this interpretation. Buccaneer was originally developed for X-ray crystallography, and we have adapted it to work with cryoEM volumes.
In particular, Buccaneer has been included in the CCP-EM software suite, for use by the large community of structural biologists. The suite includes a user-friendly interface to Buccaneer, and was publicly released for the first time in 2018. There is also an interface to the related Nautilus software for building nucleic acids.
We have developed a tutorial for Buccaneer which is used in training workshops. The tutorial uses the same dataset as the popular reconstruction software Relion, allowing users to follow the whole process from micrographs to atomic model.
Buccaneer was used by our team in the 2019 Model Metrics Challenge, which assessed the ability to build atomic models into cryoEM maps and the various measures for assessing the fit.
Exploitation Route Our developments are packaged in the software Buccaneer, which is available now as part of the CCP-EM software suite. Structural biologists can use the software on their own projects, to help generate atomic models which are then deposited in the Protein Data Bank (pdbe.org). These projects cover a wide range of biomedical sectors.
Atomic models in the Protein Data Bank are in turn used by a wide range of scientists (see examples at https://www.ebi.ac.uk/pdbe/quips ). These applications are ultimately dependent on methodological advances, such as those in the current award.
Sectors Agriculture, Food and Drink,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

URL http://www.ccpem.ac.uk/download.php
 
Description The improved Buccaneer software and interface have been included in the CCP-EM software suite, and as such they are distributed to commercial customers, including pharmaceutical companies, for use in their structure determination pipelines.
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software),Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description Coronavirus Structural Task Force 
Organisation University of Hamburg
Country Germany 
Sector Academic/University 
PI Contribution The Coronavirus Structural Task Force aims to provide structural information on proteins from the SARS-CoV-2 virus. As well as structures from the international repositories PDB and EMDB, the Task Force provides quality assessment, and in some cases improved structural modelling. Joseph Agnel from the CCP-EM team has provided many of the validation tools used by the Task Force, and has improved some of the viral structures and deposited them on their web site.
Collaborator Contribution Our partners provide the dissemination site for our efforts to improve the structural modelling for SARS-CoV-2.
Impact The collaboration is between structural biologists from different international groups.
Start Year 2020
 
Title Buccaneer in CCP-EM 
Description The Buccaneer pipeline is available within the CCP-EM graphical user interface. Given a cryoEM map, obtained for example from single particle reconstruction, and the sequence of the expected protein molecules, the pipeline will build and refine an atomic model. This is a crucial step in the interpretation of experimental data from cryoEM. In comparison to other model building tools, Buccaneer can handle relatively low resolution. Buccaneer is an important part of the annual CCP-EM Icknield training school on model building and refinement. It was also used in our team's submission to the 2019 Model Metrics Challenge (organised by the global EM Data Resource https://www.emdataresource.org/). 
Type Of Technology Software 
Year Produced 2017 
Impact The pipeline has been used in CCP-EM workshops, and has helped several researchers with their structural biology projects. The Buccaneer pipeline remains an important component of the CCP-EM software suite, and is updated periodically by the main author. 
 
Title CCP-EM version 1 
Description The CCP-EM software suite provides a collection of programs for cryoEM single particle reconstruction and building of atomic models. The suite as a whole has an STFC licence, and is licensed free of charge to non-profit users, and for a charge to for-profit users. Nevertheless, many of the component programs are available separately under Open Source licences. Version 1 was released April 2018, with updates 1.1 in July 2018, 1.2 in December 2018, 1.3 in April 2019, 1.4 in November 2019, 1.5 in October 2020, and 1.6 in April 2022. This first official release of the CCP-EM software suite mainly covered fitting and refinement of atomic models into single particle reconstructions, combining experience gained in CCP4 with high resolution maps with other techniques more appropriate to lower resolution maps. Since that initial release, the suite has expanded to include tools for map analysis and manipulation. Version 1.2 included for the first time pre-compiled binaries for Relion, the most popular software for single particle reconstruction. Besides providing a convenient way of viewing Relion projects on a personal machine, the inclusion of Relion is the basis of on-going efforts to integrate reconstruction with downstream map interpretation. The CCP-EM suite also includes software libraries such as mrcfile, clipper-python and relion-it, which are being used by third-party developers and facility sites for customised workflows. Version 1.6 (April 2022) included the new Servalcat wrapper for atomistic model refinement with Refmac5. There were also a number of new validation tools, including Privateer, 3D-Strudel and PI-score. Stable nightly builds are made available (latest 8/11/22) which contain some useful updates. The version 1 release sequence is now frozen, pending the release of version 2 based on Pipeliner/Doppio (see other entries). 
Type Of Technology Software 
Year Produced 2018 
Impact The suite is used by many academic and industrial cryoEM groups worldwide to solve novel macromolecular structures. These are deposited in the Electron Microscopy Data Bank (EMDB) and the Protein Data Bank (PDB), from where they can be employed in wider biomedical applications. At the moment (early 2023) we have around 30 commercial licences, indicating usage in pharma and biotech. The academic usage is estimated to be several thousand. The two papers describing the suite itself have been cited 290 times to date (as at March 2023), with individual programs from the suite cited many more times. 
URL http://www.ccpem.ac.uk/download.php
 
Description 2nd CCP4/BGU Workshop on Advanced Methods for Macromolecular Structure Determination, Ben-Gurion University 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The CCP4/BGU workshop is aimed at 25 graduate students, postdocs and/or researchers with some previous expertise in crystallography and/or cryo-EM who need a deeper insight into the most advanced structural biology techniques to carry out their research projects.

The workshop program covers all aspects of structure determination, such as data collection, phasing, model building, refinement, validation and structural analysis.

Tom Burnley and Colin Palmer from the CCP-EM core team presented lectures and tutorials on single particle reconstruction with Relion and model building into cryoEM maps.
Year(s) Of Engagement Activity 2020
URL https://lifeserv.bgu.ac.il/wp/ccp4workshop/
 
Description CCP4 Study Weekends 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation workshop facilitator
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The CCP4 Study Weekend is recognised as the best conference for computational methods in macromolecular crystallography (as opposed to those focussed on the scientific results). As such it attracts the leading international developers and an audience of over 400. Each year it provides a snapshot of the state-of-the-art.
Lunchtime bytes provides an opportunity for software developers to demonstrate their programs to the delegates at the meeting. Software from both CCP4 and CCP-EM is demonstrated each year.

The proceedings of each year's conference are published in a special issue of Acta Crystallographica D. Articles in these issues are usually highly cited, as they describe methods used by many crystallographers.
Year(s) Of Engagement Activity Pre-2006,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
URL http://www.ccp4.ac.uk/ccp4course.php
 
Description CCPEM contribution to EMBO courses on Image Processing for cryo EM 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The EMBO course on Image processing for cryo-electron microscopy is held every other year at Birkbeck, and is very popular. Over 10 days, it teaches all aspects of cryoEM including sample preparation, microscope operation, data processing and structure modelling.
The core team of CCP-EM contributed to the 2017 course in several ways. We gave an invited lecture on CCP-EM, and we supported several hands-on computer tutorials. We also directly sponsored the event, allowing more students to be supported.
We contributed again to the 2019 course. This time we ran two practicals: "Fitting of structures, flexible fitting (Flex-EM), model validation (TEMPy)" and "Local sharpening (LocScale), de novo structure building (CCP-EM, REFMAC)". CoIs from the current CCP-EM grant delivered 7 of the lectures. CCP-EM again sponsored the event.
In 2021, CCP-EM collaborators gave several talks, and CCP-EM core staff ran several computer practicals. We again provided sponsorship to help support students. The event was virtual this year, and STFC provided AV support (mainly via Zoom) which we arranged.
The course trains around 50 students and postdocs in cryoEM each time, and is a major contributor to skills development for cryoEM. It is run by Birkbeck College, London, who are an important partner in CCP-EM. In terms of teaching, it is a community effort. In addition to the specific activities mentioned above, CCP-EM plays an indirect role in supporting this community and coordinating efforts.
Year(s) Of Engagement Activity 2017,2019,2021
URL https://meetings.embo.org/event/21-cryo-em
 
Description Icknield workshops on model building (annual) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The first Icknield Workshop on Model Building and Refinement for High Resolution EM Maps was held at the RAL / Diamond campus, Harwell, UK on 2nd - 4th March 2016. The first course was aimed at structural biologists with high resolution EM maps ready for / in the process of modelling, building and refinement. This three day course hosted some of the leading software developers and provided ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials.

This is now an annual event, with further instances being held in April 2017, May 2018 and April 2019. After a break due to the pandemic, the workshop returned as a four day course in Sept 2022, with the next one planned for Oct 2023. There are typically 20-22 students, carefully selected in order to give a good coverage of cryoEM labs. This is now a comprehensive course for EM model building covering advanced use of ARP/wARP, Buccaneer, CCP-EM, Coot, FlexEM, ISOLDE, LocScale, MolProbity, Refmac, Privateer and new validation tools. It covers all aspects of model building including: map optimisation, automated model building, medium resolution refinement, high resolution refinement, interactive refinement, validation and deposition.

Participants are encouraged to bring their own data so that the tutors can help directly. Nevertheless, we are keen to have industry representation as well, and example data is provided in cases where participants' own data cannot be shared.
Year(s) Of Engagement Activity 2016,2017,2018,2019,2022
URL https://www.ccpem.ac.uk/courses.php
 
Description S²C² CCP-EM Workshop 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact CCP-EM ran a virtual workshop over Zoom, 10-13 Nov 2020, hosted by the Stanford-SLAC Cryo-EM Center. We presented software for solving macromolecular structures by cryoEM to an audience of around 200 students. This is a larger number than we usually teach, which followed from it being an online event. This allowed us to reach more people, but we were unable to give direct attention to any individual student, for instance to help solve their own structure.
We were able to showcase CCP-EM software, including tools developed under the BBSRC grant BB/P000975/1.
The event allowed us to strengthen our ties with the Stanford-SLAC Cryo-EM Center, and the US community in general.
Year(s) Of Engagement Activity 2020
URL https://www.ccpem.ac.uk/training/s2c2_workshop_2020/s2c2_workshop_2020.php