Automated de novo building of protein models into electron microscopy maps

Lead Research Organisation: University of York
Department Name: Chemistry

Abstract

Scientists are interested in the atomic structure of biological molecules, in other words what the molecules look like. Knowing in detail what a molecule looks like provides important clues to how it might work. If we can go further and capture molecules in the process of interacting with other biological molecules, or artificial compounds such as drugs, we get a clearer picture of how they work.

Most of our knowledge of the structure of biological molecules comes from X-ray crystallography. However, over the past decade a new technique, electron microscopy (EM), has become popular. Individual molecules held in a thin film of liquid solvent are frozen and placed in an electron microscope, which captures images of the molecules. Many individual views can be combined to construct a model of the structure of the molecule in 3 dimensions.

Until recently these images were of limited resolution - they were 'fuzzy' - and so individual groups of atoms could not be seen. The EM user therefore needed to have some knowledge of the structure of the molecule, or at least parts of it, in advance. These fragments could then be fitted into the EM image to give an indication of the whole structure, which allowed large molecular machines such as the ribosome to be understood.

New electron detectors have allowed EM images to be determined at much higher resolutions, so that small groups of atoms can be distinguished. The resulting images are of similar quality to those from X-ray crystallography. In favourable cases, this has allowed the atomic structure of the molecule to be determined without any prior knowledge of the structure. However, at the moment the process of interpreting the map in terms of atomic features is often performed manually, at a cost of considerable effort and a potential lack of objectivity in the results.

The aim of this project is to take an existing method for automatically building atomic models into images from X-ray crystallography, and modify the software to work effectively with the images from electron microscopy. Not only will this make the process of building an atomic model into an electron microscopy image much less time consuming, it will allow multiple models to be built into different images of the molecule as an assessment of the accuracy and reliability of the results. It will be possible to go back and check existing structures by rebuilding the maps automatically. This will provide a useful check on the quality of existing models determined from EM images.

The project involves modifying existing computer software for building atomic models to adapt it to work on a new type of image. The software is already good at interpreting crystallographic images at the kind of resolutions produced by electron microscopy experiments, but works less well with EM images because it has been "trained" to work with crystallography images. Some retraining, and possibly some new methods, will be required.

All of the software produced by the project will be distributed freely to academic users through existing software suites for crystallography and electron microscopy. The source code for software will also be distributed so that other developers can learn from it or modify it.

Technical Summary

The aim of the project is to adapt and optimize X-ray crystallographic model building methods for effective application to the de novo building of models into high resolution cryo-electron microscopy maps, and to distribute the software to a broad user community. We will achieve these results by following the same approach to software development which we have successfully applied to previous projects:

- Firstly, a curated library of test data will be prepared for which the final structures are known. The use of test data in this way has proven critical in our previous work, as it allows every change to the software to be evaluated across a representative selection of datasets, avoiding the problem of software which only works on a single structure. It is envisaged that a full set of tests will be run on a daily to weekly time scale, depending on the time requirements. If necessary more frequent tests will be run on a subset of the data.

- The software will be developed in the CCP4 source code repository, providing a full history of changes to the software. The current version of the software will be recorded against each set of test results to document the effectiveness of each change to the code. The source code repository is publicly visible.

- The methods used will be those developed as part of the BUCCANEER, NAUTILUS and COOT software packages, modified to provide best results against the test data.

- Release versions of the software will be incorporated into the CCP4 and CCP-EM build frameworks to allow the software to be built for Windows, Mac and Linux computer systems.

- The software will be incorporated into automated software pipelines and presented to the user through the standard CCP4 and CCP-EM user interfaces.

- The software will be incorporated into the standard CCP4 and CCP-EM installation and update tools to allow installation by normal users without the assistance of a system manager.
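
The test-library workflow described in the first two points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual test harness: the names `BenchmarkCase`, `completeness`, `run_benchmark` and `toy_builder`, the version label `r1234`, and the two made-up datasets are all hypothetical, and the toy builder merely stands in for a real model building program. The sketch shows each change to the software being scored across several datasets with known answers, with the software version recorded against the results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class BenchmarkCase:
    """One curated test dataset whose final structure is known (hypothetical layout)."""
    name: str
    resolution: float                 # resolution in Angstroms
    reference_residues: FrozenSet[int]  # residue numbers in the known structure


def completeness(built: Set[int], reference: FrozenSet[int]) -> float:
    """Fraction of the reference residues that the builder reproduced."""
    if not reference:
        return 0.0
    return len(built & reference) / len(reference)


def run_benchmark(
    version: str,
    cases: List[BenchmarkCase],
    build: Callable[[BenchmarkCase], Set[int]],
) -> Dict[str, object]:
    """Score one software version on every case and record the version with the scores."""
    scores = {
        case.name: completeness(build(case), case.reference_residues)
        for case in cases
    }
    return {"version": version, "scores": scores, "mean": mean(scores.values())}


def toy_builder(case: BenchmarkCase) -> Set[int]:
    """Stand-in for a real builder: an invented rule, used only to produce numbers."""
    ids = sorted(case.reference_residues)
    if case.resolution < 4.0:
        return set(ids)               # pretend everything is built at better resolution
    return set(ids[: len(ids) // 2])  # pretend half is built otherwise


# Two made-up datasets; a real library would hold many curated maps and models.
CASES = [
    BenchmarkCase("case_a", 3.2, frozenset(range(1, 101))),
    BenchmarkCase("case_b", 4.5, frozenset(range(1, 51))),
]

RESULT = run_benchmark("r1234", CASES, toy_builder)
print(RESULT)
```

Keeping the version label inside each result record is what allows a later comparison between two revisions of the code on the same set of datasets, rather than on a single favourable structure.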

Planned Impact

Cryo-electron microscopy has progressed over the last decade from being a niche technique for the low resolution imaging of large complexes, to a comparatively routine technique suitable for solving most medium to large structures. The resolution of the best EM reconstructions is now sufficient for de novo building. The UK now boasts a number of large EM facilities, for example at the MRC, Diamond and Leeds.

While the capital cost of an EM facility is large, the method offers substantial benefits for certain classes of problem. Unlike X-ray crystallography, the sample does not need to be crystallized - a time consuming and sometimes unsuccessful step. The challenge of crystallisation introduces a risk in the crystallographic pathway which carries its own cost. Consequently EM will see increasing use in the biotech and pharmaceutical industries. EM methods have the further benefit of imaging molecules in a state which is undistorted by crystal contacts, and thus in some cases more informative for biological problems.

CCP4 has been very successful in serving the biotech and pharmaceutical industries, as evidenced by over a hundred annual software licenses issued to industrial customers, raising typically £1m per annum in income. CCP-EM seeks to fill the same role for EM users. Dr Cowtan's work has contributed significantly to the success of CCP4, with his contributions to density modification, model building, visualisation and supporting infrastructure attracting over 10,000 citations in the peer-reviewed literature, as well as being cited in patents.

The development of de novo model building software specialised to the interpretation of EM maps, and its contribution to the CCP-EM software suite for EM structure solution, will make the method more effective and make de novo structure solution more accessible to the typical user. The validation and automated rebuilding of existing models will reduce bias and improve the quality of the structures in the EM database.

The direct benefits to industry are expected, by parallel with X-ray crystallography, to be realised through the development of new drugs and biochemical processes, building on the insights arising from the structures determined by these methods. However, as with X-ray crystallography, we expect that linking individual products to software developments will be difficult due to the closed nature of the sector. The primary indicator of impact will remain the license fees which industrial users are willing to pay for the software.

Finally, the theoretical work which underlies this proposal will improve our understanding of the features of EM electron density reconstructions, and provide a basis for other developers to address the same problems in different ways. We therefore expect that our work will provide a catalyst for an expansion of the development of software for the later stages of EM structure solution, in particular model building and refinement. UK leadership in this area will provide a competitive advantage to our users and partners in UK industry.

Publications

Hoh SW (2020) Current approaches for automated model building into cryo-EM maps using Buccaneer with CCP-EM. in Acta crystallographica. Section D, Structural biology

McNicholas S (2018) Automating tasks in protein structure determination with the clipper python module. in Protein science : a publication of the Protein Society

 
Description An archive of high resolution electron microscopy (EM) datasets has been gathered and curated from EMDB. These are updated annually with new structures.

The BUCCANEER model building software has been combined with tools from elsewhere to enable it to be used on EM reconstructions. The software was then tested against the archive of test data, and several options were identified which needed to be changed for EM data. It was established that the software is useful for reconstructions at better than 4 Angstroms resolution, with marginal results at slightly worse resolutions. We have also developed further customizations (in particular the use of EM reference data for training the method) which further improve the performance with EM data, although these are still in final testing before release.

Both the BUCCANEER and NAUTILUS software (for protein and nucleotide model building respectively) have been incorporated into the CCP-EM software suite. Software pipelines, user interfaces and reports have been implemented in the CCP-EM software framework to allow the software to inter-operate seamlessly with other EM software. Three beta releases of the CCP-EM suite have been made including the software. The software suites have been demonstrated at user workshops, with useful results on novel structures.

Since Feb 2018, we have made further developments to the CCP-EM pipelines. New reference structures using EM rather than crystallographic data have been tested and released. The use of EM reference structures to derive prior knowledge of what features are expected to look like in a map has significantly improved the identification of side chain types in protein maps. More minor tweaks to the handling of the EM data have produced further improvements to both speed and effectiveness. The software has been extended to handle much larger structures, such as the large protein or protein/nucleotide complexes which are commonly investigated using cryo-EM. We have also investigated automated map scaling in collaboration with the LocSCALE group at EMBL.

A software tool to allow automated building of protein-nucleotide complexes is in development in response to user requests at workshops. This goes beyond the original aims of the grant, is partially complete and is being tested and finalized as time permits.

We took part in the 2019 EMDB model metrics challenge. A paper on the software, and another on the model metrics challenge, are in preparation.
Exploitation Route Our software incorporated into CCP-EM is already being used by cryo-electron microscopists to perform de-novo model building where the resolution is sufficient. This provides a powerful validation of the EM reconstruction. We are looking at the possibility of using automated building to evaluate the effectiveness of local sharpening in collaboration with the LocScale developers.
Sectors Pharmaceuticals and Medical Biotechnology

URL http://www.ccpem.ac.uk/training/icknield_2017/icknield_2017.php
 
Description The software produced by this project has been contributed to the CCP-EM software suite and provides the primary de-novo model building toolset within that suite. The CCP-EM suite is now seeing significant growth in user base and is beginning to attract commercial licensees, reflecting the utility of the software within the pharmaceutical and biotechnology sectors.
First Year Of Impact 2019
Sector Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title CCP-EM beta release 
Description The Collaborative Computational Project for electron cryo-microscopy (CCP-EM) supports users and developers in biological EM by supporting the users of software for cryo-EM through dissemination of information on available software, and directed training. The current beta release of CCP-EM includes versions of both the BUCCANEER protein model building software, and the NAUTILUS nucleic acid model building software. These applications were originally developed for X-ray crystallographic data; however, they have been integrated into pipelines which can convert EM data for interpretation. New user interfaces and reports have been implemented to make the software easily usable on EM data within the CCP-EM graphical user interface.
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The software is still in beta release, so usage and benefits are not yet clear. However we have received positive reports from users at EM workshops, and so expect some citations in structural papers. 
URL http://www.ccpem.ac.uk/index.php
 
Title CCP-EM release version 1.1 
Description The Collaborative Computational Project for electron cryo-microscopy (CCP-EM) supports users and developers in biological EM by supporting the users of software for cryo-EM through dissemination of information on available software, and directed training. The current beta release of CCP-EM includes versions of both the BUCCANEER protein model building software, and the NAUTILUS nucleic acid model building software. These applications were originally developed for X-ray crystallographic data; however, they have been integrated into pipelines which can convert EM data for interpretation. New user interfaces and reports have been implemented to make the software easily usable on EM data within the CCP-EM graphical user interface. Changes in this release: Buccaneer now uses an EM reference map and structure; Nautilus is unchanged.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact We have received positive reports from users at EM workshops, and the CCP-EM papers have now received 30 citations. 
URL http://www.ccpem.ac.uk/
 
Title CCP-EM release version 1.1 
Description The Collaborative Computational Project for electron cryo-microscopy (CCP-EM) supports users and developers in biological EM by supporting the users of software for cryo-EM through dissemination of information on available software, and directed training. The current beta release of CCP-EM includes versions of both the BUCCANEER protein model building software, and the NAUTILUS nucleic acid model building software. These applications were originally developed for X-ray crystallographic data; however, they have been integrated into pipelines which can convert EM data for interpretation. New user interfaces and reports have been implemented to make the software easily usable on EM data within the CCP-EM graphical user interface. Changes in this release: for Buccaneer, the default figure of merit (FOM) value written when setting the FOM column in the MTZ file was changed from 0.5 to 0.99. Nautilus uses the same MTZ preparation tasks as Buccaneer, so the change to the FOM value applies there as well.
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact We have received positive reports from users at EM workshops, and the CCP-EM papers have now received 30 citations. 
URL http://www.ccpem.ac.uk/
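
The figure of merit change described for this release can be illustrated with a small sketch. It is only an illustration under stated assumptions and does not reproduce the CCP-EM preparation task or read or write real MTZ files: reflections are represented as plain dictionaries, and the function name `with_fom_column` and the column labels `F`, `PHI` and `FOM` are hypothetical. The sketch shows a constant FOM value being attached to every reflection, with the default taken from the 0.99 value quoted above.

```python
from typing import Dict, List

# Default values quoted in the release description above.
OLD_DEFAULT_FOM = 0.5
NEW_DEFAULT_FOM = 0.99


def with_fom_column(
    reflections: List[Dict[str, float]],
    fom: float = NEW_DEFAULT_FOM,
) -> List[Dict[str, float]]:
    """Return copies of the reflection records with a constant FOM value added.

    The input list is left unmodified. A figure of merit is a weight between
    0 and 1, so values outside that range are rejected.
    """
    if not 0.0 <= fom <= 1.0:
        raise ValueError("figure of merit must lie between 0 and 1")
    return [{**refl, "FOM": fom} for refl in reflections]


# Two made-up reflections: an amplitude F and a phase PHI in degrees.
REFLECTIONS = [
    {"F": 120.0, "PHI": 45.0},
    {"F": 80.5, "PHI": -130.0},
]

PREPARED = with_fom_column(REFLECTIONS)
print(PREPARED)
```

Passing `fom=OLD_DEFAULT_FOM` reproduces the earlier default, which makes it straightforward to compare the two settings on the same data.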
 
Description CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps' 5th - 6th April 2017 This course is aimed at structural biologists with high resolution EM maps ready for / in the process of model building and refinement. This three day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials. The principal benefit to the participants was an awareness of tools which can perform de-novo model building in high resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved.
Year(s) Of Engagement Activity 2017
URL https://www.iucr.org/calendar/events/types/workshops/ccp-em-icknield-workshop-for-high-resolution-mo...
 
Description CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps' Apr 2010 This course is aimed at structural biologists with high resolution EM maps ready for / in the process of model building and refinement. This three day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials. The principal benefit to the participants was an awareness of tools which can perform de-novo model building in high resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved.
Year(s) Of Engagement Activity 2019,2020
URL https://www.ccpem.ac.uk/training/icknield_2019/icknield_2019.php
 
Description CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps'
2nd - 4th March 2016

This course is aimed at structural biologists with high resolution EM maps ready for / in the process of model building and refinement. This three day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials.

The principal benefit to the participants was an awareness of tools which can perform de-novo model building in high resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved.
Year(s) Of Engagement Activity 2016
URL http://www.ccpem.ac.uk/training/icknield_2016/icknield_schedule.pdf
 
Description CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps' 1st - 4th May 2018 This course is aimed at structural biologists with high resolution EM maps ready for / in the process of model building and refinement. This three day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials. The principal benefit to the participants was an awareness of tools which can perform de-novo model building in high resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved.
Year(s) Of Engagement Activity 2018
URL https://www.iucr.org/calendar/events/topics/cryoEM/ccp-em-icknield-workshop-for-high-resolution-mode...
 
Description Invited talk at Barcelona meeting on "MX and cryo-EM" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk by Jon Agirre and Scott Hoh on "Automated model building in cryoEM maps"
Year(s) Of Engagement Activity 2017
URL https://sbu.csic.es/conference-mx-cryoem-bcn/
 
Description Presentation at CCP-EM Spring Symposium 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited talk on Extending model building and refinement tools for Cryo-EM applications at the CCP4 symposium, Nottingham, Apr 2019
Year(s) Of Engagement Activity 2019
URL https://www.youtube.com/watch?v=evbJV6431EA