Automated protein model building with flexible models and high resolution data

Lead Research Organisation: University of York
Department Name: Chemistry

Abstract

Building an atomic model into an electron density map is a key stage in the solution of 3D structures by X-ray or EM methods. Once time-consuming, this task is now automated by packages such as Dr Cowtan's 'Buccaneer' software (700 citations). Buccaneer shows class-leading performance when data resolution is poor; however, it performs less well than competing software when resolution is good, and the resulting models require more manual tweaking. Both the strengths and weaknesses arise from an over-rigid model of the protein.

The aim is to introduce (a) a new feature recognition method for use with high resolution electron density, and (b) a more flexible model of both the protein backbone and of sidechain conformations. Both will contribute to improved performance when data resolution is good, and the latter will reduce the amount of manual tuning required to the model at intermediate resolutions.

The Buccaneer software is unique in that it is driven by the shape of the electron density rather than the peak values. This approach was key to its lower resolution performance, but has yet to be exploited for high resolution data. Furthermore, at higher resolution there is the possibility of optimizing the method for interactive graphical use.

X-ray crystallographic structure solution is increasingly being conducted by non-specialists, who often rely on software to produce an accurate structure with limited manual validation. As a result it is increasingly important that the software produce the most complete and accurate model possible without relying on manual tuning.

A new feature recognition function will be developed by data-mining electron density from known high resolution structures, with an optimized version of an existing search algorithm. Existing test structures will be used to identify model limitations, which will guide improvements to the model parameterization. The new features will be implemented in two existing software packages, 'buccaneer' and 'coot'. The work involves programming and data analysis, both of which are highly transferable skills. Publications arising from the work are likely to be highly cited by software users, which will provide a foundation for a future research career.

Publications

Alharbi E (2019) Comparison of automated crystallographic model-building pipelines. in Acta crystallographica. Section D, Structural biology

Alharbi E (2021) Predicting the performance of automated crystallographic model-building pipelines. in Acta crystallographica. Section D, Structural biology

Bond P (2020) Predicting protein model correctness in Coot using machine learning. in Acta crystallographica. Section D, Structural biology

Cowtan K (2020) Shift-field refinement of macromolecular atomic models. in Acta crystallographica. Section D, Structural biology

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/M011151/1                                   30/09/2015  29/09/2023
1792631            Studentship   BB/M011151/1  30/09/2016  30/11/2020  Paul Simeon Bond
 
Description X-ray crystallography is the most common method used to determine the three-dimensional structure of biological macromolecules, with proteins being of particular interest. Both the amplitudes and phases of the diffracted X-ray waves are needed to construct an electron density map, which is then interpreted by building an atomic model. However, the phases cannot be directly measured and must be estimated using either experimental phasing or molecular replacement.

Buccaneer is a program for automatically building protein models into electron density maps, which is iterated in a pipeline with global refinement to refine the model and update the map. The amount of time-consuming manual completion required depends on the success of automated building, which may fail in difficult cases with low resolution data or poorly estimated phases. The aim of this work was to improve automated building and therefore make structure solution quicker and easier.

Potential developments to Buccaneer were explored, but it was changes to the pipeline that proved to be most effective. The pipeline control system was updated and the following steps were added: shift-field refinement, classical density modification, addition of water and dummy atoms, pruning, and final rebuilding of side chains. The new pruning steps delete chains, residues and side chains using two neural networks, which were trained to predict main-chain and side-chain correctness by combining many validation metrics. The set of 54 experimental phasing cases previously used for testing Buccaneer was expanded to 202 experimental phasing and 1351 molecular replacement cases. The combined pipeline changes substantially improved performance, increasing the mean completeness of the experimental phasing cases from 85% to 91% and the molecular replacement cases from 40% to 74%. The updated pipeline was released as a new program called ModelCraft.
Exploitation Route The improved model building software produced through this research is useful to structural biologists for solving novel protein structures, for example to develop pharmaceuticals or industrial enzymes.
Sectors Digital/Communication/Information Technologies (including Software); Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
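
The pruning steps in the description above (deleting chains, residues and side chains according to predicted correctness) can be illustrated with a minimal Python sketch. This is not ModelCraft's code: the `Residue` class, the `prune` function, the use of a per-chain mean score and all cut-off values are assumptions made for illustration, and the scores stand in for the outputs of the two neural networks.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class Residue:
    chain: str
    number: int
    main_chain_score: float  # hypothetical predicted correctness, 0 to 1
    side_chain_score: float  # hypothetical predicted correctness, 0 to 1
    has_side_chain: bool = True


def prune(
    residues: List[Residue],
    chain_cutoff: float = 0.5,       # assumed value, not from the source
    residue_cutoff: float = 0.5,     # assumed value, not from the source
    side_chain_cutoff: float = 0.5,  # assumed value, not from the source
) -> List[Residue]:
    """Delete poorly scoring chains, residues and side chains."""
    # Collect main-chain scores per chain so whole chains can be judged.
    scores_by_chain: Dict[str, List[float]] = defaultdict(list)
    for residue in residues:
        scores_by_chain[residue.chain].append(residue.main_chain_score)
    bad_chains = {
        chain
        for chain, scores in scores_by_chain.items()
        if mean(scores) < chain_cutoff
    }

    kept: List[Residue] = []
    for residue in residues:
        if residue.chain in bad_chains:
            continue  # the whole chain is deleted
        if residue.main_chain_score < residue_cutoff:
            continue  # the whole residue is deleted
        if residue.has_side_chain and residue.side_chain_score < side_chain_cutoff:
            # Keep the main chain but remove the side chain.
            residue = replace(residue, has_side_chain=False)
        kept.append(residue)
    return kept


model = [
    Residue("A", 1, 0.9, 0.9),
    Residue("A", 2, 0.2, 0.9),
    Residue("A", 3, 0.8, 0.1),
    Residue("B", 1, 0.1, 0.5),
    Residue("B", 2, 0.3, 0.5),
]
pruned = prune(model)
```

In this toy input, chain B is removed as a whole, residue A2 is removed, and residue A3 keeps its main chain but loses its side chain, so that later rebuilding steps could try again in those regions.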

 
Description The major development of this PhD has been to improve the BUCCANEER pipeline so that it is much more likely to build a good protein model with few changes needed, especially when starting from a molecular replacement model. The process of improving a model through trials of pruning, density modification and model building is now more automated, and therefore requires less time and expertise. Although the number of new structures deposited in the PDB each year is increasing, the number solved by experimental phasing has stayed roughly the same and the increase is mainly due to molecular replacement structures. This may suggest that the demand for automated model building after molecular replacement will also increase. However, part of the rise in molecular replacement structures is likely due to better automation when collecting data for the same protein with different ligands, for example recent fragment screening of the SARS-CoV-2 main protease. Additionally, as molecular replacement models improve with the number of homologues available, and even with new ab-initio methods such as AlphaFold, it will become more common that only minor rebuilding and completion is required.
First Year Of Impact 2021
 
Description CCP4 Advanced integrated approaches to macromolecular structure determination
Amount £340,384 (GBP)
Funding ID BB/S005099/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 03/2019 
End 03/2024
 
Title Buccaneer 
Description Buccaneer is an automated model building program. It quickly builds protein into a map. The following new versions have been released as a result of this award:
- Revision 494 (2019-04-18)
- Revision 500 (2019-08-02)
Type Of Technology Software 
Year Produced 2019 
Impact If input structure is used, new atoms will be created with the mean B-factor. New C-alphas will not be found over known-structure. Fix so that water is not kept when treating all non-protein as known structure. 
 
Title CCP4i2 Buccaneer Pipeline 
Description CCP4i2 is a graphical interface to the series of crystallography programs that are included within the CCP4 package and provides a complete set of pipelines that can be used to determine protein crystal structures from X-ray diffraction data. The buccaneer pipeline is a model building pipeline within this suite. The following new versions of the pipeline have been released as a result of this award:
- Revision 5685 on 08/05/2018 - code refactoring
- Revision 5686 on 08/05/2018 - fix to MR input phases bug
- Revision 5695 on 11/05/2018 - fix to coordinate input mode and known structure
- Revision 5696 on 11/05/2018 - changed method of determining whether the model is essentially complete
- Revision 5697 on 14/05/2018 - fixed prosmart reference model and pipeline interruption
- Revision 5745 on 05/09/2018 - reverted options to be similar to CCP4i for better performance
- Revision 5807 on 27/11/2018 - added new report plots
- Revision 5826 on 11/12/2018 - increased cycles to 25 and added convergence criteria
- Revision 5856 on 16/01/2019 - output model taken from cycle with lowest R-free
Type Of Technology Software 
Year Produced 2019 
Impact The new releases of the CCP4i2 buccaneer pipeline mean that models built by this pipeline are more complete and have lower R-factors than previously. Notably, models that require more pipeline iterations will now run for longer. 
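
Two of the pipeline changes listed above (a limit of 25 cycles with convergence criteria, and taking the output model from the cycle with the lowest R-free) can be sketched in Python as follows. The record does not state what the convergence criteria are, so the rule used here (stop after a set number of cycles without a new lowest R-free) is an assumption, as are the names `CycleResult`, `run_cycles` and `patience`. The R-free values are supplied as a plain list in place of real build and refinement cycles.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class CycleResult:
    cycle: int
    r_free: float


def run_cycles(
    r_free_per_cycle: Sequence[float],
    max_cycles: int = 25,  # cycle limit mentioned in revision 5826
    patience: int = 4,     # assumed convergence rule, not from the source
) -> Tuple[Optional[CycleResult], List[CycleResult]]:
    """Return the lowest R-free cycle and the list of cycles actually run."""
    results: List[CycleResult] = []
    best: Optional[CycleResult] = None
    cycles_without_improvement = 0
    for cycle, r_free in enumerate(r_free_per_cycle[:max_cycles], start=1):
        result = CycleResult(cycle, r_free)
        results.append(result)
        if best is None or r_free < best.r_free:
            best = result
            cycles_without_improvement = 0
        else:
            cycles_without_improvement += 1
            if cycles_without_improvement >= patience:
                break  # treated as converged under the assumed rule
    return best, results


# Made-up R-free values standing in for successive pipeline cycles.
best, history = run_cycles([0.40, 0.35, 0.33, 0.34, 0.36, 0.35, 0.37, 0.30])
```

Here the run stops after cycle 7, because cycles 4 to 7 do not improve on cycle 3, and the model from cycle 3 is the one reported. A harder case that keeps improving would continue up to the cycle limit, which matches the note above that models needing more iterations now run for longer.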
 
Title ModelCraft 
Description ModelCraft is a new model building pipeline that advances on the Buccaneer pipeline in CCP4i2 by including density modification (both classical and through adding and refining dummy atoms), default use of pruning steps with changing behaviour at different resolutions, and inclusion of Nautilus for building nucleic acids. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Talks on ModelCraft were given at the CCP4-DLS Workshop in 2021 and the CCP4 Study Weekend in 2022. It is currently available to install through PyPI and will be distributed with CCP4 from version 8.0. 
URL https://paulsbond.co.uk/modelcraft
 
Description CCP4 School, Chandigarh, India, 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact CCP4 Crystallography School and Workshop at CSIR-IMTech, Chandigarh, India. 35 students attended from institutions across India. The students were mostly postgraduate students but there were also some post-doctoral researchers. I gave a lecture on automated model building using buccaneer and nautilus, and helped students during workshops.
Year(s) Of Engagement Activity 2018
URL http://www.ccp4.ac.uk/schools/India-2018
 
Description CCP4 School, Sao Carlos, Brazil, 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact IFSC/CCP4 Macromolecular Crystallography School 2018 in Sao Carlos, Brazil. Attendees were postgraduate students and post-doctoral researchers from across South America. I gave two lectures, one on automated model building using buccaneer and nautilus and another on density modification. I also delivered a workshop on model building and helped out students during other workshops.
Year(s) Of Engagement Activity 2018
URL http://www.ifsc.usp.br/mx2018/
 
Description CCP4 Study Weekend 2022 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The CCP4 Study Weekend 2022 was virtual and attended by hundreds of people. A talk was given titled "ModelCraft: an advanced automated model-building pipeline using Buccaneer" to this audience as well as a Lunchtime Byte session to around a hundred people.
Year(s) Of Engagement Activity 2022
URL https://sw2022.co.uk/
 
Description CCP4/APS School in Macromolecular Crystallography 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The course is intended mainly for graduate students, postdoctoral researchers and young scientists, along with commercial/industrial researchers in the area of structural biology from all across the globe. The purpose of the school is to address specific problems that the applicants face while collecting diffraction data and while solving and refining novel structures. 20 people attended the course from a range of institutions, mainly across the USA. Two lectures were given: one on density modification and one on automated model building. Help was given to students working on their own data. Feedback was received from one student thanking me for my investment in the mentoring process.
Year(s) Of Engagement Activity 2019
URL https://www.ccp4.ac.uk/schools/APS-2019/
 
Description DLS-CCP4 Data Collection and Structure Solution Workshop 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The workshop will consist of presentations and tutorials delivered by experts in the field, plus one day of data collection time at Diamond's excellent MX beamlines. Hosted on Zoom, students will be able to work alongside experts on their own projects, tackling all aspects of structure solution, from data collection through to phasing, refinement and validation. A talk titled "Model Building: Buccaneer & ModelCraft" was presented and help was given to students working on their own projects.
Year(s) Of Engagement Activity 2021
URL https://www.ccp4.ac.uk/schools/DLS-2021/
 
Description St Andrews Protein Crystallography Summer School 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The Protein Crystallography Summer School in 2019 aims to cover the theoretical and practical aspects of protein crystallography. The School is intended for postgraduates or postdocs new to crystallography. Priority is given to UK applicants because of the funding arrangements, but the School is open to a limited number of overseas applicants. Applicants from industry are welcome. 50 people attended the course. An extended workshop on how to use Coot was delivered and much positive feedback was received. Help was also given during all of the practical sessions.
Year(s) Of Engagement Activity 2019
URL https://synergy.st-andrews.ac.uk/proteincrystallography/