CCP4 Advanced integrated approaches to macromolecular structure determination

Lead Research Organisation: University of York
Department Name: Chemistry

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Planned Impact

The general importance of macromolecular crystallography, and of CCP4 in particular, is described in the Pathways to Impacts section.

CCP4 users in the pharmaceutical and biotechnology sector are most often involved in the study of protein-ligand (most often drug) complexes. The critical computational step in this process is molecular replacement (MR), in which a known atomic model from a similar structure is used to explain the diffraction pattern of the unknown structure. The MR approach is used in more than 70% of structure solutions. However, it is not uncommon for molecular replacement to yield a poor electron density map due to changes in the conformation of the protein. The software developed in this work package aims to significantly reduce the number of cases in which problems occur by increasing the range of convergence of the initial refinement of the MR model, while dramatically increasing the speed of the refinement step to allow screening of many more candidate models.

Cryo-Electron Microscopy (EM) is an increasingly important method for the determination of the structure of pathogens and complexes. The same methods will also be implemented for cryo-EM data, where the resolution tolerance of the methods will facilitate the interpretation of lower resolution reconstructions.

Improvement of the protein model also improves the electron density for the unmodelled ligand or drug, since the electron density features of the known and unknown regions of the structure are related through the diffraction pattern. The speed and radius of convergence of the new method will increase the coverage of automated methods for high throughput screening, which are widely used in the commercial sector. The impact of these developments will be to reduce the number of cases where structure solution fails, to reduce the level of manual intervention required in successful studies, and to increase the accuracy of the resulting structures.

YSBL has played a significant role in the commercial impact of CCP4: two YSBL-originated developments (the REFMAC and COOT software) have been the most-used tools in their field. Several other YSBL developments (DM, MOLREP, BUCCANEER, CCP4I) have citation counts in the hundreds to thousands and are significantly used in industry. The YSBL group engages with commercial customers through commercial representation on the CCP4 Executive Committee and Working Groups 1 and 2, through workshops and the CCP4 bulletin board. CCP4 developers including the York group. The working groups provide guidance on which strategic planning is built.

The software produced will be added to the CCP4 suite, and where appropriate to the related CCP-EM software suite for electron microscopy. The CCP4 suite is in use world-wide and is available on Windows, Linux and macOS platforms, providing a direct distribution channel to the overwhelming majority of macromolecular crystallographers. Libraries and methods will be available to other packages as well. CCP4 is updated with major version releases roughly every year, and automated updates on a roughly monthly basis to enable fast access to new developments. As a result, once the software has been added to the package it will be available within months to both the academic and commercial user community.

Publications

Alharbi E (2023) Buccaneer model building with neural network fragment selection. in Acta crystallographica. Section D, Structural biology

Krissinel E (2022) CCP4 Cloud for structure determination and project management in macromolecular crystallography. in Acta crystallographica. Section D, Structural biology

Alharbi E (2019) Comparison of automated crystallographic model-building pipelines in Acta Crystallographica Section D Structural Biology

Bond PS (2022) ModelCraft: an advanced automated model-building pipeline using Buccaneer. in Acta crystallographica. Section D, Structural biology

Dialpuri JS (2024) Online carbohydrate 3D structure validation with the Privateer web app. in Acta crystallographica. Section F, Structural biology communications

Alharbi E (2020) Pairwise running of automated crystallographic model-building pipelines. in Acta crystallographica. Section D, Structural biology

Bond PS (2020) Predicting protein model correctness in Coot using machine learning. in Acta crystallographica. Section D, Structural biology

Alharbi E (2021) Predicting the performance of automated crystallographic model-building pipelines. in Acta crystallographica. Section D, Structural biology

Cowtan K (2020) Shift-field refinement of macromolecular atomic models. in Acta crystallographica. Section D, Structural biology

Cowtan K (2020) Structural barriers to scientific progress. in Acta crystallographica. Section D, Structural biology

Agirre J (2023) The CCP4 suite: integrative software for macromolecular crystallography in Acta Crystallographica Section D Structural Biology

 
Description The project consists of four deliverables. Progress against each of these is reported below:

Deliverable 1: To generalise and optimise shift field methods for wider applicability against multiple data sources.
- COMPLETED
- The shift field refinement software has been adapted for electron microscopy use. It has also been extended to allow refinement of maps rather than models. This novel approach allows a range of new applications not supported by existing methods, including refinement of EM maps to phase crystallographic data, and non-rigid refinement of NCS domains.

Deliverable 2: To implement new resolution independent model building framework in BUCCANEER and NAUTILUS.
- IN PROGRESS
- Paul Bond has demonstrated a new machine learning approach to feature interpretation in electron density maps for model building using GPUs. We are beginning to test this to evaluate its impact on model building.
- Paul Bond has developed a new machine learning approach to the growing of protein chains using a neural network to identify likely chain conformations. Results suggest significantly improved performance at low resolutions. The method is currently being retrained against a larger dataset. We are evaluating how to build this work into a future release.

Deliverable 3: To develop new stereochemical regularisation software library to enable critical speed improvements.
- COMPLETED
- Two new high performance regularization algorithms have been implemented. The first, described as a pseudo-regularizer, moves overlapping fragments of the input model onto the refined model to restore geometry. This is instantaneous. The second, a general purpose regularizer, is fast but not instantaneous. This will have application in future work to provide web and cloud based software for model building and refinement.
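
As a rough illustration of what a general purpose regularizer does, the sketch below pulls bond lengths toward ideal values by plain gradient descent while tying each atom to its starting position. It is not the project's code and does not use the CCP4, Clipper or Gemmi APIs: the function name `regularize_bonds`, the 1/sigma^2 positional weighting, the step size and the toy coordinates are all assumptions made for this example, and real regularizers also restrain angles, torsions and planes.

```python
import math


def regularize_bonds(coords, bonds, sigmas, fixed=frozenset(), cycles=500, step=0.05):
    """Toy bond-length regularizer (hypothetical, for illustration only).

    coords: list of (x, y, z) starting positions
    bonds:  list of (i, j, ideal_length) restraints
    sigmas: per-atom coordinate error estimates; a small sigma ties the atom
            more strongly to its starting position (assumed weight 1/sigma^2)
    fixed:  indices of atoms that take part in restraints but never move
    """
    start = [tuple(c) for c in coords]
    pos = [list(c) for c in coords]
    for _ in range(cycles):
        grad = [[0.0, 0.0, 0.0] for _ in pos]
        # Bond restraints: gradient of (distance - ideal)^2 for each bond.
        for i, j, ideal in bonds:
            delta = [pos[i][k] - pos[j][k] for k in range(3)]
            dist = math.sqrt(sum(d * d for d in delta))
            if dist == 0.0:
                continue  # direction undefined for coincident atoms
            scale = 2.0 * (dist - ideal) / dist
            for k in range(3):
                grad[i][k] += scale * delta[k]
                grad[j][k] -= scale * delta[k]
        # Positional restraints: gradient of |x - x_start|^2 / sigma^2.
        for a in range(len(pos)):
            weight = 1.0 / (sigmas[a] ** 2)
            for k in range(3):
                grad[a][k] += 2.0 * weight * (pos[a][k] - start[a][k])
        # Gradient descent step; fixed atoms are skipped.
        for a in range(len(pos)):
            if a in fixed:
                continue
            for k in range(3):
                pos[a][k] -= step * grad[a][k]
    return [tuple(p) for p in pos]


# Three atoms on a line with distorted bonds (1.8 and 1.2 instead of 1.5).
atoms = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (3.0, 0.0, 0.0)]
restraints = [(0, 1, 1.5), (1, 2, 1.5)]
result = regularize_bonds(atoms, restraints, sigmas=[1.0, 1.0, 1.0], fixed={0})
print(result)
```

The positional term is one simple way to express per-atom coordinate errors: atoms believed to be well placed move less than uncertain ones. The fixed-step descent is only suitable for small examples like this one; with much smaller sigmas the step size would need to be reduced.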

Deliverable 4: To implement contact predictions developed in WP3 within BUCCANEER and NAUTILUS.
- IMPLEMENTED
- The data argued against pursuing this line of enquiry in our current software. Community needs in this area are largely mitigated by AlphaFold.

The York component of this grant started part of the way through the grant due to dependencies on
other parts of the grant. The PDRA started during 12/2020. PI time for K Cowtan has been allocated
from 04/2019, and Co-investigator J Agirre has also committed work to the project, funded by his
Royal Society fellowship. Progress has been as follows:
- The shift-field refinement software has been further developed to address the full set of
applications (coordinate, isotropic B-factor and anisotropic B-factor refinement), and source
code has been released into the public CCP4 software repository. The software has been
incorporated into both molecular replacement and model building pipelines in the CCP4i2
graphical user interface and in the CCP4 Cloud graphical user interface.
- A further paper on the method has been published: Cowtan, K., Metcalfe, S., & Bond, P.
(2020). Shift-field refinement of macromolecular atomic models. Acta Crystallographica
Section D: Structural Biology, 76(12). This paper presents results of coordinate and
anisotropic B-factor refinement, and includes the complete test data and code to reproduce the
results from the paper. The software shows comparable and somewhat complementary results
to existing methods, but is 1-2 orders of magnitude faster for the preliminary stages of model
refinement.
- The shift field refinement software has been adapted for electron microscopy use. It has also
been extended to allow refinement of maps rather than models. This novel approach allows a
range of new applications not supported by existing methods, including refinement of EM
maps to phase crystallographic data, and non-rigid refinement of NCS domains.
- A new fast approach to regularization at low resolution has been implemented which allows
shift field refinement to be run for more cycles without disrupting model connectivity. This has
led to further improvement
- A library for fast conventional regularization of bonds, angles, torsions and planes using
Gemmi has also been implemented that takes into account per-atom coordinate errors,
including fixed atoms that are not included in the minimisation for speed. Attempts to include
this library in shift-field refinement have not yet improved on the non-conventional
regularisation method mentioned above.
- A new automated model-building pipeline has been developed that expands on the existing
pipeline by including shift-field refinement, model pruning and density modification. This can
drastically improve completeness of the built models, especially when starting from poor
molecular replacement solutions. This has been published in: Bond, P. S., & Cowtan, K. D.
(2022). ModelCraft: an advanced automated model-building pipeline using Buccaneer. Acta
Crystallographica Section D: Structural Biology, 78(9).
- K Cowtan and P Bond have made substantial progress on model finalization through
developments both of model building and refinement methods. This includes the development
of regularization tools which will be required for WP2. These features have been released with
the latest version of the CCP4 software suite (version 8.0).
- P Bond has been developing approaches for the application of machine learning to improve
different stages of the model building pipeline, including preprocessing the electron density
map, identification of initial structural fragments, and growing of protein chains.
- K Cowtan has implemented proof of concept code for the use of distogram data based on
contact information from AlphaFold. This code is currently being tuned and a framework for
evaluation is being established.
- J Agirre and a student have been updating key software infrastructure components which will
be used in the project, including the Clipper-python interface library. These have been updated
to the current C++11 and Python 3 standards.
We have therefore made progress on all four goals set out in the grant, as well as in some
additional areas not originally planned.
Exploitation Route We have been exploring a potential collaboration with Maya Topf's Flex-EM group in London to extend their software using the methods developed here on the basis of the new results described above. That has led to a responsive mode BBSRC grant to explore this application.
Sectors Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

 
Description Impact
- Driving the ED&I agenda and appointing an ED&I team
- Organising online events - Study Weekend 2022 online was a very successful meeting with 1,088 participants!
- Most successful year for working group 2 meetings with higher attendance numbers (generally over 45)
- Initial planning for a new working group 2 meeting schedule with more, more inclusive, and shorter meetings.
- Produce new public-facing web pages for the project with modern presentation and improved accessibility.
- Starting the CCP4 Documentation project to make outputs more accessible.
2 publications have also been associated with the impact part of the project under Ivo Tews.
First Year Of Impact 2021
Sector Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Societal

 
Title Comparison of automated crystallographic model building pipelines 
Description Supporting test data, computer code and results for the paper "Comparison of automated crystallographic model building pipelines" 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
URL https://pure.york.ac.uk/portal/en/datasets/d4cb35df-a42d-4365-b539-9868730d165f
 
Title Pairwise running of automated crystallographic model-building pipelines 
Description Data and methods for paper "Pairwise running of automated crystallographic model-building pipelines" submitted to Acta Crystallographica D 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://pure.york.ac.uk/portal/en/datasets/4b7c880a-d6b0-471a-a379-d52c4ee947fe
 
Title Sheetbend software for model morphing with non-atomic parameterizations. 
Description Software for optimizing a 3D model of a biological molecule to best explain X-ray or electron microscopy observations. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This release was an update in response to enquiries from other software developers who wished to experiment with incorporating the software in their own software pipelines. 
URL http://fg.oisin.rc-harwell.ac.uk/projects/clipper-progs/
 
Title Wrapper code for GEMMI in CLIPPER library 
Description Wrapper code for conversion of GEMMI datatypes to CLIPPER datatypes. This is an unmerged branch of the clipper repository hosted on the ccp4forge GitLab. 
Type Of Technology Software 
Year Produced 2025 
Open Source License? Yes  
Impact Used by a number of downstream projects including CCP4, CCP-EM, Moorhen 
URL https://zenodo.org/doi/10.5281/zenodo.14980102
 
Title paulsbond/modelcraft: v2.4.1 
Description Fixed: catching connection errors when requesting PDB entry contents; ensuring MTZ data items use the same ASU definition 
Type Of Technology Software 
Year Produced 2022 
Impact Software released through CCP4 
URL https://zenodo.org/record/6821716
 
Description CCP-EM Icknield Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited lecture and tutorial on automated de-novo model building into electron microscopy maps.
Year(s) Of Engagement Activity 2022
URL https://instruct-eric.org/events/2022-icknield-workshop-on-model-building-and-refinement/
 
Description CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps' Apr 2010 This course is aimed at structural biologists with high resolution EM maps ready for / in the process of model building and refinement. This three day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials. The principal benefit to the participants was an awareness of tools which can perform de-novo model building in high resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved.
Year(s) Of Engagement Activity 2019,2020
URL https://www.ccpem.ac.uk/training/icknield_2019/icknield_2019.php
 
Description From special interests to social anxiety: autism in academia 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Everyone has a unique brain and therefore different skills, abilities, and ways to contribute at work. In science there are many research challenges and ways to approach them, but how inclusive are we to neurodivergent scientists?
Year(s) Of Engagement Activity 2022
URL https://ncas.ac.uk/from-special-interests-to-social-anxiety-autism-in-academia/
 
Description Presentation at CCP-EM Spring Symposium 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited talk on Extending model building and refinement tools for Cryo-EM applications at the CCP4 symposium, Nottingham, Apr 2019
Year(s) Of Engagement Activity 2019
URL https://www.youtube.com/watch?v=evbJV6431EA