CCP4 Advanced integrated approaches to macromolecular structure determination
Lead Research Organisation:
University of York
Department Name: Chemistry
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Planned Impact
The broader importance of macromolecular crystallography, and of CCP4 in particular, is described in the Pathways to Impact section.
CCP4 users in the pharmaceutical and biotechnology sector are most often involved in the study of protein-ligand (most often drug) complexes. The critical computational step in this process is molecular replacement (MR), in which a known atomic model from a similar structure is used to explain the diffraction pattern of the unknown structure. The MR approach is used in more than 70% of structure solutions. However, it is not uncommon for molecular replacement to yield a poor electron density map because of changes in the conformation of the protein. The software developed in this work package aims to significantly reduce the number of cases in which problems occur by increasing the range of convergence of the initial refinement of the MR model, while dramatically increasing the speed of the refinement step to allow many more candidate models to be screened.
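To make the idea of "explaining the diffraction pattern" concrete, the Python sketch below scores candidate models by computing structure-factor amplitudes and comparing them with observed amplitudes using a conventional R-factor. It is a minimal illustration under stated assumptions, not the method developed in this work package: atoms are treated as point scatterers with a single constant scattering factor, there are no B-factors or symmetry (space group P1 is assumed), and the "observed" data are simulated. Real molecular replacement programs also search over rotations and translations and typically use likelihood targets. The function names `structure_factors`, `r_factor` and `rank_candidates` are hypothetical.

```python
import numpy as np


def structure_factors(frac_xyz, hkl, f_atom=1.0):
    """F(h) = sum_j f_j * exp(2*pi*i * h . x_j) for point atoms (hypothetical helper).

    Assumes one constant scattering factor for every atom, no B-factors and
    space group P1, purely to keep the illustration short.
    """
    phase = 2.0 * np.pi * (hkl @ frac_xyz.T)  # shape (n_reflections, n_atoms)
    return f_atom * np.exp(1j * phase).sum(axis=1)


def r_factor(f_obs, f_calc):
    """R = sum| |Fo| - k|Fc| | / sum |Fo|, with k a least-squares scale factor."""
    a_obs, a_calc = np.abs(f_obs), np.abs(f_calc)
    k = (a_obs @ a_calc) / (a_calc @ a_calc)
    return np.abs(a_obs - k * a_calc).sum() / a_obs.sum()


def rank_candidates(f_obs, candidates, hkl):
    """Return (candidate index, R) pairs, best agreement first."""
    scores = [(i, r_factor(f_obs, structure_factors(xyz, hkl)))
              for i, xyz in enumerate(candidates)]
    return sorted(scores, key=lambda s: s[1])


# Simulated example: 30 atoms at random fractional coordinates.
rng = np.random.default_rng(0)
hkl = np.array([(h, k, l)
                for h in range(-4, 5) for k in range(-4, 5) for l in range(0, 5)
                if (h, k, l) != (0, 0, 0)], dtype=float)
true_xyz = rng.random((30, 3))
f_obs = np.abs(structure_factors(true_xyz, hkl))  # amplitudes only; phases are not measured

candidates = [true_xyz,                                                # exact model
              true_xyz + rng.normal(scale=0.01, size=true_xyz.shape),  # small coordinate errors
              true_xyz + rng.normal(scale=0.10, size=true_xyz.shape)]  # large coordinate errors
for index, r in rank_candidates(f_obs, candidates, hkl):
    print(f"candidate {index}: R = {r:.3f}")
```

The exact model scores R = 0 and lightly perturbed coordinates score better than heavily perturbed ones. Because each candidate needs its own calculation, the cost of the scoring and refinement step directly limits how many candidate models can be screened.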
Cryo-Electron Microscopy (EM) is an increasingly important method for the determination of the structure of pathogens and complexes. The same methods will also be implemented for cryo-EM data, where the resolution tolerance of the methods will facilitate the interpretation of lower resolution reconstructions.
Improvement of the protein model also improves the electron density for the unmodelled ligand or drug, since the electron density features of the known and unknown regions of the structure are related through the diffraction pattern. The speed and radius of convergence of the new method will increase the coverage of automated methods for high-throughput screening, which are widely used in the commercial sector. The impact of these developments will be to reduce the number of cases where structure solution fails, to reduce the level of manual intervention required in successful studies, and to increase the accuracy of the resulting structures.
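The coupling between the known and unknown regions can be illustrated with a one-dimensional toy calculation. The Python sketch below, assuming equal-weight Gaussian "atoms" on a periodic grid and error-free amplitudes, builds a protein-only model, combines its phases with the observed amplitudes of the full structure (protein plus an unmodelled ligand), and inspects the resulting maps at the ligand site. The atom positions and the ligand site at grid point 240 are arbitrary choices for illustration, and the Fo and 2Fo-Fc coefficients shown are the classical textbook forms rather than the likelihood-weighted coefficients used in practice.

```python
import numpy as np

N = 256                # grid points in a 1D periodic "unit cell"
GRID = np.arange(N)


def density(centres, width=2.0):
    """Sum of equal-weight periodic Gaussian 'atoms' on the grid (toy model)."""
    d = np.abs(GRID[:, None] - np.asarray(centres, dtype=float)[None, :])
    d = np.minimum(d, N - d)  # minimum-image distance in the periodic cell
    return np.exp(-0.5 * (d / width) ** 2).sum(axis=1)


protein = density([20, 45, 70, 100, 130, 160, 190, 220])  # modelled part
ligand = density([240])                                     # unmodelled part
f_obs = np.abs(np.fft.fft(protein + ligand))  # "measured" amplitudes, phases lost

f_calc = np.fft.fft(protein)   # structure factors of the protein-only model
phi_calc = np.angle(f_calc)    # the only phase information available

# Observed amplitudes with model phases: missing atoms appear at roughly half weight.
map_fo = np.fft.ifft(f_obs * np.exp(1j * phi_calc)).real
# Classical 2Fo-Fc coefficients restore them to roughly full weight, with more noise.
map_2fofc = np.fft.ifft((2 * f_obs - np.abs(f_calc)) * np.exp(1j * phi_calc)).real

site = 240
print(f"density at ligand site: model {protein[site]:.2f}, "
      f"Fo map {map_fo[site]:.2f}, 2Fo-Fc map {map_2fofc[site]:.2f}")
```

The ligand appears in both maps even though it is absent from the model, but only through the model phases; errors in the protein model therefore degrade the ligand density as well.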
YSBL has played a significant role in the commercial impact of CCP4: two YSBL-originated developments (the REFMAC and COOT software) have been the most-used tools in their respective fields. Several other YSBL developments (DM, MOLREP, BUCCANEER, CCP4I) have citation counts in the hundreds to thousands and are used significantly in industry. The YSBL group engages with commercial customers through commercial representation on the CCP4 Executive Committee and Working Groups 1 and 2, and through workshops and the CCP4 bulletin board. CCP4 developers, including the York group, take part in these activities, and the working groups provide the guidance on which strategic planning is built.
The software produced will be added to the CCP4 suite and, where appropriate, to the related CCP-EM software suite for electron microscopy. The CCP4 suite is in use worldwide and is available on Windows, Linux and macOS, providing a direct distribution channel to the overwhelming majority of macromolecular crystallographers. Libraries and methods will be available to other packages as well. CCP4 issues major version releases roughly every year, with automated updates roughly monthly to enable fast access to new developments. As a result, once the software has been added to the package, it will be available within months to both the academic and commercial user communities.
Publications
Alharbi E
(2023)
Buccaneer model building with neural network fragment selection.
in Acta Crystallographica Section D: Structural Biology
Krissinel E
(2022)
CCP4 Cloud for structure determination and project management in macromolecular crystallography.
in Acta Crystallographica Section D: Structural Biology
Alharbi E
(2019)
Comparison of automated crystallographic model-building pipelines.
in Acta Crystallographica Section D: Structural Biology
Lawson CL
(2021)
Cryo-EM model validation recommendations based on outcomes of the 2019 EMDataResource challenge.
in Nature Methods
Bond PS
(2022)
ModelCraft: an advanced automated model-building pipeline using Buccaneer.
in Acta Crystallographica Section D: Structural Biology
Dialpuri JS
(2024)
NucleoFind: a deep-learning network for interpreting nucleic acid electron density.
in Nucleic Acids Research
Dialpuri JS
(2024)
Online carbohydrate 3D structure validation with the Privateer web app.
in Acta Crystallographica Section F: Structural Biology Communications
Lawson CL
(2024)
Outcomes of the EMDataResource cryo-EM Ligand Modeling Challenge.
in Nature Methods
Lawson CL
(2024)
Outcomes of the EMDataResource cryo-EM Ligand Modeling Challenge.
in Research Square
Alharbi E
(2020)
Pairwise running of automated crystallographic model-building pipelines.
in Acta Crystallographica Section D: Structural Biology
Bond PS
(2020)
Predicting protein model correctness in Coot using machine learning.
in Acta Crystallographica Section D: Structural Biology
Alharbi E
(2021)
Predicting the performance of automated crystallographic model-building pipelines.
in Acta Crystallographica Section D: Structural Biology
Cowtan K
(2020)
Shift-field refinement of macromolecular atomic models.
in Acta Crystallographica Section D: Structural Biology
Cowtan K
(2020)
Structural barriers to scientific progress.
in Acta Crystallographica Section D: Structural Biology
Agirre J
(2023)
The CCP4 suite: integrative software for macromolecular crystallography.
in Acta Crystallographica Section D: Structural Biology
| Description | The project consists of four deliverables. Progress against each of these is reported below.
Deliverable 1: To generalise and optimise shift-field methods for wider applicability against multiple data sources. Status: COMPLETED. The shift-field refinement software has been adapted for electron microscopy use. It has also been extended to allow refinement of maps rather than models. This novel approach allows a range of new applications not supported by existing methods, including refinement of EM maps to phase crystallographic data, and non-rigid refinement of NCS domains.
Deliverable 2: To implement a new resolution-independent model-building framework in BUCCANEER and NAUTILUS. Status: IN PROGRESS. • Paul Bond has demonstrated a new machine-learning approach, using GPUs, to the interpretation of features in electron density maps for model building. We are beginning to test it to evaluate its impact on model building. • Paul Bond has developed a new machine-learning approach to growing protein chains, using a neural network to identify likely chain conformations. Results suggest significantly improved performance at low resolution. The method is currently being retrained against a larger dataset, and we are evaluating how to build this work into a future release.
Deliverable 3: To develop a new stereochemical regularisation software library to enable critical speed improvements. Status: COMPLETED. Two new high-performance regularization algorithms have been implemented. The first, described as a pseudo-regularizer, moves overlapping fragments of the input model onto the refined model to restore geometry, and is instantaneous. The second, a general-purpose regularizer, is fast but not instantaneous. These will have application in future work to provide web- and cloud-based software for model building and refinement.
Deliverable 4: To implement contact predictions developed in WP3 within BUCCANEER and NAUTILUS. Status: IMPLEMENTED, but the data argued against pursuing this line of enquiry in our current software. Community needs in this area have largely been addressed by AlphaFold.
The York component of this grant started partway through the grant period because of dependencies on other parts of the grant. The PDRA started in December 2020. PI time for K Cowtan has been allocated from April 2019, and Co-investigator J Agirre has also committed work to the project, funded by his Royal Society fellowship. Progress has been as follows:
• The shift-field refinement software has been further developed to address the full set of applications (coordinate, isotropic B-factor and anisotropic B-factor refinement), and the source code has been released into the public CCP4 software repository. The software has been incorporated into both the molecular replacement and model-building pipelines in the CCP4i2 and CCP4 Cloud graphical user interfaces.
• A further paper on the method has been published: Cowtan, K., Metcalfe, S., & Bond, P. (2020). Shift-field refinement of macromolecular atomic models. Acta Crystallographica Section D: Structural Biology, 76(12). This paper presents results of coordinate and anisotropic B-factor refinement, and includes the complete test data and code to reproduce its results. The software gives results comparable to, and somewhat complementary to, those of existing methods, but is 1-2 orders of magnitude faster for the preliminary stages of model refinement.
• A new fast approach to regularization at low resolution has been implemented, which allows shift-field refinement to be run for more cycles without disrupting model connectivity. This has led to further improvement. A library for fast conventional regularization of bonds, angles, torsions and planes using Gemmi has also been implemented; it takes per-atom coordinate errors into account and supports fixed atoms, which are excluded from the minimisation for speed. Attempts to include this library in shift-field refinement have not yet improved on the non-conventional regularisation method mentioned above.
• A new automated model-building pipeline has been developed that extends the existing pipeline with shift-field refinement, model pruning and density modification. This can drastically improve the completeness of the built models, especially when starting from poor molecular replacement solutions. This has been published in: Bond, P. S., & Cowtan, K. D. (2022). ModelCraft: an advanced automated model-building pipeline using Buccaneer. Acta Crystallographica Section D: Structural Biology, 78(9).
• K Cowtan and P Bond have made substantial progress on model finalization through developments in both model-building and refinement methods. This includes the development of regularization tools that will be required for WP2. These features have been released with the latest version of the CCP4 software suite (version 8.0).
• P Bond has been developing approaches that apply machine learning to improve different stages of the model-building pipeline, including preprocessing of the electron density map, identification of initial structural fragments, and growing of protein chains.
• K Cowtan has implemented proof-of-concept code for the use of distogram data based on contact information from AlphaFold. This code is currently being tuned, and a framework for evaluation is being established.
• J Agirre and a student have been updating key software infrastructure components that will be used in the project, including the Clipper-python interface library. These have been updated to the C++11 and Python 3 standards.
We have therefore made progress on all four goals set out in the grant, as well as in some additional areas not originally planned. |
| Exploitation Route | We have been exploring a potential collaboration with Maya Topf's Flex-EM group in London to extend their software using the methods developed here, on the basis of the new results described above. This has led to a responsive-mode BBSRC grant to explore this application. |
| Sectors | Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
| Description | Impact:
• Driving the ED&I agenda and appointing an ED&I team.
• Organising online events: the online Study Weekend 2022 was a very successful meeting with 1,088 participants.
• The most successful year for Working Group 2 meetings, with higher attendance (generally over 45).
• Initial planning for a new Working Group 2 meeting schedule with more frequent, more inclusive and shorter meetings.
• Producing new public-facing web pages for the project with modern presentation and improved accessibility.
• Starting the CCP4 Documentation project to make outputs more accessible.
Two publications have also been associated with the impact part of the project under Ivo Tews. |
| First Year Of Impact | 2021 |
| Sector | Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
| Impact Types | Societal |
| Title | Comparison of automated crystallographic model building pipelines |
| Description | Supporting test data, computer code and results for the paper "Comparison of automated crystallographic model building pipelines" |
| Type Of Material | Database/Collection of data |
| Year Produced | 2019 |
| Provided To Others? | Yes |
| URL | https://pure.york.ac.uk/portal/en/datasets/d4cb35df-a42d-4365-b539-9868730d165f |
| Title | Pairwise running of automated crystallographic model-building pipelines |
| Description | Data and methods for the paper "Pairwise running of automated crystallographic model-building pipelines", submitted to Acta Crystallographica Section D |
| Type Of Material | Database/Collection of data |
| Year Produced | 2020 |
| Provided To Others? | Yes |
| URL | https://pure.york.ac.uk/portal/en/datasets/4b7c880a-d6b0-471a-a379-d52c4ee947fe |
| Title | Sheetbend software for model morphing with non-atomic parameterizations. |
| Description | Software for optimizing a 3D model of a biological molecule to best explain X-ray or electron microscopy observations. |
| Type Of Technology | Software |
| Year Produced | 2019 |
| Open Source License? | Yes |
| Impact | This release was an update in response to enquiries from other software developers who wished to experiment with incorporating the software in their own software pipelines. |
| URL | http://fg.oisin.rc-harwell.ac.uk/projects/clipper-progs/ |
| Title | Wrapper code for GEMMI in CLIPPER library |
| Description | Wrapper code for converting GEMMI data types to CLIPPER data types. This is an unmerged branch of Clipper hosted in the ccp4forge GitLab repository. |
| Type Of Technology | Software |
| Year Produced | 2025 |
| Open Source License? | Yes |
| Impact | Used by a number of downstream projects including CCP4, CCP-EM, Moorhen |
| URL | https://zenodo.org/doi/10.5281/zenodo.14980102 |
| Title | paulsbond/modelcraft: v2.4.1 |
| Description | Fixes: catching connection errors when requesting PDB entry contents; ensuring MTZ data items use the same ASU definition. |
| Type Of Technology | Software |
| Year Produced | 2022 |
| Impact | Software released through CCP4 |
| URL | https://zenodo.org/record/6821716 |
| Description | CCP-EM Icknield Workshop |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Invited lecture and tutorial on automated de-novo model building into electron microscopy maps. |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://instruct-eric.org/events/2022-icknield-workshop-on-model-building-and-refinement/ |
| Description | CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps', Apr 2019. This course is aimed at structural biologists with high-resolution EM maps that are ready for, or in the process of, model building and refinement. This three-day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials. The principal benefit to the participants was an awareness of tools that can perform de novo model building in high-resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved. |
| Year(s) Of Engagement Activity | 2019,2020 |
| URL | https://www.ccpem.ac.uk/training/icknield_2019/icknield_2019.php |
| Description | From special interests to social anxiety: autism in academia |
| Form Of Engagement Activity | A magazine, newsletter or online publication |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Everyone has a unique brain and therefore different skills, abilities, and ways to contribute at work. In science there are many research challenges and ways to approach them, but how inclusive are we to neurodivergent scientists? |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://ncas.ac.uk/from-special-interests-to-social-anxiety-autism-in-academia/ |
| Description | Presentation at CCP-EM Spring Symposium 2019 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Invited talk, 'Extending model building and refinement tools for cryo-EM applications', at the CCP-EM Spring Symposium, Nottingham, April 2019. |
| Year(s) Of Engagement Activity | 2019 |
| URL | https://www.youtube.com/watch?v=evbJV6431EA |
