CCP4 Grant Renewal 2014-2019: Question-driven crystallographic data collection and advanced structure solution
Lead Research Organisation:
University of York
Department Name: Chemistry
Technical Summary
This proposal incorporates five related work packages.
In WP1 we will track synchrotron-collected data through computational structure determination, to find whether the most useful data can be recognised a priori using established or novel metrics of data quality and consistency. We will then enable data collection software to communicate with pipelines and graphics programs to assess when sufficient data have been collected for a given scientific question, and so to prioritise further beamtime usage. We will also communicate extra information about diffraction data to structure determination programs, and so support the statistical models and algorithms being developed in WP4.
WP2 will improve the key MR step of model preparation, especially from diverged, NMR, or ab initio models. One development will be to extend the size limit of ab initio search model generation by exploiting sequence covariance algorithms.
In WP3 we will use our description of electron density maps as a field of control points to better use electron density or atomic models positioned by MR. Restrained manipulation of these points provides a low-order parameterisation of refinement decoupled from atomic models, and therefore suitable for highly diverged atomic models or EM-derived maps. We will extend this approach to characterise local protein mobility without the requirement of TLS for predefinition of rigid groups.
In WP4 we will statistically model non-idealities in experimental data, including non-isomorphism, spot overlap, and radiation damage. The resulting models, implemented in REFMAC, will be applied to refinement using data that are annotated by WP1 tools and tracked by WP0.
WP0 will provide the tools to integrate the other WPs. For this, it will create a cloud environment where storage- and compute-resources can be utilised optimally, and where rich information can be passed among beamlines, pipelines, and graphics programs.
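The control-point idea described for WP3 can be illustrated with a small sketch. This is not the project's software: it assumes a regular grid of control points, each carrying a shift vector, and uses trilinear interpolation to transfer those shifts to atom positions. The function name `interpolate_shifts` and all parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def interpolate_shifts(atom_xyz, grid_origin, grid_spacing, control_shifts):
    """Trilinearly interpolate control-point shift vectors onto atoms.

    control_shifts has shape (nx, ny, nz, 3), with at least two points
    per axis; this layout is an assumption of the sketch.
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    upper = np.array(control_shifts.shape[:3]) - 1
    # Position of each atom in units of the control-point spacing,
    # clamped so atoms outside the grid take the nearest boundary value.
    g = np.clip((atom_xyz - grid_origin) / grid_spacing, 0.0, upper)
    i0 = np.minimum(np.floor(g).astype(int), upper - 1)  # lower cell corner
    t = g - i0                                           # offset within cell
    shifts = np.zeros_like(atom_xyz)
    for dx in (0, 1):
        wx = t[:, 0] if dx else 1.0 - t[:, 0]
        for dy in (0, 1):
            wy = t[:, 1] if dy else 1.0 - t[:, 1]
            for dz in (0, 1):
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                corner = control_shifts[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                shifts += (wx * wy * wz)[:, None] * corner
    return shifts

# A 3x3x3 grid of control points, 10 A apart, whose x-shift grows along x.
control = np.zeros((3, 3, 3, 3))
control[1, :, :, 0] = 1.0
control[2, :, :, 0] = 2.0
atoms = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [15.0, 10.0, 20.0]])
shifts = interpolate_shifts(atoms, np.zeros(3), 10.0, control)
moved = atoms + shifts
```

Here 27 control points (81 parameters) determine the displacement of any number of atoms, which is the sense in which such a parameterisation is low-order and independent of the atomic model.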
Planned Impact
The importance of macromolecular crystallography in general, and of CCP4 in particular, is described in the Pathways to Impact section.
CCP4 users in the pharmaceutical and biotechnology sector are most often involved in the study of protein-ligand (most often drug) complexes. The critical computational step in this process is molecular replacement (MR), in which a known atomic model from a similar structure is used to explain the diffraction pattern of the unknown structure. The MR approach is used in more than 70% of structure solutions. However it is not uncommon for the molecular replacement to yield a poor electron density map due to changes in the conformation of the protein. The software developed in work package 3 aims to significantly reduce the number of cases in which problems occur by introducing an additional flexible fitting step between the molecular replacement and refinement steps. The same approach will be applied to related problems, including the use of cryo-EM data to interpret the X-ray diffraction pattern.
Improvement of the protein model also improves the electron density for the unmodelled ligand or drug, since the electron density features of the known and unknown regions of the structure are related through the diffraction pattern. Provision of an additional, highly automated refinement step in this process will therefore increase the coverage of automated methods for high-throughput screening, which are widely used in the commercial sector. The impact of these developments will be to reduce the number of cases where structure solution fails, to reduce the level of manual intervention required in successful studies, and to increase the accuracy of the resulting structures.
YSBL has played a significant role in the commercial impact of CCP4, with two YSBL-originated developments (the REFMAC and COOT software) being the most-used tools in their field. The YSBL group engages with commercial customers through commercial representation on the CCP4 Executive Committee and Working Group 1, through workshops, and through the CCP4 bulletin board. CCP4 developers, including the York group, hold an annual meeting with structural biologists at GSK to guide future developments, at which plans and developments are presented. There are occasional visits to other customers.
The resulting software will be added to the CCP4 suite. This package is in use worldwide and is available on Windows, Linux and Mac OS platforms, providing a direct distribution channel to the vast majority of macromolecular crystallographers. CCP4 is updated with major version releases roughly every year, and has recently introduced an automated update mechanism to enable faster access to new developments. As a result, once the software has been added to the package it will be available to both the academic and commercial user communities within weeks to months.
Organisations
People | ORCID iD |
Kevin Cowtan (Principal Investigator) | |
Keith Wilson (Co-Investigator) | |
Publications
McNicholas S (2018) Automating tasks in protein structure determination with the clipper python module. in Protein science : a publication of the Protein Society
Lawson CL (2021) Cryo-EM model validation recommendations based on outcomes of the 2019 EMDataResource challenge. in Nature methods
Alharbi E (2021) Predicting the performance of automated crystallographic model-building pipelines. in Acta crystallographica. Section D, Structural biology
Potterton L (2018) CCP4i2: the new graphical user interface to the CCP4 program suite. in Acta crystallographica. Section D, Structural biology
Agirre J (2023) The CCP4 suite: integrative software for macromolecular crystallography. in Acta crystallographica. Section D, Structural biology
Cowtan K (2018) Macromolecular refinement by model morphing using non-atomic parameterizations. in Acta crystallographica. Section D, Structural biology
Cowtan K (2020) Shift-field refinement of macromolecular atomic models. in Acta crystallographica. Section D, Structural biology
Description | Dr Jon Agirre was appointed to work on the York work package of the CCP4 grant on 1st April 2015. In the first 21 months of the grant we have been building a substantial computational infrastructure as well as developing and implementing the mathematical frameworks for model-free refinement.

The proof-of-concept control-point software developed by Dr Cowtan has been modified for application to real molecular replacement data rather than the original synthetic data. The software has been applied to a problem molecular replacement structure, in which phase information is not available, as well as to test data where phases are available; the latter case is more representative of the application to electron microscopy data. The software performs effectively when phases are available, for all parts of the structure where there are significant electron density features. The software was instrumented to provide visual diagnostics in the Coot graphics package, which revealed a problem that arises when a convex hull surrounding the structure contains enclosed solvent channels. This will be addressed by pruning uninformative control points.

Tests on a problem molecular replacement structure which shows significant domain motion were unsuccessful. The principal obstacle in this case was the quality of the electron density reconstruction from the model: the domain motion was sufficient to cause the moving regions to display either misleading or no electron density. However, it may be that this dataset represents an unrealistic challenge. We are addressing this problem with a three-fold strategy:
- Improvements to the search target function to better detect the appropriate shift to apply to a given region of the search model.
- The control point software will be applied to the problem of generating an ensemble of permutations on the search structure. The information from the model ensemble will then be combined to produce a bias-reduced map against which to perform control point refinement of the coordinates of the search model.
- A database of molecular replacement test structures will be prepared to enable a more effective evaluation of the performance of the algorithm.

In addition we have developed a Python interface to the clipper libraries to enable more rapid development of the required algorithms. We have also been working on the CCP4i2 software framework for implementing and linking the software tools for control point refinement with supporting tools from the CCP4 software suite.

Since mid-2016 we have developed a second approach to model-free refinement, with different strengths and limitations, which does not involve control points at all. Instead, a spatially complete field of parameter shifts is determined, which may include isotropic displacement parameters, anisotropic displacement parameters, and positional parameters. The new approach is much simpler, and has been demonstrated on real data for isotropic displacement parameters, and on synthetic data for anisotropic displacement parameters. An implementation for coordinates is in progress. The simplicity of the new approach gives us a strong expectation of releasing a user-oriented software package within the next 12 months.

In 2017 Dr Agirre was awarded a Royal Society University Research Fellowship, and a new PDRA, Dr Stephen Metcalfe, was recruited to continue the work. While this transition incurred a significant cost in terms of training, we have made substantial progress since the last report. The shift-field approach has been formalised and reported at the CCP4 study weekend and in a paper. This required an initial implementation of the method for the refinement of isotropic thermal parameters, which allowed the method to be validated and its performance investigated, as well as providing a sanity check of the theory. Subsequent to publication we have been working in parallel on the refinement of atomic coordinates and on the refinement of anisotropic thermal parameters. Anisotropic thermal parameter refinement appears to be possible, although we have not yet determined the limitations of the method. Coordinate refinement has been demonstrated at data resolutions much poorer than are required for traditional refinement methods, and with a radius of convergence which is comparable to or occasionally exceeds that of the best existing methods. In addition, the new method can be 1-2 orders of magnitude faster than traditional methods (due to working at lower resolutions). This opens up the possibility of new structure solution methods which the computational cost of refinement previously rendered impractical.

Since Feb 2018 we have implemented the new refinement method in a piece of software, 'sheetbend', which has been published in a paper and released to users through the CCP4 source repository; it will also become available as part of the CCP4 software suite at the next release. The preliminary release version performed coordinate refinement and isotropic B-factor refinement, and was then extended by Stephen Metcalfe to implement anisotropic B-factor refinement. This version has been further optimised by K Cowtan for increased computational performance and increased radius of convergence. The software is currently available publicly through the CCP4 source code repository. |
Exploitation Route | We expect the new model-free refinement software to be adopted by the electron microscopy community in addition to the crystallography community. We will contribute the software to the CCP-EM package for this purpose, through which it will be distributed to biotech users in both academic and commercial sectors. We have also identified a valuable collaboration with the Flex-EM group to further extend this work in the context of cryo-EM, and have a BBSRC responsive mode grant in review at the moment for this collaboration. |
Sectors | Pharmaceuticals and Medical Biotechnology |
URL | http://fg.oisin.rc-harwell.ac.uk/projects/clipper-progs/ |
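To make the notion of a "spatially complete field of parameter shifts" in the findings above more concrete, the following one-dimensional toy sketch estimates a coordinate shift at every grid point. It is an illustration under stated assumptions, not the sheetbend implementation: it assumes the difference between observed and calculated density is, to first order, minus the shift times the gradient of the calculated density, and solves for the shift by a Gaussian-weighted local least-squares fit. The function names, the smoothing width and the test densities are all hypothetical.

```python
import numpy as np

def gaussian_density(x, centres, sigma=1.2):
    """Sum of unit-height Gaussian 'atoms' sampled on a 1D grid."""
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / sigma) ** 2).sum(axis=1)

def estimate_shift_field(rho_obs, rho_calc, dx, smooth_sigma=3.0, eps=1e-6):
    """Estimate a shift u(x) at every grid point.

    Assumes rho_obs - rho_calc ~= -u * d(rho_calc)/dx and minimises the
    Gaussian-weighted squared residual of that relation around each point.
    """
    grad = np.gradient(rho_calc, dx)
    diff = rho_obs - rho_calc
    half = int(round(3 * smooth_sigma / dx))
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) * dx / smooth_sigma) ** 2)
    num = np.convolve(grad * diff, kernel, mode="same")
    den = np.convolve(grad * grad, kernel, mode="same")
    return -num / (den + eps)  # eps guards against division by zero in empty regions

dx = 0.1
x = np.arange(0.0, 60.0, dx)
model = np.arange(8.0, 54.0, 5.0)              # atoms of the starting model
true_shift = np.where(model < 30.0, 0.3, -0.2)  # two "domains" moving differently
rho_calc = gaussian_density(x, model)
rho_obs = gaussian_density(x, model + true_shift)

u_est = estimate_shift_field(rho_obs, rho_calc, dx)
u_left = u_est[np.argmin(np.abs(x - 15.0))]   # expected to be close to +0.3
u_right = u_est[np.argmin(np.abs(x - 45.0))]  # expected to be close to -0.2
```

In this sketch the field is defined everywhere on the grid, not only at atoms, and the smoothing width sets the length scale over which shifts are assumed to vary.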
Description | The software and methods we have developed have been contributed (along with other software) to the CCP4 and CCP-EM software suites, which are licensed to industrial users at over 100 sites worldwide, raising a license income of over £1m/year. We engage directly with industrial users through CCP4 Working Group 1 and the CCP4 and CCP-EM annual symposia, through workshops, and on an ad hoc basis in relation to individual problems. Particular developments arising from this project include the extension of our existing model building methods to larger structures and further automation of model improvement for deposition. |
First Year Of Impact | 2019 |
Sector | Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Title | Macromolecular refinement by model morphing using non-atomic parameterizations |
Description | Methods and data to reproduce the results of the paper "Macromolecular refinement by model morphing using non-atomic parameterizations", submitted to Acta Crystallographica Section D |
Type Of Material | Database/Collection of data |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | The dataset has generated two requests from a user wishing to apply the methods. (Users may also be accessing the data or code without contacting us.) |
URL | https://pure.york.ac.uk/portal/en/datasets/macromolecular-refinement-by-model-morphing-using-nonatom... |
Title | Buccaneer version 1.6.1 protein model building software |
Description | Buccaneer is an automated protein model building program. It features robust handling of limited data resolution, and is competitive in terms of speed. It is particularly useful at resolutions worse than 2.5 Å, although it can also be used at high resolution. The latest version includes methods for de-biasing molecular replacement models, arising from grant BB/L006383/1. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | N/A |
URL | http://www.ccp4.ac.uk/download/ |
Title | Buccaneer version 1.6.3 protein model building software |
Description | Buccaneer is an automated protein model building program. It features robust handling of limited data resolution, and is competitive in terms of speed. It is particularly useful at resolutions worse than 2.5 Å, although it can also be used at high resolution. The latest version includes methods for handling large structures and for bringing the final model closer to completion. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | N/A |
URL | http://www.ccp4.ac.uk/download/ |
Title | Clipper-python |
Description | Clipper-python is a Python interface to the 'clipper' C++ libraries for X-ray crystallographic computation. It enables much more rapid application development and testing by making clipper functionality available through the Python programming language. The software has been included in version 7.0 of the CCP4 software suite. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | The parallel CCP-EM project for Electron Microscopy has expressed interest in using and distributing the software. |
URL | http://www.ccp4.ac.uk/download/ |
Title | Clipper-python |
Description | Clipper-python is a Python interface to the 'clipper' C++ libraries for X-ray crystallographic computation. It enables much more rapid application development and testing by making clipper functionality available through the Python programming language. The software has been included in version 7.0 of the CCP4 software suite. The software has now been expanded to allow rapid access to and manipulation of large crystallographic data objects using the tools in Python/NumPy. The new version has been released by CCP4 and is also now in the CCP-EM source tree for inclusion in their next release. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | The library was developed to facilitate the implementation of the refinement methods being developed on this grant. Its wider utility has also become apparent, both at a summer school in which students used it to write their own crystallographic software from scratch and through its adoption by the CCP-EM project. |
URL | https://fg.oisin.rc-harwell.ac.uk/projects/clipper-python/ |
Title | Clipper-tools |
Description | This is a companion module distributed alongside clipper-python. Jon Agirre has developed simple I/O functions with logfile and XML reporting (different behaviours are available using callbacks) for native integration with CCP4i2. [released by CCP4, in CCP-EM source tree] Within clipper_tools, em.cut_density cuts a part of a cryo-EM map using a mask computed from a supplied model, applies sharpening or blurring, computes map coefficients, and produces a mini MTZ file and XML/logfile results. This will be employed by future versions of the Phaser pipeline for molecular replacement with EM maps. [committed, available in both source trees] |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | N/A |
URL | https://fg.oisin.rc-harwell.ac.uk/projects/clipper-python/ |
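The sequence of operations attributed to em.cut_density above (mask from a model, cut, sharpen or blur, compute map coefficients) can be sketched with NumPy alone. This is not the clipper_tools code and does not use its API: the function names, the spherical per-atom mask and the cubic, orthogonal grid are assumptions made for illustration, and writing an MTZ file is omitted. Sharpening or blurring is applied as the usual isotropic B-factor term, multiplying each Fourier coefficient by exp(-B s^2 / 4) with s the reciprocal of the resolution.

```python
import numpy as np

def mask_from_model(shape, voxel_size, atom_xyz, radius):
    """Boolean mask that is True within `radius` of any atom (orthogonal grid)."""
    coords = np.indices(shape).reshape(3, -1).T * voxel_size  # (N, 3) voxel positions
    mask = np.zeros(len(coords), dtype=bool)
    for atom in np.asarray(atom_xyz, dtype=float):
        mask |= np.sum((coords - atom) ** 2, axis=1) <= radius ** 2
    return mask.reshape(shape)

def map_coefficients(volume, voxel_size, b_factor=0.0):
    """Fourier coefficients of the map, scaled by exp(-B s^2 / 4).

    Positive B blurs the map; negative B sharpens it.
    """
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in volume.shape]
    sx, sy, sz = np.meshgrid(*freqs, indexing="ij")
    s2 = sx ** 2 + sy ** 2 + sz ** 2  # squared reciprocal resolution, 1/A^2
    return np.fft.fftn(volume) * np.exp(-b_factor * s2 / 4.0)

# Synthetic stand-in for a cryo-EM map and a one-atom "model".
rng = np.random.default_rng(0)
shape, voxel = (16, 16, 16), 1.0
volume = rng.random(shape)
mask = mask_from_model(shape, voxel, [[8.0, 8.0, 8.0]], radius=3.0)

cut = np.where(mask, volume, 0.0)                   # keep density near the model only
coeffs = map_coefficients(cut, voxel, b_factor=50)  # blurred map coefficients
blurred = np.fft.ifftn(coeffs).real                 # back to real space for inspection
```

A hard-edged mask like this one introduces ripples in reciprocal space; a soft-edged mask is a common alternative, though the source does not say which is used.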
Title | Sheetbend software for model morphing with non-atomic parameterizations. |
Description | Software for optimizing a 3D model of a biological molecule to best explain X-ray or electron microscopy observations. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | Enquiries from multiple users about application of the software to their problems. |
URL | https://pure.york.ac.uk/portal/en/publications/sheetbend-software-for-model-morphing-of-atomic-model... |
Title | Sheetbend software for model morphing with non-atomic parameterizations. |
Description | Software for optimizing a 3D model of a biological molecule to best explain X-ray or electron microscopy observations. |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | This release was an update in response to enquiries from other software developers who wished to experiment with incorporating the software in their own software pipelines. |
URL | http://fg.oisin.rc-harwell.ac.uk/projects/clipper-progs/ |
Description | CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps', 2nd - 4th March 2016. This course is aimed at structural biologists with high resolution EM maps ready for, or in the process of, model building and refinement. This three-day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials. The principal benefit to the participants was an awareness of tools which can perform de novo model building in high resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.ccpem.ac.uk/training/icknield_2016/icknield_schedule.pdf |
Description | CCP4 Study Weekend 2017: From Data to Structure |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | This year's CCP4 study weekend focused on providing an overview of the process and pipelines available to take crystallographic diffraction data from spot intensities right through to structure. Sessions therefore included processing diffraction data, phasing through molecular replacement and experimental techniques, and automated model building and refinement, as well as updates to CCP4 and a look at where crystallography is going to take us in the future. 400 practitioners from the field attended. I presented a talk on current methods for automated structure solution, as well as our new approach to model-free refinement by determination of parameter shift fields. |
Year(s) Of Engagement Activity | 2017 |
URL | http://www.ebi.ac.uk/pdbe/about/events/ccp4-study-weekend-2017 |
Description | DLS-CCP4 Data Collection and Structure Solution Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Delivered three 45-minute lectures on phase improvement, model building and carbohydrates at the DLS-CCP4 Data Collection and Structure Solution Workshop. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.ccp4.ac.uk/schools/DLS-2015/ |
Description | Invited talk at Barcelona meeting on "MX and cryo-EM" |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk by Jon Agirre and Scott Hoh on "Automated model building in cryoEM maps" |
Year(s) Of Engagement Activity | 2017 |
URL | https://sbu.csic.es/conference-mx-cryoem-bcn/ |
Description | Presentation at CCP-EM Spring Symposium 2019 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk on Extending model building and refinement tools for Cryo-EM applications at the CCP4 symposium, Nottingham, Apr 2019 |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.youtube.com/watch?v=evbJV6431EA |
Description | Presentation at CCP4 study weekend |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Talk on model morphing with non-atomic representations at the CCP4 study weekend, Nottingham, Jan 2019 |
Year(s) Of Engagement Activity | 2019 |
URL | http://www.cvent.com/events/ccp4-study-weekend-2019/agenda-3372f50a47c74742afc6e001881e38de.aspx?dvc... |