CCP4 Grant Renewal 2014-2019: Question-driven crystallographic data collection and advanced structure solution

Lead Research Organisation: University of York
Department Name: Chemistry

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

This proposal incorporates five related work packages.

In WP1 we will track synchrotron-collected data through computational structure determination, to find whether the most useful data can be recognised a priori using established or novel metrics of data quality and consistency. We will then enable data collection software to communicate with pipelines and graphics programs to assess when sufficient data have been collected for a given scientific question, and so to prioritise further beamtime usage. We will also communicate extra information about diffraction data to structure determination programs, and so support the statistical models and algorithms being developed in WP4.

WP2 will improve the key MR step of model preparation, especially from diverged, NMR, or ab initio models. One development will be to extend the size limit of ab initio search model generation by exploiting sequence covariance algorithms.

In WP3 we will use our description of electron density maps as a field of control points to better use electron density or atomic models positioned by MR. Restrained manipulation of these points provides a low-order parameterisation of refinement decoupled from atomic models, and therefore suitable for highly diverged atomic models or EM-derived maps. We will extend this approach to characterise local protein mobility without the requirement of TLS for predefinition of rigid groups.
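The idea of a low-order parameterisation decoupled from the atomic model can be illustrated with a small sketch. This is not the project's software: the function name `interpolate_shifts`, the Gaussian weighting and the `sigma` value are assumptions made purely for illustration. Shifts are defined at a handful of control points and spread smoothly onto every atom, so the number of refined parameters depends on the number of control points rather than the number of atoms.

```python
import numpy as np

def interpolate_shifts(atoms, control_points, control_shifts, sigma=5.0):
    """Spread shifts defined at a few control points onto every atom.

    Normalised Gaussian weights are an illustrative assumption, and every
    atom is assumed to lie within a few sigma of at least one control point.
    """
    # Squared distances, shape (n_atoms, n_control_points)
    diff = atoms[:, None, :] - control_points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    # Normalise so each atom's weights sum to one
    weights /= weights.sum(axis=1, keepdims=True)
    # Each atom's shift is a weighted average of the control-point shifts
    return weights @ control_shifts

# Three atoms (coordinates in Angstrom) driven by only two control points
atoms = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [20.0, 0.0, 0.0]])
control_points = np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
control_shifts = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])

shifts = interpolate_shifts(atoms, control_points, control_shifts)
moved_atoms = atoms + shifts
```

In this sketch, giving every control point the same shift moves the whole model rigidly, while differing shifts produce a smooth deformation.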

In WP4 we will statistically model non-idealities in experimental data, including non-isomorphism, spot overlap, and radiation damage. The resulting models, implemented in REFMAC, will be applied to refinement using data that are annotated by WP1 tools and tracked by WP0.

WP0 will provide the tools to integrate the other WPs. For this, it will create a cloud environment where storage- and compute-resources can be utilised optimally, and where rich information can be passed among beamlines, pipelines, and graphics programs.

Planned Impact

The importance of macromolecular crystallography in general, and of CCP4 in particular, is described in the Pathways to Impacts section.

CCP4 users in the pharmaceutical and biotechnology sector are most often involved in the study of protein-ligand (most often drug) complexes. The critical computational step in this process is molecular replacement (MR), in which a known atomic model from a similar structure is used to explain the diffraction pattern of the unknown structure. The MR approach is used in more than 70% of structure solutions. However, it is not uncommon for molecular replacement to yield a poor electron density map due to changes in the conformation of the protein. The software developed in work package 3 aims to significantly reduce the number of cases in which problems occur by introducing an additional flexible fitting step between the molecular replacement and refinement steps. The same approach will be applied to related problems, including the use of cryo-EM data to interpret the X-ray diffraction pattern.

Improvement of the protein model also improves the electron density for the unmodeled ligand or drug, since the electron density features of the known and unknown regions of the structure are related through the diffraction pattern. Provision of an additional, highly automated refinement step in this process will therefore increase the coverage of automated methods for high throughput screening, which are widely used in the commercial sector. The impact of these developments will be to reduce the number of cases where structure solution fails, to reduce the level of manual intervention required in successful studies, and to increase the accuracy of the resulting structures.

YSBL has played a significant role in the commercial impact of CCP4, with two YSBL-originated developments (the REFMAC and COOT software) being the most-used tools in their field. The YSBL group engages with commercial customers through commercial representation on the CCP4 Executive Committee and Working Group 1, through workshops, and through the CCP4 bulletin board. CCP4 developers, including the York group, conduct an annual meeting with structural biologists at GSK to guide future developments, and plans and developments are presented at this meeting. There are occasional visits to other customers.

The resulting software will be added to the CCP4 suite. This package is in use world-wide and is available on Windows, Linux and Mac OS platforms, providing a direct distribution channel to the vast majority of macromolecular crystallographers. CCP4 is updated with major version releases roughly every year, and has recently introduced an automated update mechanism to enable faster access to new developments. As a result, once the software has been added to the package it will be available to both the academic and commercial user communities within weeks to months.

Publications

Alharbi E (2021) Predicting the performance of automated crystallographic model-building pipelines. in Acta crystallographica. Section D, Structural biology

Cowtan K (2018) Macromolecular refinement by model morphing using non-atomic parameterizations. in Acta crystallographica. Section D, Structural biology

Cowtan K (2020) Shift-field refinement of macromolecular atomic models. in Acta crystallographica. Section D, Structural biology

McNicholas S (2018) Automating tasks in protein structure determination with the clipper python module. in Protein science : a publication of the Protein Society

Potterton L (2018) CCP4i2: the new graphical user interface to the CCP4 program suite. in Acta crystallographica. Section D, Structural biology

 
Description Dr Jon Agirre was appointed to work on the York work package of the CCP4 grant on 1st April 2015. In the first 21 months of the grant we have been building a substantial computational infrastructure, as well as developing and implementing the mathematical frameworks for model-free refinement.

The proof-of-concept control-point software developed by Dr Cowtan has been modified for application to real molecular replacement data rather than the original synthetic data. The software has been applied to a problem molecular replacement structure, for which phase information is not available, as well as to test data where phases are available; the latter case is more representative of the application to electron microscopy data. The software performs effectively when phases are available, for all parts of the structure where there are significant electron density features. The software was instrumented to provide visual diagnostics in the Coot graphics package, which revealed a problem that arises when a convex hull surrounding the structure contains enclosed solvent channels. This will be addressed by pruning uninformative control points.

Tests on a problem molecular replacement structure which shows significant domain motion were unsuccessful. The principal obstacle in this case was the quality of the electron density reconstruction from the model: the domain motion was sufficient to cause the moving regions to display either misleading or no electron density. However, it may be that this dataset represents an unrealistic challenge. We are addressing this problem through a three-fold strategy:

- Improvements to the search target function to better detect the appropriate shift to apply to a given region of the search model.
- The control point software will be applied to the problem of generating an ensemble of permutations on the search structure. The information from the model ensemble will then be combined to produce a bias-reduced map against which to perform control point refinement of the coordinates of the search model.
- A database of molecular replacement test structures will be prepared to enable a more effective evaluation of the performance of the algorithm.

In addition, we have developed a Python interface to the clipper libraries to enable more rapid development of the required algorithms. We have also been working on the CCP4i2 software framework for implementing and linking the software tools for control point refinement with supporting tools from the CCP4 software suite.

Since mid-2016 we have developed a second approach to model-free refinement, with different strengths and limitations, which does not involve control points at all. Instead, a spatially complete field of parameter shifts is determined, which may include isotropic displacement parameters, anisotropic displacement parameters, and positional parameters. The new approach is much simpler, and has been demonstrated on real data for isotropic displacement parameters, and on synthetic data for anisotropic displacement parameters. An implementation for coordinates is in progress. The simplicity of the new approach gives us a strong expectation of releasing a user-oriented software package within the next 12 months.
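A one-dimensional toy version may help to show what a spatially complete field of positional shifts means. The sketch below is an assumption-laden illustration, not the project's implementation: the function name `estimate_shift_field`, the Gaussian window and the first-order approximation rho_obs(x) ≈ rho_calc(x) - s(x) · d(rho_calc)/dx are all choices made here. At every grid point the difference density is regressed against the gradient of the calculated density within a local window, which gives a shift value everywhere rather than one per atom.

```python
import numpy as np

def estimate_shift_field(rho_obs, rho_calc, dx, window_sigma):
    """Estimate a shift s(x) such that rho_obs(x) ~ rho_calc(x - s(x)).

    Local weighted least squares on the first-order expansion
    rho_obs - rho_calc ~ -s * d(rho_calc)/dx (illustrative assumption).
    """
    grad = np.gradient(rho_calc, dx)
    diff = rho_obs - rho_calc
    # Gaussian window truncated at three standard deviations
    half = int(round(3.0 * window_sigma / dx))
    k = np.arange(-half, half + 1) * dx
    window = np.exp(-k ** 2 / (2.0 * window_sigma ** 2))
    # Windowed sums of the least-squares numerator and denominator
    num = np.convolve(diff * grad, window, mode="same")
    den = np.convolve(grad * grad, window, mode="same")
    # Tiny regulariser avoids division by zero in featureless regions
    eps = 1e-12 * den.max()
    return -num / (den + eps)

# Synthetic data: a Gaussian "atom" at 0 in the model, at +0.3 in the data
x = np.linspace(-6.0, 6.0, 601)
dx = x[1] - x[0]
true_shift = 0.3
rho_calc = np.exp(-x ** 2 / 2.0)
rho_obs = np.exp(-(x - true_shift) ** 2 / 2.0)

shift_field = estimate_shift_field(rho_obs, rho_calc, dx, window_sigma=1.5)
shift_at_atom = shift_field[len(x) // 2]  # field sampled at the model atom
```

Because the estimate relies on a first-order expansion, it slightly underestimates the true shift in a single pass; in this toy setting the update would be applied and the calculation repeated.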

In 2017 Dr Agirre was awarded a Royal Society University Research Fellowship, and a new PDRA, Dr Stephen Metcalfe, was recruited to continue the work. While this transition incurred a significant cost in terms of training, we have made substantial progress since the last report. The shift-field approach has been formalised and reported at the CCP4 study weekend and in a paper. This required an initial implementation of the method for the refinement of isotropic thermal parameters, which allowed the method to be validated and its performance investigated, as well as providing a sanity check of the theory. Subsequent to publication we have been working in parallel on the refinement of atomic coordinates and on the refinement of anisotropic thermal parameters. Anisotropic thermal parameter refinement appears to be possible, although we have not yet determined the limitations of the method. Coordinate refinement has been demonstrated at data resolutions much poorer than are required for traditional refinement methods, and with a radius of convergence which is comparable to or occasionally exceeds that of the best existing methods. In addition, the new method can be 1-2 orders of magnitude faster than traditional methods (due to working at lower resolutions). This opens up the possibility of new structure solution methods which the computational cost of refinement previously rendered impractical.

Since Feb 2018 we have implemented the new refinement method in a piece of software, 'sheetbend', which has been published in a paper and released to users through the CCP4 source repository; it will also become available as part of the CCP4 software suite at the next release. The preliminary release version performed coordinate refinement and isotropic B-factor refinement, and was then extended by Stephen Metcalfe to implement anisotropic B-factor refinement. This version has been further optimised by K Cowtan for increased computational performance and to increase the radius of convergence. The software is currently available publicly through the CCP4 source code repository.
Exploitation Route We expect the new model-free refinement software to be adopted by the electron microscopy community in addition to the crystallography community. We will contribute the software to the CCP-EM package for this purpose, through which it will be distributed to biotech users in both academic and commercial sectors. We have also identified a valuable collaboration with the Flex-EM group to further extend this work in the context of cryo-EM, and have a BBSRC responsive mode grant in review at the moment for this collaboration.
Sectors Pharmaceuticals and Medical Biotechnology

URL http://fg.oisin.rc-harwell.ac.uk/projects/clipper-progs/
 
Description The software and methods we have developed have been contributed (along with other software) to the CCP4 and CCP-EM software suites, which are licensed to industrial users at over 100 sites worldwide, raising a license income of over £1m/year. We engage directly with industrial users through the CCP4 working group 1 and the CCP4 and CCP-EM annual symposia, through workshops, and on an ad-hoc basis in relation to individual problems. Particular developments arising from this project include the extension of our existing model building methods to larger structures and the further automation of improving models for deposition.
First Year Of Impact 2019
Sector Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title Macromolecular refinement by model morphing using non-atomic parameterizations 
Description Methods and data to reproduce the results of the paper "Macromolecular refinement by model morphing using non-atomic parameterizations", submitted to Acta Crystallographica Section D
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact The dataset has generated two requests from a user wishing to apply the methods. (Users may also be accessing the data or code without contacting us.)
URL https://pure.york.ac.uk/portal/en/datasets/macromolecular-refinement-by-model-morphing-using-nonatom...
 
Title Buccaneer version 1.6.1 protein model building software 
Description Buccaneer is an automated protein model building program. It features robust handling of limited data resolution, and is competitive in terms of speed. It is particularly useful at resolutions worse than 2.5 Å, although it can also be used at high resolution. The latest version includes methods for de-biasing molecular replacement models, arising from grant BB/L006383/1.
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact N/A 
URL http://www.ccp4.ac.uk/download/
 
Title Buccaneer version 1.6.3 protein model building software 
Description Buccaneer is an automated protein model building program. It features robust handling of limited data resolution, and is competitive in terms of speed. It is particularly useful at resolutions worse than 2.5 Å, although it can also be used at high resolution. The latest version includes methods for handling large structures and for bringing the final model closer to completion.
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact N/A 
URL http://www.ccp4.ac.uk/download/
 
Title Clipper-python 
Description Clipper-python is a Python interface to the 'clipper' C++ libraries for X-ray crystallographic computation. It enables much more rapid application development and testing by making clipper functionality available through the Python programming language. The software has been included in version 7.0 of the CCP4 software suite.
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The parallel CCP-EM project for Electron Microscopy has expressed interest in using and distributing the software. 
URL http://www.ccp4.ac.uk/download/
 
Title Clipper-python 
Description Clipper-python is a Python interface to the 'clipper' C++ libraries for X-ray crystallographic computation. It enables much more rapid application development and testing by making clipper functionality available through the Python programming language. The software has been included in version 7.0 of the CCP4 software suite. The software has now been expanded to allow rapid access to and manipulation of large crystallographic data objects using the tools in Python/NumPy. The new version has been released by CCP4 and is also now in the CCP-EM source tree for inclusion in their next release.
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The library was developed to facilitate the implementation of the refinement methods being developed on this grant; however, its utility has also become apparent at a summer school in which students used it to write their own crystallographic software from scratch, and through its adoption by the CCP-EM project.
URL https://fg.oisin.rc-harwell.ac.uk/projects/clipper-python/
 
Title Clipper-tools 
Description This is a companion module distributed alongside clipper-python. Jon Agirre has developed simple I/O functions with logfile and XML reporting (different behaviours are available using callbacks) for native integration with i2 [released by CCP4, in the CCP-EM source tree]. Within clipper_tools, em.cut_density cuts out part of a cryo-EM map using a mask computed from a supplied model, applies sharpening or blurring, computes map coefficients, and produces a mini MTZ file and XML/logfile results. This will be employed by future versions of the phaser pipeline for molecular replacement with EM maps [committed, available in both source trees].
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact N/A 
URL https://fg.oisin.rc-harwell.ac.uk/projects/clipper-python/
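The masking and sharpening/blurring steps described for em.cut_density can be sketched with plain NumPy. The code below does not use or reproduce the clipper_tools API: the function names `mask_from_model` and `apply_bfactor`, the spherical mask radius and the non-periodic distance calculation are assumptions for illustration, and the map-coefficient and MTZ output steps are omitted. Blurring or sharpening is modelled as scaling Fourier coefficients by exp(-B s²/4), where s is the reciprocal of the resolution in Å.

```python
import numpy as np

def mask_from_model(shape, spacing, coords, radius):
    """Boolean mask, True within `radius` (Angstrom) of any atom.

    Distances ignore periodic wrap-around (a simplifying assumption).
    """
    axes = [np.arange(n) * spacing for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    mask = np.zeros(shape, dtype=bool)
    for xyz in coords:
        mask |= np.sum((grid - xyz) ** 2, axis=-1) <= radius ** 2
    return mask

def apply_bfactor(density, spacing, b):
    """Scale Fourier coefficients by exp(-b * s^2 / 4), with s = 1/d.

    b > 0 blurs the map, b < 0 sharpens it.
    """
    freqs = [np.fft.fftfreq(n, d=spacing) for n in density.shape]
    sx, sy, sz = np.meshgrid(*freqs, indexing="ij")
    s2 = sx ** 2 + sy ** 2 + sz ** 2
    coeffs = np.fft.fftn(density) * np.exp(-b * s2 / 4.0)
    return np.fft.ifftn(coeffs).real

# Toy map: random density on a 16^3 grid with 1 Angstrom spacing
rng = np.random.default_rng(0)
density = rng.random((16, 16, 16))
coords = np.array([[8.0, 8.0, 8.0]])  # a single hypothetical atom

mask = mask_from_model(density.shape, 1.0, coords, radius=3.0)
cut = np.where(mask, density, 0.0)         # keep density near the model only
blurred = apply_bfactor(cut, 1.0, 50.0)    # blur with B = 50 A^2
restored = apply_bfactor(blurred, 1.0, -50.0)  # sharpening undoes the blur
```

A spherical mask around each atom is the simplest choice; a production tool would normally use a softer mask edge to limit Fourier artefacts.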
 
Title Sheetbend software for model morphing with non-atomic parameterizations. 
Description Software for optimizing a 3D model of a biological molecule to best explain X-ray or electron microscopy observations. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Enquiries from multiple users about application of the software to their problems. 
URL https://pure.york.ac.uk/portal/en/publications/sheetbend-software-for-model-morphing-of-atomic-model...
 
Title Sheetbend software for model morphing with non-atomic parameterizations. 
Description Software for optimizing a 3D model of a biological molecule to best explain X-ray or electron microscopy observations. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This release was an update in response to enquiries from other software developers who wished to experiment with incorporating the software in their own software pipelines. 
URL http://fg.oisin.rc-harwell.ac.uk/projects/clipper-progs/
 
Description CCP-EM Icknield Workshop on Model Building and Refinement for High Resolution EM Maps 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 'Icknield Workshop on Model Building and Refinement for High Resolution EM Maps'
2nd - 4th March 2016

This course is aimed at structural biologists with high resolution EM maps ready for, or in the process of, model building and refinement. This three-day course will host some of the leading software developers and provide ample contact time to allow delegates to discuss their data in detail alongside traditional lectures and tutorials.

The principal benefit to the participants was an awareness of tools which can perform de-novo model building in high resolution EM maps, removing the model bias associated with fitting pre-determined structures and facilitating the use of EM when no prior structure is available. The principal benefit to us was contact with real EM data and users, giving us a better awareness of the problems to be solved.
Year(s) Of Engagement Activity 2016
URL http://www.ccpem.ac.uk/training/icknield_2016/icknield_schedule.pdf
 
Description CCP4 Study Weekend 2017: From Data to Structure 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This year's CCP4 study weekend focused on providing an overview of the processes and pipelines available to take crystallographic diffraction data from spot intensities right through to structure. Sessions therefore included processing diffraction data, phasing through molecular replacement and experimental techniques, and automated model building and refinement, as well as updates to CCP4 and a look at where crystallography is going to take us in the future.

400 practitioners from the field attended. I presented a talk on current methods for automated structure solution, as well as our new approach to model-free refinement by determination of parameter shift fields.
Year(s) Of Engagement Activity 2017
URL http://www.ebi.ac.uk/pdbe/about/events/ccp4-study-weekend-2017
 
Description DLS-CCP4 Data Collection and Structure Solution Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact DLS-CCP4 Data Collection and Structure Solution Workshop

Delivered three 45-minute lectures on phase improvement, model building and carbohydrates.
Year(s) Of Engagement Activity 2015
URL http://www.ccp4.ac.uk/schools/DLS-2015/
 
Description Invited talk at Barcelona meeting on "MX and cryo-EM" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk by Jon Agirre and Scott Hoh on "Automated model building in cryoEM maps"
Year(s) Of Engagement Activity 2017
URL https://sbu.csic.es/conference-mx-cryoem-bcn/
 
Description Presentation at CCP-EM Spring Symposium 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited talk on Extending model building and refinement tools for Cryo-EM applications at the CCP-EM Spring Symposium, Nottingham, Apr 2019
Year(s) Of Engagement Activity 2019
URL https://www.youtube.com/watch?v=evbJV6431EA
 
Description Presentation at CCP4 study weekend 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Talk on model morphing with non-atomic representations at the CCP4 study weekend, Nottingham, Jan 2019
Year(s) Of Engagement Activity 2019
URL http://www.cvent.com/events/ccp4-study-weekend-2019/agenda-3372f50a47c74742afc6e001881e38de.aspx?dvc...