CCP4 Grant Renewal 2014-2019: Question-driven crystallographic data collection and advanced structure solution

Lead Research Organisation: University of Liverpool
Department Name: Institute of Integrative Biology

Technical Summary

This proposal incorporates five related work packages.

In WP1 we will track synchrotron-collected data through computational structure determination, to find whether the most useful data can be recognised a priori using established or novel metrics of data quality and consistency. We will then enable data collection software to communicate with pipelines and graphics programs to assess when sufficient data have been collected for a given scientific question, and so to prioritise further beamtime usage. We will also communicate extra information about diffraction data to structure determination programs, and so support the statistical models and algorithms being developed in WP4.

WP2 will improve the key MR step of model preparation, especially from diverged, NMR, or ab initio models. One development will be to extend the size limit of ab initio search model generation by exploiting sequence covariance algorithms.

In WP3 we will use our description of electron density maps as a field of control points to better use electron density or atomic models positioned by MR. Restrained manipulation of these points provides a low-order parameterisation of refinement decoupled from atomic models, and therefore suitable for highly diverged atomic models or EM-derived maps. We will extend this approach to characterise local protein mobility without the requirement of TLS for predefinition of rigid groups.

In WP4 we will statistically model non-idealities in experimental data, including non-isomorphism, spot overlap, and radiation damage. The resulting models, implemented in REFMAC, will be applied to refinement using data that are annotated by WP1 tools and tracked by WP0.

WP0 will provide the tools to integrate the other WPs. For this, it will create a cloud environment where storage- and compute-resources can be utilised optimally, and where rich information can be passed among beamlines, pipelines, and graphics programs.

Planned Impact

The importance of macromolecular crystallography in general, and of CCP4 in particular, is described in the Pathways to Impacts section.

Molecular Replacement (MR) is an increasingly common route to solving the phase problem for protein crystal structures, its popularity arising from being fast and cheap. In 2012, 77% of protein structures submitted to the PDB were tagged as solved using MR. MR ranges in difficulty from positioning models that accurately model the scattering from the entire asymmetric unit to positioning very small components of the total scattering with high error. Essential to the success of MR is the availability of a search model that represents a portion of the unknown structure accurately enough that, once placed, it provides approximate phasing information allowing for further interpretation of the resulting electron density maps. WP2 aims to improve the efficiency and applicability of MR, enhancements that will reduce the time spent by the crystallographer on structure solution and extend the proportion of targets soluble by the technique. To do this WP2 will improve methods to assemble search models from conventional sources such as homology models and NMR structures. It will further build on recent innovations exploiting a novel source of structural information - low computational-cost, fragment-assembly derived ab initio models, as implemented in the CCP4 program AMPLE, extending the method to membrane proteins. The nascent technique of predicted contact-based ab initio modelling will further be explored, potentially allowing some large novel protein folds to be solved by MR for the first time. The predominance of MR as a structure solution method ensures that the entire crystallographic community will benefit from these improvements and so, in turn, will researchers in the many biological communities for whom protein structure information is valuable.

The software developed in WP2 will be added to the CCP4 suite. The CCP4 suite is used worldwide and is available on Windows, Linux and Mac OS platforms, providing a direct distribution channel to macromolecular crystallographers. CCP4 has recently introduced an automated update mechanism to enable faster access to new developments. As a result, developments in WP2 will be available immediately to the user community.

Although the focus in WP2 is on structural bioinformatics for crystallographic ends, we envisage that some of the methods we will develop to process and refine ab initio models will prove valuable to a broader bioinformatics community. For example, refinement of predicted contact-based models with Rosetta or other fragment-based protocols has not yet been done. Our benchmarking will indicate whether it provides a general method to improve local or global quality of the models: such a protocol would obviously be valuable to a broad modelling community. Similarly, incorporation of predicted contacts from the latest generation of covariance software into fragment assembly ab initio modelling is novel: the benefits of using larger or smaller numbers of predictions will become apparent through our benchmarking and will again be of broad benefit to protein modellers.

Publications

Biterova EI (2019) The crystal structure of human microsomal triglyceride transfer protein. in Proceedings of the National Academy of Sciences of the United States of America

Guo J (2018) Structure and function of the type III pullulan hydrolase from Thermococcus kodakarensis. in Acta crystallographica. Section D, Structural biology

Guo J (2021) The X-ray structure of juvenile hormone diol kinase from the silkworm Bombyx mori. in Acta crystallographica. Section F, Structural biology communications

Keegan RM (2015) Exploring the speed and performance of molecular replacement with AMPLE using QUARK ab initio protein models. in Acta crystallographica. Section D, Biological crystallography

Potterton L (2018) CCP4i2: the new graphical user interface to the CCP4 program suite. in Acta crystallographica. Section D, Structural biology

 
Description We have looked at ab initio methods for phasing transmembrane helical proteins in AMPLE. This work was published in Acta Cryst D. We have shown that coiled-coil proteins are particularly amenable to structure solution by Molecular Replacement with AMPLE. This work was published in IUCr J.

We have also shown that contact predictions dramatically extend the range of targets that can be solved using ab initio models. This work was published in IUCr J.
We have also explored ways to make ensembles of proteins from single structures and demonstrated their significant performance benefits. This work was published in Acta Cryst D. We are currently finalising work on processing contact-assisted models from databases into search models, and are preparing a manuscript.

We have developed the software ConKit to facilitate the use of contact prediction data. This work was published in Bioinformatics. We also published a review on the applications of contact predictions in structural biology in IUCr J.

We have developed a software pipeline, SIMBAD, for Molecular Replacement on a large scale. It is particularly useful for tasks such as detecting contaminants and solving unsequenced proteins. This work was published in Acta Cryst D. We have since updated the pipeline to improve performance and are preparing a manuscript. Additional URL https://simbad.readthedocs.io/en/latest/

We have also updated and improved MrBUMP, work published in Acta Cryst D.
Exploitation Route Our findings are continually exploited by users of our software, part of the globally distributed CCP4 suite.
Sectors Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology

URL http://ample.readthedocs.io/en/latest/
 
Description Our findings are reflected in continuous updates to the programs AMPLE and MrBUMP and in the new software ConKit and SIMBAD.
First Year Of Impact 2014
Sector Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology
 
Title AMPLE 
Description A pipeline for unconventional Molecular Replacement using, for example, ab initio protein structure predictions 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact It has allowed solution of protein crystal structures by MR when conventional approaches failed 
URL https://amplemr.wordpress.com/
 
Title AMPLE, 2019 
Description AMPLE is a pipeline for Molecular Replacement. Since its original conception it has been extensively improved to work with search models derived from, for example, NMR ensembles (with or without remodelling), ensembles derived from single structures by computational means, contact-assisted ab initio models, single structures processed according to arbitrary scores provided, ab initio models from databases and so on. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Makes the process of doing molecular replacement in macromolecular structure solution easier for users, consequently enabling new insights into macromolecules.
URL https://ample.readthedocs.io
 
Title ConKit 
Description A Python package to calculate, convert, analyse and visualise protein contact predictions 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact It has been bundled into the CCP4 and CCP-EM distributions 
URL https://github.com/rigdenlab/conkit
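
A common way to analyse contact predictions is to measure the precision of the top-ranked predicted contacts against contacts observed in a known structure. The following is a minimal sketch of that calculation in plain Python. It does not use the ConKit API; the function names, the tuple layout (residue i, residue j, score) and the default sequence-separation cutoff are assumptions made for illustration only.

```python
from typing import Iterable, List, Set, Tuple

Contact = Tuple[int, int]
ScoredContact = Tuple[int, int, float]


def normalise(pairs: Iterable[Contact]) -> Set[Contact]:
    """Store each residue pair as (low, high) so (i, j) and (j, i) match."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def contact_precision(
    predicted: List[ScoredContact],
    reference: Iterable[Contact],
    top_n: int,
    min_separation: int = 6,  # illustrative cutoff, not a ConKit default
) -> float:
    """Fraction of the top_n highest-scoring predictions present in reference."""
    # Pairs close in sequence are trivially in contact, so they are dropped.
    filtered = [c for c in predicted if abs(c[0] - c[1]) >= min_separation]
    ranked = sorted(filtered, key=lambda c: c[2], reverse=True)[:top_n]
    if not ranked:
        return 0.0
    true_pairs = normalise(reference)
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in true_pairs)
    return hits / len(ranked)


if __name__ == "__main__":
    # Hypothetical data: (residue i, residue j, prediction score).
    predicted = [(1, 10, 0.9), (2, 20, 0.8), (3, 5, 0.95), (4, 30, 0.4)]
    reference = [(10, 1), (4, 30)]
    print(contact_precision(predicted, reference, top_n=2))
```

In this sketch the pair (3, 5) is ignored despite its high score, because its sequence separation falls below the cutoff.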
 
Title MrBUMP molecular replacement pipeline for X-ray Crystallography 
Description Automated pipeline to find, prepare and process molecular replacement search models in macromolecular structure solution from X-ray crystallographic data (MX). It helps to address the phase problem in MX. It is distributed with the CCP4 suite of programs and also made available through the CCP4-online web service. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact MrBUMP was made available to the global MX community via the CCP4-online service during the course of this grant. This has given several thousand potential users easy access to the service. It is also backed by a cluster system to speed up processing and generate results faster. MrBUMP has also been made available to CCP4i2 users through an interface and can now be driven graphically through the CCP4mg molecular graphics interface.
URL http://www.ccp4.ac.uk/ccp4online
 
Title MrBUMP molecular replacement pipeline for X-ray Crystallography 
Description Significant updates have been made to the MrBUMP software, including the use of the molecular graphics application CCP4mg as a graphical front end to the model search and preparation steps of the program. This enables users to better visualize and manipulate the search models that they are using for their structure solution. The new version was released to coincide with a new publication on the software, part of the CCP4 2017 Study Weekend proceedings in Acta Cryst. D.
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Makes the process of doing molecular replacement in macromolecular structure solution easier for users, consequently enabling new insights into macromolecules.
URL http://www.ccp4.ac.uk
 
Title MrBUMP molecular replacement pipeline for X-ray Crystallography 
Description Significant updates have been made to the MrBUMP software, including updates to the molecular graphics application CCP4mg as a graphical front end to the model search and preparation steps of the program. This enables users to better visualize and manipulate the search models that they are using for their structure solution. The new version was released to coincide with a new publication on the software, part of the CCP4 2019 Study Weekend proceedings in Acta Cryst. D.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Makes the process of doing molecular replacement in macromolecular structure solution easier for users, consequently enabling new insights into macromolecules.
URL http://www.ccp4.ac.uk
 
Title MrBUMP version 2.2.1 
Description Automated molecular replacement software for macromolecular crystallography 
Type Of Technology Software 
Year Produced 2021 
Impact Used to help solve macromolecular structures by users of the CCP4 suite around the world. Included in the post data collection processing at the MX beamlines in Diamond Light Source. 
URL http://www.ccp4.ac.uk
 
Title SIMBAD 
Description Molecular Replacement pipeline based on rapid screening of the MoRDa database of non-redundant PDB structures and domains.
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact It has already determined that several crystals were unsuspected contaminants and has helped in cases of tray mishandling and crystallisation of unsequenced proteins. 
 
Title SIMBAD 
Description SIMBAD is a sequence-independent Molecular Replacement pipeline. It has modules to detect crystallographic cell similarities to known structures, to screen for contaminants from a database, and to attempt brute-force structure solution using the entire PDB, as represented in a non-redundant, domain-based fashion by the MoRDa database. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact By detecting when the protein crystal does not, in fact, contain the expected protein, especially when a contaminant has crystallised, SIMBAD can save the significant time associated with futile solution attempts. Some synchrotrons and beamlines now run it routinely on all datasets collected. 
URL https://simbad.readthedocs.io
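
The cell-similarity step described above can be illustrated with a small sketch: a query unit cell is compared against a list of known cells, and entries whose edge lengths and angles agree within a tolerance are reported. This is not SIMBAD's implementation. The class and function names, the tolerances and the database entries are all hypothetical, and the sketch assumes both cells are already expressed in a comparable setting (a real search would typically compare reduced cells so that different descriptions of the same lattice still match).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cell:
    """Unit cell: edge lengths in Angstrom, angles in degrees."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float


def cells_similar(
    query: Cell,
    other: Cell,
    length_tol: float = 0.05,  # fractional tolerance on edges (assumed value)
    angle_tol: float = 2.0,    # absolute tolerance on angles (assumed value)
) -> bool:
    """True if every edge and angle of `other` is within tolerance of `query`."""
    q_len, o_len = (query.a, query.b, query.c), (other.a, other.b, other.c)
    q_ang = (query.alpha, query.beta, query.gamma)
    o_ang = (other.alpha, other.beta, other.gamma)
    lengths_ok = all(abs(q - o) <= length_tol * q for q, o in zip(q_len, o_len))
    angles_ok = all(abs(q - o) <= angle_tol for q, o in zip(q_ang, o_ang))
    return lengths_ok and angles_ok


def screen(query: Cell, database: List[Tuple[str, Cell]]) -> List[str]:
    """Return the names of database entries whose cell resembles the query."""
    return [name for name, cell in database if cells_similar(query, cell)]


if __name__ == "__main__":
    # Hypothetical entries; names and cell parameters are made up.
    database = [
        ("entry_A", Cell(50.5, 59.0, 71.0, 90.0, 90.0, 90.0)),
        ("entry_B", Cell(80.0, 80.0, 120.0, 90.0, 90.0, 120.0)),
        ("entry_C", Cell(50.0, 60.0, 70.0, 90.0, 95.0, 90.0)),
    ]
    print(screen(Cell(50.0, 60.0, 70.0, 90.0, 90.0, 90.0), database))
```

A cell match of this kind is only a hint that the crystal may contain a known protein; confirming it would still require placing the candidate model by Molecular Replacement.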
 
Title SIMBAD - Sequence independent molecular replacement based on available database 
Description Software for solving the phase problem in macromolecular X-ray crystallography. Designed to be independent of sequence and to use the PDB directly to search for potential matches to a target crystal. Released as part of the CCP4 suite in late 2017, including a CCP4i2 interface. Will also be available through CCP4 cloud facilities. A publication is due in 2018.
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Allows structure solution in cases of contaminants or of crystal structures with no obvious sequence-based homologue available. Several contaminant structures have been solved using the software, which has helped to prevent misdirected research effort in such cases.
URL http://simbad.readthedocs.io/en/latest/
 
Description IUCr conference presentation, Montreal 2014 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation and questions answered.

Year(s) Of Engagement Activity 2014
URL http://www.iucr.org/iucr/cong/iucr-xxiii
 
Description IUCr conference workshop, Montreal 2014 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Education in practical application of AMPLE

Attendees expressed greater confidence in using AMPLE
Year(s) Of Engagement Activity 2014
URL http://www.iucr.org/iucr/cong/iucr-xxiii