CCP4 Grant Renewal 2014-2019: Question-driven crystallographic data collection and advanced structure solution

Lead Research Organisation: University of Kent
Department Name: Sch of Biosciences

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

This proposal incorporates five related work packages.

In WP1 we will track synchrotron-collected data through computational structure determination, to find whether the most useful data can be recognised a priori using established or novel metrics of data quality and consistency. We will then enable data collection software to communicate with pipelines and graphics programs to assess when sufficient data have been collected for a given scientific question, and so to prioritise further beamtime usage. We will also communicate extra information about diffraction data to structure determination programs, and so support the statistical models and algorithms being developed in WP4.

WP2 will improve the key MR step of model preparation, especially from diverged, NMR, or ab initio models. One development will be to extend the size limit of ab initio search model generation by exploiting sequence covariance algorithms.

In WP3 we will use our description of electron density maps as a field of control points to better use electron density or atomic models positioned by MR. Restrained manipulation of these points provides a low-order parameterisation of refinement decoupled from atomic models, and therefore suitable for highly diverged atomic models or EM-derived maps. We will extend this approach to characterise local protein mobility without the requirement of TLS for predefinition of rigid groups.

In WP4 we will statistically model non-idealities in experimental data, including non-isomorphism, spot overlap, and radiation damage. The resulting models, implemented in REFMAC, will be applied to refinement using data that are annotated by WP1 tools and tracked by WP0.

WP0 will provide the tools to integrate the other WPs. For this, it will create a cloud environment where storage- and compute-resources can be utilised optimally, and where rich information can be passed among beamlines, pipelines, and graphics programs.

Planned Impact

Noble (Newcastle University) and Brown (University of Kent) are chairman and chairman-elect of CCP4, and have responsibility to their grant-funding agencies (currently BBSRC and MRC) and commercial license holders for delivery of CCP4's software development, maintenance, distribution, and outreach programs. The impact of their respective contributions, therefore, relates to the specific proposed program of work from the co-applicant centres (summarised in each of the relevant grant applications), to the output of CCP4 as a whole, and to the array of basic and applied macromolecular crystallography (MX) that depends upon CCP4.

MX is an essential enabling technology for the cellular and molecular biosciences, and consequently for UK pharmaceutical and biotechnological industries. UK research councils and research charities have recognised the need for infrastructural support of the discipline, most recently by assigning the last available Phase III slot at the Diamond Light Source (DLS) to a state-of-the-art facility for micro-focus and in situ crystallography for the academic and commercial MX community. In turn, the biotechnological and pharmaceutical science base that is fostered by such investments contributes hugely to the UK economy: in 2010 the pharmaceutical sector provided 67,000 jobs, each contributing £195,000 of GVA, with 25,000 of these positions being in high skill R&D activities (source: http://www.abpi.org.uk). In particular the majority of industrial access to DLS is for MX - amounting to almost 20% of the total user activity in MX.

Collaborative Computational Project 4 (CCP4) was established by the Research Councils in 1979 to promote the development and dissemination of software and best-practice in MX. To achieve this end, it uses Research Council funding to leverage a circa 3x larger commercial income, which it invests in MX training and in software development, maintenance, and distribution. As such, grant funding of CCP4 has the additional impact of strengthening an important interface between UK academic and commercial scientific endeavours.

CCP4's dissemination activities include the hosting of an annual methods-development meeting, typically attended by 400-500 graduate students, young researchers and PIs. It also co-sponsors, with the British Crystallographic Association, annual week-long graduate summer schools, held alternately in Scotland and England, at which a cohort of 40+ students is intensively trained in current methods in protein crystallography. These two elements, supported primarily by CCP4's commercial license income, serve to keep the UK at the forefront of methods development in this central technique, and ensure that a pool of well-trained, increasingly interdisciplinary scientists is available to apply the technique in academic and/or commercial settings.

The program of work described here, for which Noble and Brown will have ultimate oversight, looks forward to the next stage of the development of MX, to address some of the outstanding obstacles that limit the success and/or efficient application of the technique. By allowing structures to be determined from ever more challenging targets, including membrane proteins and proteins for which sample preparation is inherently difficult, this work will impact directly upon areas of biomedical and otherwise commercial interest. For example, WP0, WP1 and WP4 address the challenge of maximising output from challenging samples, while WP2 tackles the potential application of ab initio modelling to generate molecular replacement models. These packages address known bottlenecks in the determination of structures for membrane proteins, a structural class for which structural information is badly needed (since > 50% of drugs target membrane proteins) but notoriously hard to obtain.

Publications

Adams PD (2016) Outcome of the First wwPDB/CCDC/D3R Ligand Validation Workshop. in Structure (London, England : 1993)

Hough MA (2018) From crystal to structure with CCP4. in Acta crystallographica. Section D, Structural biology

McCoy AJ (2016) Advances in experimental phasing. in Acta crystallographica. Section D, Structural biology

Simkovic F (2017) ConKit: a python interface to contact predictions. in Bioinformatics (Oxford, England)

 
Description Key findings for the active work packages (WP0, WP1, WP2, WP3 and WP4) have been described in the individual returns from Eugene Krissinel (CCP4/STFC/RCaH), Gwyndaf Evans (Diamond Light Source), Daniel Rigden (Liverpool), Kevin Cowtan (York) and Garib Murshudov (LMB, Cambridge).

WP0: Our central aim is to develop a cloud-computing infrastructure by which CCP4 software and computational resources can be made available to the community. For this we have two strategies. The first provides a "Desktop" CCP4 installation via virtual machines within the DAaS (Data Analysis as Service) framework supported by SCD/STFC. The second, "CCP4 Cloud", provides a client-server model for a user's interaction with crystallographic calculations. CCP4 Cloud is also deployable in DAaS and may form the basis for CCP4's future user interfaces. Both frameworks will allow the user to leverage high performance computation facilities, for example those provided by the SCARF cluster at STFC/SCD, and high-volume remote data storage. We have developed a set of Cloud virtual machines with CCP4i2 running on them, complemented by a set of dedicated modules for running CPU-intensive jobs on SCD's SCARF facility. We are, moreover, implementing data exchange protocols between the iCAT storage facility for data collected at Diamond, CCP4 Cloud and DAaS virtual machines. In 2017, we made available a preliminary version of our next generation client-server interface, both as a server that can be used by CCP4 subscribers (url http://ccp4serv6.rc-harwell.ac.uk/jscofe/), and as software to support local installations. Through 2018, CCP4 Cloud was complemented with a significant number of tasks facilitating structure solution. As a result, CCP4 Cloud provides all main protocols for Molecular Replacement, Experimental phasing and hybrid methods, from image processing to refinement and validation. In addition, interactive model building using the Coot software has been enabled in CCP4 Cloud, and several highly automated pipelines, benefiting from access to considerable computational resources in the Cloud, were added.
We also linked CCP4 Cloud with validation and deposition services from wwPDB, such that users can acquire a wwPDB Validation Report and prepare deposition data as part of their CCP4 Cloud projects. In summary, CCP4 Cloud has been brought to a release state, and further development will mainly respond to user feedback as it becomes a core element of CCP4's future.

WP1: Our central aim is to allow the results of real-time data analysis to inform experimental decisions on synchrotron beamlines. We have collected and curated test data sets for software development; currently the repository contains ~10TB of data, which accounts for ~500 PDB structures. A summer placement student has been recruited for June-September 2017 to link data sets to metadata in a database. We aim to open the database for broader use by the community before the end of the grant. Towards identifying predictive metrics of dataset quality, ~100 data sets have been used to test diffraction data quality indicators (I/sigI, CC1/2, Rpim, completeness) in a proof-of-principle statistical analysis; a histogram showing success / failure of correct space-group identification in conjunction with these quality indicators has been created and showed promising indicative properties. Coding has also been initiated to implement decision making in existing MX structure-determination pipelines. In addition, we are collaborating closely with MRC LMB (Murshudov and Evans) on model-independent evaluation of electron density map quality in order to drive our learning algorithms by assessing success in a robust manner.
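
As an illustration of two of the quality indicators named in the WP1 summary, the following minimal Python sketch computes Rpim and CC1/2 from unmerged intensities using their standard textbook definitions. The data layout (a dictionary mapping each Miller index to a list of observed intensities) and the function names r_pim and cc_half are assumptions made for this example; they are not taken from the WP1 software.

```python
import math
import random


def r_pim(observations):
    """Precision-indicating merging R factor.

    Only reflections measured at least twice contribute, because a
    single observation carries no information about spread.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        spread = sum(abs(i - mean) for i in intensities)
        numerator += math.sqrt(1.0 / (n - 1)) * spread
        denominator += sum(intensities)
    return numerator / denominator


def cc_half(observations, seed=0):
    """Pearson correlation between mean intensities of two random half sets."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(sum(shuffled[:mid]) / mid)
        half_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    mean_a = sum(half_a) / len(half_a)
    mean_b = sum(half_b) / len(half_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(half_a, half_b))
    var_a = sum((a - mean_a) ** 2 for a in half_a)
    var_b = sum((b - mean_b) ** 2 for b in half_b)
    return cov / math.sqrt(var_a * var_b)
```

Both functions work on unmerged observations, which is why they need the per-reflection lists rather than merged intensities. A production implementation would also bin by resolution shell; that is omitted here for brevity.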

WP2: Our central aim is to enhance the exploitation of existing structural knowledge and to improve ab initio structure prediction methods, in order to increase the throughput and quality of macromolecular structures determined by X-ray crystallography. Predicted residue contacts derived from evolutionary covariance have been shown to improve ab initio model quality, making larger and more beta-rich proteins tractable to MR structure solution by AMPLE. In related work, we developed and published the software ConKit to facilitate the use of contact prediction data. We also published a review on the applications of contact predictions in structural biology. Alternative strategies for sampling from clusters of ab initio models have been extensively compared, resulting in a new default processing protocol. AMPLE's performance on transmembrane helical targets has been tested and published, showing that, while ideal helices work well for better resolution structures, explicit model building is required to solve poorer resolution cases ab initio. We have shown that coiled-coil proteins are particularly amenable to structure solution by Molecular Replacement with AMPLE. Novel approaches to maximise the value of single distant homologues in AMPLE have been explored and published. Major updates to MrBUMP have also been described in a paper. We have also developed and published a software pipeline, SIMBAD, for Molecular Replacement on a large scale. It is particularly useful, for example, for detecting contaminants or solving unsequenced proteins. We have since updated the pipeline to improve performance and are preparing a manuscript. Additional URL https://simbad.readthedocs.io/en/latest/ Finally, new features in Phaser - better statistical treatment of ensemble models, and "gyre" and "gimble" refinement - have improved the performance of AMPLE and Arcimboldo, respectively. All software is available through the CCP4 suite.
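
To give a concrete sense of what "evolutionary covariance" measures, the sketch below scores pairs of alignment columns by mutual information, one of the simplest covariance statistics. This is a stand-in chosen for illustration only: the contact predictors used with AMPLE and ConKit rely on more elaborate statistical models, and the function names and the list-of-strings alignment format here are assumptions, not part of either package.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


def rank_column_pairs(alignment, min_separation=1):
    """Return (score, i, j) tuples, highest-scoring column pairs first."""
    length = len(alignment[0])
    scores = [
        (column_mutual_information(alignment, i, j), i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    return sorted(scores, reverse=True)
```

Column pairs that mutate together across homologues score highly and are candidate residue contacts; pairs that vary independently score near zero. In practice min_separation would be set to several residues so that trivially adjacent positions are ignored.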

WP3: Our central aim is to develop alternative parameterisations for manipulating atomic and electron density MR models. In 2017 our focus has shifted from a "control point" algorithm towards using a spatially complete "shift field" to describe the transformations that relate a model to an experimental electron density distribution. This latter approach shows great promise for working at high speed and with low resolution data, opening up new opportunities in model building and refinement.

Dr Jon Agirre was appointed to work on the York work package of the CCP4 grant on 1st April 2015. In the first 21 months of the grant we have been building a substantial computational infrastructure as well as developing and implementing the mathematical frameworks for model-free refinement.

The proof-of-concept control-point software developed by Dr Cowtan was modified for application to real molecular replacement data rather than the original synthetic data. The software has been applied to a problem molecular replacement structure, in which phase information is not available, as well as to test data where phases are available; the latter case is more representative of the application to electron microscopy data. The software performs effectively when phases are available, for all parts of the structure where there are significant electron density features. The software was instrumented to provide visual diagnostics in the Coot graphics package, which revealed a problem that arises when a convex hull surrounding the structure contains enclosed solvent channels. This will be addressed by pruning uninformative control points.

Tests on a problem molecular replacement structure which shows significant domain motion were unsuccessful. The principal obstacle in this case was the quality of the electron density reconstruction from the model: the domain motion was sufficient to cause the moving regions to display either misleading or no electron density. However, it may be that this dataset represents an unrealistic challenge. We are working on addressing this problem by a three-fold strategy:

- Improvements to the search target function to better detect the appropriate shift to apply to a given region of the search model.
- The control point software will be applied to the problem of generating an ensemble of permutations on the search structure. The information from the model ensemble will then be combined to produce a bias-reduced map against which to perform control point refinement of the coordinates of the search model.
- A database of molecular replacement test structures will be prepared to enable a more effective evaluation of the performance of the algorithm.

In addition, we have developed a python interface to the clipper libraries to enable more rapid development of the required algorithms. We have also been working on the CCP4i2 software framework for implementing and linking the software tools for control point refinement with supporting tools from the CCP4 software suite.

Since mid-2016 we have developed a second approach to model-free refinement, with different strengths and limitations, which does not involve control points at all. Instead, a spatially complete field of parameter shifts is determined, which may include isotropic displacement parameters, anisotropic displacement parameters, and positional parameters. The new approach is much simpler, and has been demonstrated on real data for isotropic displacement parameters, and on synthetic data for anisotropic displacement parameters. An implementation for coordinates is in progress. The simplicity of the new approach gives us a strong expectation of releasing a user-oriented software package within the next 12 months.
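
One way to picture a "spatially complete field of parameter shifts" is the one-dimensional toy below. It assumes that, to first order, the difference between observed and calculated density equals minus the local shift times the gradient of the calculated density, and it estimates the shift at every grid point by least squares over a sliding window. This is a sketch under that stated assumption, not the algorithm implemented in the York software; the function name shift_field_1d and its parameters are hypothetical.

```python
import numpy as np


def shift_field_1d(rho_obs, rho_calc, spacing, radius_pts):
    """Estimate a positional shift at every grid point of a 1D density.

    First-order model: rho_obs - rho_calc ~= -shift * d(rho_calc)/dx.
    The shift at each point is the least-squares solution of that model
    over a window of 2 * radius_pts + 1 grid points centred on it.
    """
    grad = np.gradient(rho_calc, spacing)
    diff = rho_obs - rho_calc
    window = np.ones(2 * radius_pts + 1)
    numerator = np.convolve(diff * grad, window, mode="same")
    denominator = np.convolve(grad * grad, window, mode="same")
    # Small constant guards against division by zero in featureless regions.
    return -numerator / (denominator + 1e-12)


if __name__ == "__main__":
    x = np.linspace(0.0, 10.0, 1001)
    calc = np.exp(-((x - 5.0) ** 2) / 2.0)   # model peak
    obs = np.exp(-((x - 5.05) ** 2) / 2.0)   # same peak, displaced
    field = shift_field_1d(obs, calc, spacing=x[1] - x[0], radius_pts=100)
    print("estimated shift at the peak:", field[500])
```

Because the estimate is defined on the grid rather than on atoms, it needs no predefined rigid groups, and the window radius plays the role of a smoothness control: a larger radius gives a smoother, lower-order field.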

In 2017 Dr Agirre was awarded a Royal Society University Research Fellowship, and a new PDRA, Dr Stephen Metcalfe, was recruited to continue the work. While this transition incurred a significant cost in terms of training, we have made substantial progress since the last report. The shift-field approach has been formalised and reported at the CCP4 study weekend and in a paper. This required an initial implementation of the method for the refinement of isotropic thermal parameters, which allowed the method to be validated and its performance investigated, as well as providing a sanity check of the theory. Subsequent to publication we have been working in parallel on the refinement of atomic coordinates and on the refinement of anisotropic thermal parameters. Anisotropic thermal parameter refinement appears to be possible, although we have not yet determined the limitations of the method. Coordinate refinement has been demonstrated at data resolutions much poorer than are required for traditional refinement methods, and with a radius of convergence which is comparable to or occasionally exceeds the best existing methods. In addition, the new method can be 1-2 orders of magnitude faster than traditional methods (due to working at lower resolutions). This opens up the possibility of new structure solution methods which the computational cost of refinement previously rendered impractical.

Since Feb 2018 we have implemented the new refinement method in a piece of software, 'sheetbend', which has been published in a paper and released to users through the CCP4 source repository; it will also become available as part of the CCP4 software suite at the next release. The current release version performs coordinate refinement and isotropic B-factor refinement. Stephen Metcalfe has also implemented anisotropic B-factor refinement and is testing this for release over the next month or two.


WP4: Our central aim is to develop atomic model refinement against diffraction data produced using multiple crystals. We have developed a method and corresponding software, LORESTR, for reusing information in the PDB about macromolecules under study, which is now available through the range of CCP4 user interfaces. Infrastructure for full multiple crystal refinement is now in place and we have started development of full multi-crystal refinement tools. The developments include a revised likelihood function for optimal information transfer from the data to the atomic model. The methods developed are also being applied to atomic modelling of cryo-EM maps when there are multiple classes of maps and corresponding multiple sets of atomic models.
Exploitation Route Open access software that can be developed by other contributors. We have already mentioned code reutilisation by CCP-EM, but the code is also used, and contributed to, by many other developers who have no direct funding from this grant, e.g. those of SHELX and PHASER.

We are also involved in supporting Worldwide Protein Data Bank initiatives in quality control and validation of X-ray structures for deposition in the PDB, contributing code directly and collaborating with other software developers (e.g. PHENIX) and the wider crystallographic community to help set policy and implement required changes, e.g. the move to mmCIF deposition as standard in summer 2019.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

 
Description CCP4 is an ongoing BBSRC-supported activity that develops, maintains and distributes software for macromolecular crystallography. It is used globally by academic and industrial groups to provide three-dimensional structural information on biological structures, aiding the understanding of biological processes and revealing molecular interactions at an atomic level, which is utilised in drug design. The availability of such software has been essential to a number of post-2000 UK biotech/SME startups, e.g. Heptares and Astex, as well as small CROs such as CANGENIX (since acquired by Galapagos and now Charles River) and Peak Proteins (recently spun out of the AZ site at Alderley Park). CCP4 tools are critical to the automated pipelines used at Diamond Light Source for automatic data processing and structure solution. Diamond Light Source has its own industrial income, part of which is derived from contracted macromolecular crystallography work supported by CCP4 code. The CCP4 model itself has been used as an example of how to leverage BBSRC income with commercial licensing to develop open source software in the life science sector. CCP4 code has underpinned developments in cryo-EM software, with a number of researchers involved spanning both CCPs.
Sector Aerospace, Defence and Marine,Agriculture, Food and Drink,Education,Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic

 
Description CCP4 Advanced integrated approaches to macromolecular structure determination
Amount £60,000 (GBP)
Funding ID BB/S006974/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 03/2019 
End 03/2024
 
Title CCP4 I2 interface 
Description This is a new software interface to the CCP4 package that allows a non expert user to solve crystal structures. It has bespoke workflows to guide the user through each step and suggests following steps to be performed. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Significant user uptake in the first 9 months. A number of dot releases have been rolled out on all commonly used platforms: Linux, macOS and Windows.
URL http://www.ccp4.ac.uk
 
Title Production of CCP4 7.0 
Description CCP4 Crystallographic Computing Suite Version 7.0 This includes a new GUI and backend file handling system 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Hundreds of new protein Structures and parsing of metadata for seamless deposition in PDB archive 
URL http://www.ccp4.ac.uk
 
Title Updated version of CCP4 distribution Version 6.4
Description Crystallographic Software Suite for Macromolecular Structure determination 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Novel Structures 
URL http://www.ccp4.ac.uk
 
Description CCP4 Study Weekend 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Software development and crystallographic approaches to solving structures
Year(s) Of Engagement Activity 2015,2016
 
Description CCP4 Study Weekend 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact An annual study workshop composed of lectures and interactive demonstrations held in Nottingham over 3 days
Year(s) Of Engagement Activity 2017
URL http://ccp.ac.uk/
 
Description Participation in 2018 Study weekend 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact CCP4 Study weekend - A conference and workshop for postgraduate researchers to discuss current methods and showcase new CCP4 related products
Year(s) Of Engagement Activity 2018
 
Description wwPDB workshop on Validation (with a focus on ligands) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A meeting of developers, curators and users (particular focus on industrial users) to discuss changes in data formats and QC measures for ligands
Year(s) Of Engagement Activity 2016