CCP4 Grant Renewal 2014-2019: Question-driven crystallographic data collection and advanced structure solution

Lead Research Organisation: Science and Technology Facilities Council
Department Name: Scientific Computing Department

Technical Summary

This proposal incorporates five related work packages.

In WP1 we will track synchrotron-collected data through computational structure determination, to find whether the most useful data can be recognised a priori using established or novel metrics of data quality and consistency. We will then enable data collection software to communicate with pipelines and graphics programs to assess when sufficient data have been collected for a given scientific question, and so to prioritise further beamtime usage. We will also communicate extra information about diffraction data to structure determination programs, and so support the statistical models and algorithms being developed in WP4.

WP2 will improve the key MR step of model preparation, especially from diverged, NMR, or ab initio models. One development will be to extend the size limit of ab initio search model generation by exploiting sequence covariance algorithms.

In WP3 we will use our description of electron density maps as a field of control points to better use electron density or atomic models positioned by MR. Restrained manipulation of these points provides a low-order parameterisation of refinement decoupled from atomic models, and therefore suitable for highly diverged atomic models or EM-derived maps. We will extend this approach to characterise local protein mobility without the requirement of TLS for predefinition of rigid groups.

In WP4 we will statistically model non-idealities in experimental data, including non-isomorphism, spot overlap, and radiation damage. The resulting models, implemented in REFMAC, will be applied to refinement using data that are annotated by WP1 tools and tracked by WP0.

WP0 will provide the tools to integrate the other WPs. For this, it will create a cloud environment where storage- and compute-resources can be utilised optimally, and where rich information can be passed among beamlines, pipelines, and graphics programs.

Planned Impact

The importance of macromolecular crystallography in general, and of CCP4 in particular, is described in the Pathways to Impacts section.

Macromolecular crystallography is a mature field, in which both experimental and computational techniques are expected to serve simply as instruments for obtaining atomic-scale images of biological macromolecules. A high level of instrumentation has been reached at the data-production end, where modern X-ray installations, such as synchrotrons, produce high volumes of data almost automatically and often allow researchers to conduct experiments remotely from their home labs. On the computational side, a high level of automation is achieved by combining many individual steps into sophisticated pipelines, which choose the most appropriate structure solution pathway depending on the type of experiment, the data quality and the properties of the structure. As a result, over the last decade crystallographic software has grown considerably in size and complexity, which places an increasing maintenance burden on local support teams and researchers. In addition, advanced automation requires significant computational resources, which are not always available in small to medium-sized labs. A similar situation is observed in data management, where logistics are strained by the high volumes of data produced both by X-ray facilities and in the course of structure solution.

Successful accomplishment of WP0 will have the following impacts:

1) A cloud-based resource for macromolecular computations will seamlessly bridge data production and structure solution, and will also provide users with crystallographic software and computational resources as a service. This is expected to simplify data logistics significantly and to reduce the software, data and hardware maintenance burden on small to medium-sized research groups.

2) A particular impact will be seen in the ease of using automatic structure solution software, which is where most of the computational resources are required. Cloud computing will make automatic structure solution possible from virtually any ultra-portable device. For most users today, crystallographic computations at home start from merged data sets processed at the synchrotron beamline. Within five years, most users will have the option of starting manual work from nearly complete structures obtained automatically in the cloud, without having to move significant volumes of data between synchrotrons and home setups.

3) An additional impact will be the possibility of archiving experimental data and structure solution results, together with a complete record of the structure solution process, in the cloud. This will make it possible to re-examine experimental results retrospectively, which is not always possible today.

4) A significant impact will be a highly uniform computational setup and data accessibility for cloud users with multiple workplaces, for example the home lab, a synchrotron, a visit to a colleague, a conference or a personal residence. Switching between work sites will never require data transfer or the exporting and importing of CCP4 cloud projects, with the compatibility problems these can bring.

5) HTML5-based development is likely to have a very long lifetime and good backward compatibility, owing to the central role of the HTML format in the World Wide Web. This will substantially reduce the maintenance burden and the resources needed for future developments.

Publications

Agirre J (2023) The CCP4 suite: integrative software for macromolecular crystallography. in Acta crystallographica. Section D, Structural biology

Kovalevskiy O (2021) Helping researchers to solve their structures: automation and user guidance in CCP4 Cloud. in Acta crystallographica. Section A, Foundations and advances

Krissinel E (2018) Distributed computing for macromolecular crystallography. in Acta crystallographica. Section D, Structural biology

Krissinel E (2022) CCP4 Cloud for structure determination and project management in macromolecular crystallography. in Acta crystallographica. Section D, Structural biology

Krissinel E (2017) Desktop and Web-based Gesamt Software for Fast and Accurate Structural Queries in the PDB. in Journal of Computer Science Applications and Information Technology

 
Description Our central aim is to develop a cloud-computing infrastructure through which CCP4 software and computational resources can be made available to the community. We pursue two strategies for this. The first provides a "Desktop" CCP4 installation via virtual machines within the DAaS (Data Analysis as Service) framework supported by SCD/STFC. The second, "CCP4 Cloud", provides a client-server model for a user's interaction with crystallographic calculations. CCP4 Cloud is also deployable in DAaS and may fo
Exploitation Route CCP4 Cloud is released to the community as a web service at https://cloud.ccp4.ac.uk (main server, located in Harwell, RAL), and is integrated with CCP4 Software Release 7.1. CCP4 Cloud can also be set up at interested research organisations where using a public web server is not convenient because of data security or other considerations. To date, CCP4 Cloud has been set up at the Crick Institute, London; University of Newcastle; University of Exeter; EMBL Outstation in Hamburg, Germany; and Petra-
Sectors Digital/Communication/Information Technologies (including Software),Education,Pharmaceuticals and Medical Biotechnology

URL https://cloud.ccp4.ac.uk
 
Description CCP4 Cloud is deployed by Newcastle University, the University of Exeter, the Francis Crick Institute, the EMBL Outstation in Hamburg, Germany, and a pharmaceutical company (Incyte Inc., USA) in-house. They found CCP4 Cloud to be exactly right for their needs and approached us. A few more similar requests are in progress.
First Year Of Impact 2021
Sector Digital/Communication/Information Technologies (including Software),Education,Pharmaceuticals and Medical Biotechnology,Other
Impact Types Economic

 
Title CCP4 Cloud 
Description CCP4 Cloud is a web-based system for distributed crystallographic computations. It has a multi-server architecture with distinct front-end, number-crunching and client servers. CCP4 Cloud provides an abstract framework for managing users' data and running computational tasks on them. The framework is data-driven: a user navigates step by step from experimental data, through a number of intermediate processes that generate derived data, to a final solution (such as a protein structure). The framework supports user accounts and projects within them. Each project is developed graphically in any common browser, in a highly integrated and interactive manner. The data workflow is maintained automatically, which makes it easy to inspect the structure solution process retrospectively. Thanks to the multi-server architecture, the system scales easily and allows for crowd-sourcing, where computational resources may be acquired from any donating location. The system includes all the main CCP4 automatic structure solvers as well as individual tasks (e.g. structure refinement) and provides both MR and EP routes for structure solution. Work on extending the functionality and the reporting of results is underway.
Type Of Material Data analysis technique 
Year Produced 2016 
Provided To Others? Yes  
Impact The system was released in 2018 to a wider circle of testers and installed at a few research sites (STFC, University of Oxford, Francis Crick Institute London, LMB/MRC in Cambridge) for expert assessment before the official public release.
URL https://cloud.ccp4.ac.uk
 
Title CCP4 web-service CRANK-2 
Description The service is a highly automated structure solution pipeline for experimental phasing using maximum likelihood methods. It takes an X-ray reflection file and a protein sequence and produces a completed, or partially completed, protein structure. The development is aimed at crystallographers whose samples show anomalous X-ray diffraction. It is available through a web interface and runs in any common browser, such as Internet Explorer, Firefox, Safari or Google Chrome.
Type Of Material Data analysis technique 
Year Produced 2015 
Provided To Others? Yes  
Impact The pipeline automates structure solution with experimental phasing, consolidating multiple operations, previously run manually, into one. A single run of CRANK-2 may be equivalent to thousands of individual operations that a researcher would otherwise need to perform manually, with appropriate bookkeeping of data flows. Both software maintainers (CCP4) and users also benefit from having the software and computational hardware accessible in a central location through the Internet, which eliminates the need for local installation and maintenance.
URL http://www.ccp4.ac.uk/ccp4online
 
Title CCP4 web-service SHELX 
Description The development is an automated SHELXC/D/E structure solution pipeline for fast, routine experimental phasing. It accepts data in XDS, Scalepack, SHELX hkl or mtz formats and outputs phases and a poly-Ala trace. The pipeline is based on SHELX software for experimental phasing and requires only minimal input in the form of a standard X-ray reflection file. It produces phased structure factors, which can be used for subsequent structure solution. If protein sequence information is also supplied, the pipeline will attempt to complete the structure solution using the Refmac and Buccaneer software from the CCP4 software suite. The pipeline is available through a web interface and runs in any common browser, such as Internet Explorer, Firefox, Safari or Google Chrome. The development is aimed at crystallographers solving protein structures by X-ray diffraction from macromolecular crystals.
Type Of Material Data analysis technique 
Year Produced 2015 
Provided To Others? Yes  
Impact The web service simplifies structure solution by consolidating several tasks and by being permanently available through the Internet. All necessary software and computational infrastructure is maintained at a central location, which decreases the associated costs for both software maintainers (CCP4) and users.
URL http://www.ccp4.ac.uk/ccp4online
 
Title CCP4-DAaS Development 
Description CCP4 Software has been installed on a system of Cloud Virtual Machines (CVMs) supported by STFC/SCD. This setup combines two major developments. Firstly, CVMs were complemented with persistent storage so that users can keep their crystallographic projects and data between login sessions, and a convenient mechanism (via a shared folder) was provided for users to upload their experimental data into their Cloud projects. Secondly, CCP4 software, in particular the multi-component pipelines (such as automatic structure solvers) and the new CCP4 GUI-2, was modified to take advantage of parallelisation on SCD's SCARF computational facility. The whole setup allows users with UK Federal IDs to host their crystallographic computations in the Cloud and access them from any suitable location worldwide via an ordinary broadband connection.
Type Of Material Data analysis technique 
Year Produced 2016 
Provided To Others? Yes  
Impact The system was released to a few research groups and crystallographic facility managers (The University of York, Diamond Light Source, University of Newcastle, Birkbeck College) for initial testing and assessment. The testing revealed the need for performance improvements, and the corresponding work is in progress. The major impact for research groups is easy access to computational resources (the SCARF facility) for running CCP4 automatic structure solvers.
URL https://daas.scd.stfc.ac.uk
 
Description CCP4-DAaS Collaboration on Computational Cloud Developments
Organisation Rutherford Appleton Laboratory
Department Scientific Computing Department
Country United Kingdom 
Sector Public 
PI Contribution The STFC/SCD (Scientific Computing Department) Cloud Team receives CCP4 expertise on crystallographic computations, together with the Software Suite suitable for setup on SCD Cloud virtual machines. Specific software modifications and additions are being made so that CCP4 Software can run in a virtual machine setup, exchange data with users' desktop computers and conduct long-running tasks on a dedicated computational facility (SCARF). In the near future, data exchange with the experimental facility (Diamond Light Source beamlines) will be developed to give CCP4 and DLS users seamless transfer of data from the synchrotron to their Cloud-based projects for further processing and structure solution.
Collaborator Contribution The CCP4 Cloud project acquires essential expertise in software setup for Cloud-based computations, as well as the basic Cloud infrastructure provided and maintained by the Department. In addition, the SCD provides computational facilities (the SCARF cluster) for running computationally expensive tasks from CCP4 users. The existing setup is being designed and modified by the SCD Cloud Team to meet the specifications of CCP4 Software and the corresponding deployment requirements.
Impact 1) Set of technical specifications for computational Cloud infrastructure 2) Pilot project on the deployment of CCP4 Software in the Cloud 3) Computational and Cloud setup suitable for running CCP4 Software 4) Access to computational facilities for prospective CCP4 Cloud users
Start Year 2020
 
Title CCP4go 
Description CCP4go combines a number of automated pipelines in the CCP4 Software Suite, aiming to provide the simplest, one-button solution for end users. CCP4go chooses the solution protocol(s) based on the data supplied by the user rather than on the user's choice. A number of alternative protocols may be chosen depending on data properties, which are also identified automatically. Available protocols include data merging and scaling, automated Molecular Replacement and Experimental Phasing, model building, refinement, and the preparation of ligand structures and fitting them into electron density. If the input data present no complications, CCP4go can deliver a complete solution without user intervention. CCP4go is included in the CCP4 Cloud Platform, jsCoFE.
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact CCP4go dramatically simplifies operations with CCP4 automated pipelines and provides end users with the simplest possible way of solving structures. Without CCP4go, structure solution involved several stages, even in its most automated form, and a user was expected to make an informed decision about the structure solution pathway. CCP4go works entirely without user intervention, which substantially improves the user experience and makes the simplest and most efficient use of CCP4 automated solvers.
URL http://ccp4serv6.rc-harwell.ac.uk/jscofe/
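
The data-driven choice of protocols described for CCP4go can be illustrated with a minimal sketch. It is not taken from the CCP4go code: the `InputData` flags, the `choose_protocols` function and the selection rules are all hypothetical simplifications, and only the protocol names follow the list given in the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputData:
    """Properties of the supplied data (hypothetical, simplified flags)."""
    unmerged: bool = False              # reflections still need merging and scaling
    has_search_model: bool = False      # a homologous model is available for MR
    has_anomalous_signal: bool = False  # anomalous signal usable for EP
    has_sequence: bool = False          # protein sequence was supplied
    has_ligand: bool = False            # a ligand description was supplied


def choose_protocols(data: InputData) -> List[str]:
    """Return an ordered list of protocol names chosen from data properties.

    The rules below are assumptions made for illustration only.
    """
    steps: List[str] = []
    if data.unmerged:
        steps.append("merge_and_scale")
    # Assumed rule: prefer Molecular Replacement when a search model exists,
    # otherwise fall back to Experimental Phasing if anomalous signal is present.
    if data.has_search_model:
        steps.append("molecular_replacement")
    elif data.has_anomalous_signal:
        steps.append("experimental_phasing")
    else:
        return steps  # no phasing route available, stop after data reduction
    if data.has_sequence:
        steps.append("model_building")
    steps.append("refinement")
    if data.has_ligand:
        steps.append("ligand_fitting")
    return steps


if __name__ == "__main__":
    example = InputData(unmerged=True, has_anomalous_signal=True, has_sequence=True)
    print(choose_protocols(example))
```

The point of the sketch is only that the user supplies data, not a pathway; the order and conditions of the real pipeline may differ.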
 
Title Web-based framework for distributed computations 
Description A software framework was developed for performing multiple-stage distributed computations on an expandable hardware basis. The framework consists of three types of HTTP servers: front-end, number-cruncher and client-side, communicating exclusively via the HTTP or HTTPS protocols. The front-end server provides user accounting and, for each user, support for computational projects. Each project is a branched tree of jobs, generally following the flow of data in the project. The jobs are dispatched by the front-end to dedicated number-cruncher servers, which have their own system of job maintenance and storage clean-up. Finally, local clients provide the user interface and communication between the user and the front-end. From the user's point of view, the framework looks like a web application, accessible via common browsers, with rich graphical input and output. Using the in-browser GUI, a user can import data into their projects, create new tasks and arrange them in a branching tree, which makes it easy to compare alternative computational routes. Since all projects and imported data are available to the user from any geographic location via the Internet, the framework acts as a type of Cloud setup. From the developer's point of view, the framework is an abstract system of data and task types, complemented with operational links between them. New data and task types can be introduced without changes to the framework, simply by scripting in JavaScript and Python. Neither the user nor the developer needs to know the actual configuration of the framework-based computational setup, which makes it extremely versatile and scalable. The framework may be used for hardware crowd-sourcing, making it possible to utilise idle CPU resources in virtually any location, from a particular lab to computational centres.
The framework is being prototyped for specific application in macromolecular crystallography, yet it is fully content-agnostic and may be used in any field where serial computations are required. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Impact Hardware crowd-sourcing; Versatile cloud setups for distributed scientific computations; Uniform access to user's data and computational projects via Internet on virtually any type of client device (PC, tablet, smartphone); Consolidation of software services. 
URL http://ccp4serv6.rc-harwell.ac.uk/jscofe/
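
The idea of a project as a branched tree of jobs, as described for the framework above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual data model: the `Job` class, its fields and its methods are hypothetical names, and the rule that a job sees the outputs of all its ancestors is an assumed simplification of "following the flow of data in the project".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Job:
    """One node in a project's job tree (hypothetical structure)."""
    job_id: int
    task_type: str
    outputs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Job"] = None
    children: List["Job"] = field(default_factory=list)

    def add_child(self, job_id: int, task_type: str) -> "Job":
        """Attach a new job below this one, creating a branch if others exist."""
        child = Job(job_id, task_type, parent=self)
        self.children.append(child)
        return child

    def _chain(self) -> List["Job"]:
        """Jobs from the project root down to this job."""
        chain: List["Job"] = []
        node: Optional["Job"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        chain.reverse()
        return chain

    def available_data(self) -> Dict[str, str]:
        """Data visible to this job: outputs of its ancestors and itself.

        Outputs of jobs nearer to this one override those produced earlier.
        """
        data: Dict[str, str] = {}
        for node in self._chain():
            data.update(node.outputs)
        return data

    def route(self) -> List[str]:
        """Task types along the branch leading to this job."""
        return [node.task_type for node in self._chain()]


if __name__ == "__main__":
    root = Job(1, "import", outputs={"reflections": "data.mtz"})
    mr = root.add_child(2, "molecular_replacement")
    mr.outputs["model"] = "mr_model.pdb"
    refine_a = mr.add_child(3, "refinement")   # first branch
    refine_a.outputs["model"] = "refined_a.pdb"
    refine_b = mr.add_child(4, "refinement")   # alternative branch
    print(refine_a.route(), refine_a.available_data())
    print(refine_b.route(), refine_b.available_data())
```

Because each branch only inherits data from its own ancestors, two alternative routes from the same parent job can be compared side by side without interfering with each other.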
 
Description CCP4 Cloud Setup and Presentation at Francis Crick Institute, London 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact CCP4 Cloud was installed locally at the Francis Crick Institute, London, to be available as an Institute facility to all internal crystallography groups. A follow-up lecture on CCP4 Cloud was given to a local audience.
Year(s) Of Engagement Activity 2019
 
Description CCP4 Cloud for Distributed Crystallographic Computations 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact An invited talk, followed by a hands-on seminar, was given at the Open SESAME & Instruct-ERIC Workshop "Remote X-ray Data Collection from European Synchrotrons", hosted by the Weizmann Institute of Science, Rehovot, Israel, May 14-18, 2018. The event was attended by an estimated 40 postgraduate students and 20 senior staff engaged in MX experimentation across various European sites. The purpose of the event was to teach and practise remote data collection at the DLS and ESRF facilities, immediately followed by remote data processing and structure solution with CCP4 Cloud. The event clearly showed the attractiveness of the remote concept for MX computations, as well as the robustness of CCP4 Cloud as a practical implementation of the concept.
Year(s) Of Engagement Activity 2018
URL https://www.structuralbiology.eu/news/open-sesame--instruct-eric-workshop-on-remote-x-ray-data-colle...
 
Description CCP4 Cloud workshop at AsCA meeting 2022 in South Korea 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Around 25 researchers attended the workshop to learn how to solve structures with CCP4 Cloud
Year(s) Of Engagement Activity 2022
URL https://asca2022.org/program/?act=sub2_1
 
Description CCP4 Workshop at AsCA Conference, Auckland, New Zealand, December 2-5, 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A lecture and a live demo presentation of CCP4 Cloud were delivered to an estimated 30 workshop participants. The demo included solving a protein structure in New Zealand using the CCP4 Cloud setup in the UK, and showed the high fluency and robustness of the system. Participants asked a number of questions, related to both crystallography and the cloud, and requests were made for a local setup of CCP4 Cloud at Monash University in Australia.
Year(s) Of Engagement Activity 2018
URL http://asca2018.org/workshops/
 
Description CCP4 Workshop, based on CCP4 Cloud, at AsCA Conference, Singapore, 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A set of lectures and live demo presentations of CCP4 Cloud were delivered to an estimated 55 workshop participants and local organisers during a 2-day event. The demo included solving protein structures in Singapore using the CCP4 Cloud setup at Harwell Campus in the UK, and showed the high fluency and robustness of the system. Participants asked a number of questions related to both crystallography and the cloud.
Year(s) Of Engagement Activity 2019
 
Description CCP4/BGU Crystallography Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A workshop to educate people working with CCP4 Software, including the new Cloud Platform jsCoFE
Year(s) Of Engagement Activity 2018
URL https://lifeserv.bgu.ac.il/wp/ccp4workshop/
 
Description CeBEM/CCP4 Macromolecular Crystallography School "Structural Biology to enhance high impact research in health and disease" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentations, tutorials and practical sessions including the new CCP4 Cloud Platform jsCoFE
Year(s) Of Engagement Activity 2017
URL http://pasteur.uy/en/last-news/mx2017
 
Description European Crystallography Meeting ECM-32 in Vienna 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk on CCP4 Cloud
Year(s) Of Engagement Activity 2019
URL https://ecm2019.org/fileadmin/user_upload/k_ecm2019/Programm/ecm32_core_single_12.08.2019.pdf
 
Description Installation of CCP4 Cloud and presentation in EMBL Outstation Hamburg and PETRA-III Facility, Hamburg 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact CCP4 Cloud was installed locally at the EMBL Outstation and PETRA-III Facility, Hamburg, Germany, to be available to all crystallography groups on the DESY site. A follow-up lecture on CCP4 Cloud was given to a local audience.
Year(s) Of Engagement Activity 2019
 
Description Invited talk "CCP4 Cloud as a system for crystallography teaching" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk at ECM conference session with follow-up discussion and requests for sharing information afterwards
Year(s) Of Engagement Activity 2022
 
Description Madrid Crystallography School 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A lecture on CCP4 Cloud and hands-on tutorials were delivered at the Madrid Crystallography School, hosted by the CBE (Department of Crystallography and Structural Biology) of the Institute of Physical-Chemistry "Rocasolano", CSIC (Spanish National Research Council), in Madrid, May 2018. CCP4 Cloud was used to teach various techniques of macromolecular structure solution to an estimated 30 postgraduate students from various European sites. The Cloud model was found extremely useful for educational events of this type, since no preliminary software setup and testing is required and, in addition, all participants were able to continue working with their data after the event in the Cloud accounts that they received during the workshop. The workshop also showed the high robustness of CCP4 Cloud, developed and set up at CCP4 headquarters at STFC.
Year(s) Of Engagement Activity 2016,2018
URL http://www.xtal.iqfr.csic.es/MCS2018/
 
Description Open SESAME & Instruct-ERIC Workshop on Remote X-ray Data Collection from European Synchrotrons at the Weizmann Institute of Science 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A lecture on CCP4 Cloud with follow-up practical tutorial and seminar were given to about 30 international participants.
Year(s) Of Engagement Activity 2018
URL http://www.weizmann.ac.il/conferences/RXDC2018/remote-x-ray-data-collec-european-synchrotrons-weizma...
 
Description Presenting CCP4 Cloud at SOLEIL Synchrotron, CNRS - Cea Paris-Saclay, France 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact 25 students from the EU and UK attended a structure solution workshop based on CCP4 Cloud, with follow-up collaborations on specific projects
Year(s) Of Engagement Activity 2022
URL https://sway.office.com/exFoStFcVeNilCt4
 
Description SBGrid webinar "CCP4 Cloud advanced" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Webinar on advanced features of CCP4 Cloud, staged solution of macromolecular structures, model building and refinement
Year(s) Of Engagement Activity 2022
URL https://www.youtube.com/watch?v=eGlrbLtPlss
 
Description SBGrid webinar "CCP4Cloud in nutshell" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Webinar on the principles of macromolecular structure determination in CCP4 Cloud, with a practical demonstration
Year(s) Of Engagement Activity 2022
URL https://www.youtube.com/watch?v=qH0pu3g1Ak4
 
Description Talk at CCP4 Study Weekend "CCP4 Web-services and Cloud Computing Developments" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A talk given at CCP4 Study Weekend 2017 "From Crystal to Structure"
Year(s) Of Engagement Activity 2017
URL http://www.ccp4.ac.uk/ccp4course.php
 
Description Introduction of CCP4 Cloud to CCP4 Working Group 2
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact The new CCP4 Cloud Platform jsCoFE was presented to CCP4 Working Group 2 that includes leading PIs in UK's protein crystallography
Year(s) Of Engagement Activity 2018