CCP4 Grant Renewal 2014-2019: Question-driven crystallographic data collection and advanced structure solution

Lead Research Organisation: MRC Laboratory of Molecular Biology
Department Name: Structural Studies

Abstract

Proteins, DNA and RNA are the active machines of the cells which make up living organisms, and are collectively known as macromolecules. They carry out all of the functions that sustain life, from metabolism through replication to the exchange of information between a cell and its environment. They are coded for by a 'blueprint' in the form of the DNA sequence in the genome, which describes how to make them as linear strings of building blocks. In order to function, however, most macromolecules fold into a precise 3D structure, which in turn depends primarily on the sequence of building blocks from which they are made. Knowledge of the molecule's 3D structure allows us both to understand its function, and to design chemicals to interfere with it.

Due to advances in molecular biology, a number of projects, including the Human Genome Project, have led to the determination of the complete DNA sequences of many organisms, from which we can now read the linear blueprints for many macromolecules. As yet, however, the 3D structure cannot be predicted from knowledge of the sequence alone. One way to "see" macromolecules, and so to determine their 3D structure, involves initially crystallising the molecule under investigation, and subsequently imaging it with suitable radiation.

Macromolecules are too small to see with normal light, and so a different approach is required. With an optical microscope we cannot see objects which are smaller than the wavelength of light, roughly 1 millionth of a metre; atoms are about 1000 times smaller than this. However, X-rays have a wavelength about the same as the size of atoms. For this reason, in order to resolve the atomic detail of macromolecular structure, we image them with X-rays rather than with visible light. The process of imaging the structures of macromolecules that have been crystallised is known as X-ray crystallography. X-ray crystallography is like using a microscope to magnify objects that are too small to be seen with visible light. Unfortunately, X-ray crystallography is complicated because, unlike with a microscope, there is no lens system for X-rays, and so additional information and complex computation are required to reconstruct the final image. This information may come from known protein structures using the Molecular Replacement (MR) method, or from other sources including Electron Microscopy (EM).

Once the structure is known, it is easier to pinpoint how macromolecules contribute to the living cellular machinery. Pharmaceutical research uses this as the basis for designing drugs to turn the molecules on or off when required. Drugs are designed to interact with the target molecule to either block or promote the chemical processes which they perform within the body. Other applications include protein engineering and carbohydrate engineering.

The aim of this project is to improve the key computational tools needed to extract a 3D structure from X-ray crystallography experiments. It will provide continuing support to a Collaborative Computing Project (CCP4 first established in 1979), which has become one of the leading sources of software for this task. The project will help efficient and effective use to be made of the synchrotrons that make the X-rays that are used in most crystallographic experiments. It will provide more powerful tools to allow users to exploit information from known protein structures when the match to the unknown structure is very poor. It will also automate the use of information from electron microscopy, even when the crystal structure has been distorted by the process of growing the protein crystal. Finally, it will allow structures to be solved, even when poor quality and very small crystals are obtained.

Technical Summary

This proposal incorporates five related work packages.

In WP1 we will track synchrotron-collected data through computational structure determination, to find whether the most useful data can be recognised a priori using established or novel metrics of data quality and consistency. We will then enable data collection software to communicate with pipelines and graphics programs to assess when sufficient data have been collected for a given scientific question, and so to prioritise further beamtime usage. We will also communicate extra information about diffraction data to structure determination programs, and so support the statistical models and algorithms being developed in WP4.

WP2 will improve the key MR step of model preparation, especially from diverged, NMR, or ab initio models. One development will be to extend the size limit of ab initio search model generation by exploiting sequence covariance algorithms.

In WP3 we will use our description of electron density maps as a field of control points to better use electron density or atomic models positioned by MR. Restrained manipulation of these points provides a low-order parameterisation of refinement decoupled from atomic models, and therefore suitable for highly diverged atomic models or EM-derived maps. We will extend this approach to characterise local protein mobility without the requirement of TLS for predefinition of rigid groups.

In WP4 we will statistically model non-idealities in experimental data, including non-isomorphism, spot overlap, and radiation damage. The resulting models, implemented in REFMAC, will be applied to refinement using data that are annotated by WP1 tools and tracked by WP0.

WP0 will provide the tools to integrate the other WPs. For this, it will create a cloud environment where storage- and compute-resources can be utilised optimally, and where rich information can be passed among beamlines, pipelines, and graphics programs.

Planned Impact

The general importance of macromolecular crystallography, and of CCP4 in particular, is described in the Pathways to Impacts section.
Techniques for refinement of macromolecular crystal structures are now considered to be well established and are routinely used as part of the structure solution procedure. However, all existing and widely used refinement software use assumptions such as: models comprise a discrete set of atoms, crystals diffract to sufficiently high resolution, and data used for refinement are from a single and immortal crystal. Moreover, each dataset is considered as an independent entity with no relation to any other dataset. In real life applications, crystals containing large macromolecular complexes often diffract to low resolution, they undergo radiation-dependent changes, and datasets are collected from multiple crystals that may or may not be isomorphous.
Crystallography is often applied for drug binding studies, where there is generally at least one complete data set from a related and often isomorphous crystal. In such cases, a small fraction (10-15%) of the dataset for the new crystal might be sufficient to decide whether the ligand is bound to the protein, and to analyse differences between the structures in the two crystals.
Further to increasing reliability of derived atomic models, it is expected that newly developed tools for refinement will extend application to those cases that are currently difficult or impossible to analyse.
The following impacts are expected:
1) Drug binding studies: The first expected important impact will be in drug binding studies, where a complete dataset is available from at least one of the crystals, and data are collected to infer ligand binding and analyse any resulting conformational changes. For this type of study, it may be sufficient to analyse ligand binding using only a small fraction of the full dataset from the new crystal.
2) Increase of resolution: Modelling radiation-dependent changes, and allowing the cooperative use of multiple crystal datasets, will allow an increase in data resolution with high crystal exposure. In some cases, it will in future be possible to analyse structures that cannot presently be analysed due to severe radiation sensitivity.
3) Feedback to the data collection stage, and improved design of experiment: Radiation damage and multiple crystal parameters will be estimated and fed back to the data acquisition stage, thus improving decision making. For instance, the estimated radiation damage rate will be valuable for deciding how many images should be collected from a single crystal before moving to the next one. Moreover, fast reclustering of crystals will show how much more data need to be collected to answer the posed biological question.
4) Reaction mechanism studies: One far-fetched but potentially powerful application of WP4's results is the use of MX to study reaction mechanisms using fewer crystals. If a reaction occurs in a crystal, then it can also be considered a time-dependent event and can be modelled in exactly the same way as radiation-dependent changes.

All tools and software developed as a result of this work package will be distributed by CCP4.

Publications

Kovalevskiy O (2016) Automated refinement of macromolecular structures at low resolution using prior information. in Acta crystallographica. Section D, Structural biology

Kovalevskiy O (2018) Overview of refinement procedures within REFMAC5: utilizing data from different sources. in Acta crystallographica. Section D, Structural biology

Long F (2017) AceDRG: a stereochemical description generator for ligands. in Acta crystallographica. Section D, Structural biology

Long F (2017) Validation and extraction of molecular-geometry information from small-molecule databases. in Acta crystallographica. Section D, Structural biology

Nicholls RA (2017) Ligand fitting with CCP4. in Acta crystallographica. Section D, Structural biology

Nicholls RA (2018) Current approaches for the fitting and refinement of atomic models into cryo-EM maps using CCP-EM. in Acta crystallographica. Section D, Structural biology

Nicholls RA (2017) Low Resolution Refinement of Atomic Models Against Crystallographic Data. in Methods in molecular biology (Clifton, N.J.)

Potterton L (2018) CCP4i2: the new graphical user interface to the CCP4 program suite. in Acta crystallographica. Section D, Structural biology

 
Description The software tool LORESTR, for low-resolution crystal structure refinement using multiple crystal data, has been improved further.
sigHalves, for variance calculation for cryoEM maps, has been improved. Its results are now used by the refinement program REFMAC.
These and other software tools are being disseminated to the community via the CCPEM and CCP4 software suites. To facilitate easy use, GUI components have been added to CCPEM and CCP4.
Exploitation Route 1) The results will be used by the structural biology community
2) Further use of tools for macromolecular structure modelling using multiple data sources (cryoEM, X-ray/neutron/electron diffraction)
Sectors Pharmaceuticals and Medical Biotechnology

 
Description CCP4 Grant
Amount £423,880 (GBP)
Funding ID BB/L007010/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2014 
End 07/2019
 
Description Modulated Crystals 
Organisation University of Nebraska-Lincoln
Department Eppley Institute for Research in Cancer and Allied Diseases
Country United States 
Sector Academic/University 
PI Contribution Design of statistical basis for analyses of macromolecular structures using modulated crystals
Collaborator Contribution Experimental design, carrying out experiments and testing of designed software
Impact This collaboration is multidisciplinary and involves computational crystallography, small-molecule crystallography and structural biology
Start Year 2014
 
Description PDB_REDO 
Organisation Netherlands Cancer Institute (NKI)
Country Netherlands 
Sector Academic/University 
PI Contribution Retuning of our software and help in designing a webserver for large scale calculation for re-interpretation of the Protein Data Bank
Collaborator Contribution Re-analysis and re-interpretation of the macromolecular structures in the Protein Data Bank
Impact 10.1107/S0907444911054515 10.1107/S2052252514009324
Start Year 2011
 
Title ccp-em refmac interface 
Description A pipeline designed for refinement of very large cryoEM structures consisting of hundreds of chains. It writes out every chain plus its surroundings, prepares maps covering those regions and runs all REFMAC5 refinements in parallel, either on a cluster or locally on multiple CPUs. The pipeline is included in the CCPEM package and is in active development.
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact This software is just to be released. It is expected that it will make fitting of atomic models to cryo-EM maps easy and will have impact on model quality. 
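
The per-chain parallel scheme described for this pipeline can be illustrated with a minimal sketch. This is not the pipeline's actual code: `refine_chain` is a hypothetical stand-in that only returns a status record where a real job would write out the chain, prepare its map and launch a refinement, and `refine_all` and `max_workers` are names assumed for illustration. Threads are used on the assumption that each job mostly waits on an external program.

```python
from concurrent.futures import ThreadPoolExecutor


def refine_chain(chain_id):
    """Hypothetical stand-in for one per-chain refinement job.

    A real job would extract the chain with its surroundings, prepare a
    map covering that region and start an external refinement run.
    Here it only returns a status record.
    """
    return {"chain": chain_id, "status": "done"}


def refine_all(chain_ids, max_workers=4):
    """Run one job per chain concurrently and keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of chain_ids
        return list(pool.map(refine_chain, chain_ids))


results = refine_all(["A", "B", "C"])
```

Because every chain is an independent job, the same structure would map onto a cluster queue by replacing the thread pool with job submission.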
 
Title LORESTR 
Description New software for automatic refinement of low-resolution crystal structures has been designed. Using the sequence of the target protein, the software identifies the most similar structures in the PDB, then runs a number of protocols using jelly-body restraints as well as ProSmart-derived reference structure restraints, and reports the best refined model together with the corresponding protocol. The software is included in the new CCP4 release. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact According to reports, several crystal structures have been analysed using this software, although it is too early to judge its impact. 
 
Title refmac5 
Description Refinement of macromolecular structures using X-ray crystallographic data and cryo-EM reconstructions. This software is released every year. 
Type Of Technology Software 
Year Produced 2019 
Impact More than 60000 structures in the PDB have been analysed using this software 
URL http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/
 
Title sigHalves 
Description A tool for cryoEM that takes two half maps and calculates the Fourier shell correlation as well as signal and noise levels in resolution shells in reciprocal space. It is intended to work with REFMAC5 and to improve refinement against cryoEM data. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact This software is due to be released. It is too early to assess its impact. 
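
For reference, the Fourier shell correlation between two half maps has a standard textbook definition, given below. The symbols are assumptions introduced here, not taken from the record: F_1 and F_2 are the Fourier coefficients of the two half maps, h runs over the reciprocal-space points in resolution shell s, and sigma_S^2 and sigma_N^2 are the signal and noise variances in that shell. The second line holds only under the usual assumption that both half maps contain the same signal plus independent noise of equal variance; the exact estimators used by sigHalves are not stated in this record.

```latex
\mathrm{FSC}(s) =
  \frac{\sum_{\mathbf{h} \in s} \operatorname{Re}\!\left[ F_1(\mathbf{h})\, F_2^{*}(\mathbf{h}) \right]}
       {\sqrt{\sum_{\mathbf{h} \in s} \lvert F_1(\mathbf{h}) \rvert^{2}\;
              \sum_{\mathbf{h} \in s} \lvert F_2(\mathbf{h}) \rvert^{2}}}

\mathrm{FSC}(s) \approx \frac{\sigma_S^{2}(s)}{\sigma_S^{2}(s) + \sigma_N^{2}(s)}
```

Read this way, a shell with FSC close to 1 is dominated by signal and a shell with FSC close to 0 is dominated by noise.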
 
Description APS workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I am one of the organisers, speakers and tutors in this annual workshop. Every talk is followed by stimulating discussions.

After the workshop, many PhD students and young scientists solve the problems they are facing and gain skills to solve similar problems in future.
Year(s) Of Engagement Activity 2008,2009,2010,2011,2012,2013,2014,2015,2016,2017
URL http://www.ccp4.ac.uk/schools/APS-school/
 
Description BGU/CCP4 workshop on C 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is one of the international workshops we organise. The aim of these workshops is to teach PhD students and young scientists best practice in the field of macromolecular crystallography.
Year(s) Of Engagement Activity 2018,2020
 
Description CCP4 School in Chandigarh 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is one of the international workshops we organise. The aim of these workshops is to teach PhD students and young scientists best practice in the field of macromolecular crystallography.
Year(s) Of Engagement Activity 2018
 
Description CCP4 workshop in Shanghai 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact There are two main purposes of these workshops: 1) to teach young scientists the state-of-the-art techniques in structural biology; 2) to disseminate software tools developed within my group to a wider audience
Year(s) Of Engagement Activity 2019
URL http://www.ccp4.ac.uk/schools/China-2011/
 
Description CCP4/APS School in Argonne 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The aim of these workshops is to teach PhD students and young scientists the best practices in structural biology.
Year(s) Of Engagement Activity 2018
 
Description Crystallography Workshop: Japan 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I was an organiser, lecturer and tutor in this workshop. Workshops of this type are very popular; they stimulate discussion and help to spread knowledge and skills gained in my group as well as worldwide.

This workshop helps PhD students and postgraduate scientists to gain skills while solving the particular problems they are facing.
Year(s) Of Engagement Activity 2006,2008,2010,2011,2012,2013,2014,2015,2017
URL http://www.ccp4.ac.uk/schools/Japan-2014/index.php
 
Description Diamond workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is one of a series of one-week workshops that aim to teach PhD students and young researchers the latest methods in the field of macromolecular crystallography. There were around 30 participants, of whom 20 were students and 10 were lecturers/tutors.
Year(s) Of Engagement Activity 2015,2016
 
Description MAX4ESSFUN School in Aarhus 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The aim of these workshops is to teach PhD students and young scientists state-of-the-art structural biology tools.
Year(s) Of Engagement Activity 2018,2019
 
Description SEA-COAST workshop on MX 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The aim of these workshops is to teach PhD students and young scientists the best practices in structural biology.
Year(s) Of Engagement Activity 2019,2020
 
Description SPRING8/CCP4 workshop on MX 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The aim of these workshops is to teach PhD students and young scientists the best practices in structural biology.
Year(s) Of Engagement Activity 2018
 
Description South American Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact This workshop helps to spread skills and helps PhD students to gain knowledge from experts in the field. It has an impact across the whole Latin American region. For example, as a result of this workshop, one of the attendees organised a crystallographic group in Peru.

Talks and tutorials in this workshop help to solve the problems PhD students are facing.
Year(s) Of Engagement Activity 2013,2014,2015,2016,2017