CCP4 Advanced integrated approaches to macromolecular structure determination

Lead Research Organisation: Diamond Light Source
Department Name: Science Division


Proteins, DNA and RNA are the active machines of the cells which make up living organisms, and are collectively known as macromolecules. They carry out all of the functions that sustain life, from metabolism through replication to the exchange of information between a cell and its environment. They are coded for by a 'blueprint' in the form of the DNA sequence in the genome, which describes how to make them as linear strings of building blocks. In order to function, however, most macromolecules fold into a precise 3D structure, which in turn depends primarily on the sequence of building blocks from which they are made. Knowledge of the molecule's 3D structure allows us both to understand its function, and to design chemicals to interfere with it.
Due to advances in molecular biology, a number of projects, including the Human Genome Project, have led to the determination of the complete DNA sequences of many organisms, from which we can now read the linear blueprints for many macromolecules. As yet, however, the 3D structure cannot be predicted from knowledge of the sequence alone. One way to "see" macromolecules, and so to determine their 3D structure, involves initially crystallising the molecule under investigation, and subsequently imaging it with suitable radiation.
Macromolecules are too small to see with normal light, and so a different approach is required. With an optical microscope we cannot see objects which are smaller than the wavelength of light, roughly one millionth of a metre; atoms are about 1000 times smaller than this. X-rays, however, have a wavelength about the same as the size of an atom. For this reason, in order to resolve the atomic detail of macromolecular structure, we image them with X-rays rather than with visible light.
The process of imaging the structures of macromolecules that have been crystallised is known as X-ray crystallography. X-ray crystallography is like using a microscope to magnify objects that are too small to be seen with visible light. Unfortunately, X-ray crystallography is complicated because, unlike a microscope, there is no lens system for X-rays, and so additional information and complex computation are required to reconstruct the final image. This information may come from known protein structures using the Molecular Replacement (MR) method, or from other sources including Electron Microscopy (EM).
Once the structure is known, it is easier to pinpoint how macromolecules contribute to the living cellular machinery. Pharmaceutical research uses this as the basis for designing drugs to turn the molecules on or off when required. Drugs are designed to interact with the target molecule to either block or promote the chemical processes which they perform within the body. Other applications include protein engineering and carbohydrate engineering.
The aim of this project is to improve the key computational tools needed to extract a 3D structure from X-ray and electron diffraction experiments. It will provide continuing support to a Collaborative Computing Project (CCP4, first established in 1979), which has become one of the leading sources of software for this task. The project will support efficient and effective use of the synchrotrons that generate the X-rays used in most crystallographic experiments, and will also extend to the use of electron microscopes, which have gained much recent publicity following the award of a Nobel Prize to researchers in this field. It will provide more powerful tools to allow users to exploit information from known protein structures when the match to the unknown structure is very poor. Finally, it will allow structures to be solved even when only poor-quality and very small crystals are obtained.

Technical Summary

This proposal incorporates four related work packages.
In WP1 we will expand on our work using established and novel metrics of data quality and consistency to quantify the relationship between diffraction and map quality. The tools will be used to optimise approaches to structure determination from multiple or serial crystallography data to enable optimal selection of collected data and fully utilise all the information in structural refinement. WP1 will also develop and implement methods for electron diffraction data collection, integration and refinement.
WP2 will generalise the use of shift field refinement and extend it to hybrid refinement approaches. It will also develop new software libraries to enhance and speed up protein structure model building and refinement across a wide resolution range.
In WP3 we will develop and implement contact prediction methods for use in crystallography. These methods will help identify protein domain boundaries and define new search model approaches. The contact prediction approach will also be used to validate molecular replacement solutions and to assist in the interpretation of crystallographically derived protein:protein contacts.
In WP4 we will develop a model for electron scatter from macromolecular samples to enable software development and experimental design. These models will be used to develop and implement new scaling algorithms for electron diffraction data within DIALS.
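As an illustration of how predicted contacts might indicate a domain boundary (WP3), the following is a minimal sketch and not the project's method. It assumes a binary residue-residue contact map and scores each candidate boundary by the density of contacts that cross it; a low crossing density suggests two largely independent domains. The function names, the `min_len` parameter and the toy contact map are all hypothetical.

```python
def crossing_density(contacts, b):
    """Fraction of possible residue pairs (i < b <= j) that are in contact."""
    n = len(contacts)
    cross = sum(contacts[i][j] for i in range(b) for j in range(b, n))
    return cross / (b * (n - b))


def best_boundary(contacts, min_len=5):
    """Return the boundary position with the lowest crossing-contact density.

    min_len is a hypothetical minimum domain length, used to avoid trivial
    splits near the chain termini.
    """
    n = len(contacts)
    candidates = range(min_len, n - min_len + 1)
    return min(candidates, key=lambda b: crossing_density(contacts, b))


def toy_contact_map(sizes):
    """Idealised map: every residue pair within a domain is in contact and
    no contacts exist between domains (an assumption made for illustration)."""
    n = sum(sizes)
    contacts = [[0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                if i != j:
                    contacts[i][j] = 1
        start += size
    return contacts


if __name__ == "__main__":
    cmap = toy_contact_map([12, 8])
    print("predicted boundary before residue index", best_boundary(cmap))
```

Real predicted contact maps are noisy and probabilistic, so a practical implementation would need smoothing and a significance criterion rather than a simple minimum.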

Planned Impact

The impact of macromolecular crystallography and CCP4 on fundamental biomedical research, as well as on the pharmaceutical industry, is described in the Pathways to Impact section.
The popularity of macromolecular crystallography and cryoEM has resulted in these techniques being increasingly applied to the study of progressively more challenging macromolecular structures. These typically exhibit intrinsically position-dependent mobility, resulting in limited data and varying signal-to-noise ratio in different parts of the maps. Moreover, in crystallography, data are collected from multiple crystals and merged together; serial crystallography has become a standard tool for structural biologists. The popularity of using CC1/2 to select the parts of the data that are suitable for structure elucidation means that the signal-to-noise ratio can be very low, diminishing to 1 or even less. Also, the criteria currently used to assess quality, such as resolution and R-factors, are becoming increasingly confusing for practical structural biologists and journal referees alike. It is timely to re-evaluate quality indicators, ensuring that all data collected during the experiment are optimally utilised. A new Fourier optics based quality indicator will address this problem and will give an objective indication of the resolvability of peaks in the calculated maps. This measure will depend on directional and time dependent data quality, data completeness, and the current state of the statistical model (including the atomic model). Such an indicator will also address the problem of local resolution, and will be used for position and direction dependent map de-blurring, thus making maps more interpretable.
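For readers unfamiliar with CC1/2, the sketch below shows the usual idea: the observations of each reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of half-dataset means is computed. This is a simplified illustration on simulated data, not the implementation used in any particular program; the simulation model (exponentially distributed true intensities with Gaussian noise) and all function names are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cc_half(observations, rng):
    """CC1/2 from a mapping {reflection: [observed intensities]}."""
    half1, half2 = [], []
    for obs in observations.values():
        if len(obs) < 2:
            continue  # a single observation cannot be split into two halves
        shuffled = list(obs)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half1.append(sum(shuffled[:mid]) / mid)
        half2.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return pearson(half1, half2)


def simulate(n_refl, multiplicity, mean_intensity, sigma, rng):
    """Hypothetical data: true intensity per reflection plus Gaussian noise."""
    data = {}
    for hkl in range(n_refl):
        true_i = rng.expovariate(1.0 / mean_intensity) if mean_intensity > 0 else 0.0
        data[hkl] = [rng.gauss(true_i, sigma) for _ in range(multiplicity)]
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    strong = simulate(500, 4, 100.0, 1.0, rng)  # signal dominates noise
    weak = simulate(500, 4, 0.0, 1.0, rng)      # noise only
    print("CC1/2 (strong data):", cc_half(strong, rng))
    print("CC1/2 (noise only): ", cc_half(weak, rng))
```

In this toy model CC1/2 is close to 1 when the signal dominates and scatters around 0 for pure noise, which is why it can remain usable in resolution shells where the mean signal-to-noise ratio is about 1 or lower.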
Developed techniques will be implemented in the new data-scaling program developed by the DIALS group. This tool will calculate the limit of useful data as well as the maximum expected resolvability, providing structural biologists with a way to decide whether the experiment should be continued (i.e. more data are required). These techniques will also be implemented in the refinement program REFMAC5, allowing the difference between current and maximum resolvability to be analysed and utilised for decision making by practical crystallographers and automatic pipelines. Resolvability for each data set, with and without the refined model, will be calculated for the METRIX data. This will be included in the feature vector that is used by machine learning algorithms for map quality assessment.
This work package will also address the growing popularity of microED - electron diffraction from macromolecular microcrystals. One of the elements of WP1 will focus on the joint refinement of electron and X-ray diffraction data using the joint conditional probability distribution of two related data sets, with corresponding atomic models reflecting electrostatic potential and electron density, respectively. Under the first Born approximation, electrons are diffracted by the electrostatic potential and X-rays are scattered by the electron charge cloud. These two quantities are related by the Poisson equation. This relationship, as parameterised in Fourier space by the Mott-Bethe formula, will be used for joint refinement, allowing a reduction in the effective number of parameters. Using this formula means that one set of atomic scattering factors can be used for both electron and X-ray diffraction. We will explore the possibility of point charge refinement when high quality electron diffraction data are available, possibly together with X-ray diffraction data. To perform such refinement we will need to account for effects such as absorption and radiation damage; such effects can change the charge distribution dramatically. Consequently, unmerged data must be used for such refinement. This part of WP1 will be carried out in collaboration with WP4.
All developed software will be distributed by CCP4, making it accessible to the structural biology community worldwide.
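The Mott-Bethe relationship mentioned above can be illustrated with a short sketch. For a neutral atom of atomic number Z, with s = sin(theta)/lambda in inverse angstrom, it is commonly written as f_e(s) = (Z - f_x(s)) / (8 pi^2 a0 s^2), where a0 is the Bohr radius and f_x is the X-ray form factor. The example uses the hydrogen atom, whose ground-state X-ray form factor has a simple closed form, so no tabulated coefficients are needed. The function names are hypothetical, and this is an illustration of the formula only, not of the refinement procedure proposed here.

```python
import math

BOHR_RADIUS_A = 0.529177  # Bohr radius in angstrom
# Mott-Bethe prefactor, about 0.023934 per angstrom
MOTT_BETHE_CONST = 1.0 / (8.0 * math.pi ** 2 * BOHR_RADIUS_A)


def hydrogen_xray_form_factor(s):
    """Analytic X-ray form factor of ground-state hydrogen.

    s is sin(theta)/lambda in inverse angstrom.
    """
    return 1.0 / (1.0 + (2.0 * math.pi * BOHR_RADIUS_A * s) ** 2) ** 2


def mott_bethe(z, f_xray, s):
    """Electron scattering factor (angstrom) from an X-ray form factor."""
    if s <= 0.0:
        raise ValueError("s must be positive; the s -> 0 limit needs separate handling")
    return MOTT_BETHE_CONST * (z - f_xray) / s ** 2


if __name__ == "__main__":
    for s in (0.001, 0.1, 0.25, 0.5):
        f_x = hydrogen_xray_form_factor(s)
        print(f"s={s:5.3f}  f_x={f_x:.4f}  f_e={mott_bethe(1, f_x, s):.4f} A")
```

For hydrogen the electron scattering factor tends to the Bohr radius as s approaches zero. For a charged atom (Z minus the number of electrons not equal to zero) the same expression diverges at small s, which is one reason low-resolution electron diffraction data are sensitive to charge.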

