Mining of protein data bank and feedback to X-ray crystal structure solution and analysis
Lead Research Organisation: University of York
Department Name: Chemistry
Abstract
The aim is to improve the computer software used to determine the 3-D structures of the complex biological molecules that make up living organisms. These molecules are encoded by a 'blueprint', the DNA sequence in the genome, which describes how to make them. Thanks to enormous advances in molecular biology, a number of projects, including the Human Genome Project, have determined the complete DNA sequences of many organisms. We can therefore read the plans for many of the molecules of life. However, without knowledge of their 3-D structures, it is very hard to understand how they actually work. Understanding this is an important step towards fixing them when they go wrong, and therefore towards curing a range of illnesses, as well as towards using such molecules in a range of biotechnology applications.
The solution to this problem is structural biology, in which the molecules are manufactured on the basis of their blueprints and 3-D models are constructed for them, usually by X-ray crystallography. This allows pictures of biological molecules to be made in which individual atoms can be identified. Crystallography is analogous to using a microscope to magnify objects that are too small to be seen with the naked eye. However, even the optical microscope has limitations: we cannot see objects smaller than the wavelength of light, roughly one millionth of a metre. Atoms are about 1000 times smaller than this, so to see them we need to use X-rays, whose wavelength is about the same as the size of an atom. Unfortunately, structure solution by X-ray crystallography is complicated because, unlike the optical microscope, there is no lens system for X-rays; instead, we need to carry out a number of challenging experiments followed by complex computation and problem solving by experienced scientists.
This is especially complex for crystals of proteins, which are imperfect and contain about 50% water. The aim of this project is to improve some of the key computational tools of crystallography to make this task easier and more automatic, and to allow more difficult problems to be solved. The basic problem lies in the low observation-to-parameter ratio in protein crystallography (analogous to a low signal-to-noise ratio in image recognition). Advanced statistical methods are required to make the best use of the limited data available.
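The observation-to-parameter ratio can be estimated with simple arithmetic. Reflections fill a sphere of radius 1/d_min in reciprocal space, and each occupies a reciprocal-cell volume of 1/V, so roughly (4π/3)·V/d_min³ reflections are measurable for a unit cell of volume V; symmetry (including Friedel's law) reduces this to the unique set. An isotropic atomic model refines four parameters per atom (x, y, z, B). The sketch below assumes a P1 cell and uses hypothetical cell and atom counts, not values from any particular structure; the function names are illustrative only.

```python
import math


def unique_reflections(cell_volume: float, d_min: float, n_sym_ops: int = 1) -> float:
    """Approximate number of unique reflections to resolution d_min (in Å).

    The reflections inside a sphere of radius 1/d_min are counted by dividing the
    sphere volume by the reciprocal-cell volume 1/cell_volume. Dividing by
    2 * n_sym_ops roughly accounts for Friedel pairs and point-group symmetry
    (an approximation that ignores special reflection classes).
    """
    n_total = (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3
    return n_total / (2 * n_sym_ops)


def obs_to_param_ratio(cell_volume: float, d_min: float, n_atoms: int,
                       n_sym_ops: int = 1, params_per_atom: int = 4) -> float:
    """Unique observations per refined parameter for an isotropic (x, y, z, B) model."""
    n_params = params_per_atom * n_atoms
    return unique_reflections(cell_volume, d_min, n_sym_ops) / n_params


# Hypothetical example: a 100,000 Å^3 P1 cell holding 3,000 non-hydrogen protein
# atoms (about half of the cell volume, the rest being solvent).
for d in (1.5, 2.0, 3.0, 4.0):
    print(f"d_min = {d:.1f} Å: ratio = {obs_to_param_ratio(1.0e5, d, 3000):.2f}")
```

Under these assumptions the ratio falls from about 5 at 1.5 Å to below 1 at 3 Å, so there are fewer observations than parameters; because the count scales as 1/d_min³, halving the resolution limit multiplies it by eight. This is why prior structural knowledge becomes essential at low resolution.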
Technical Summary
The aim is to advance the application of modern statistical techniques to macromolecular crystallography, and hence to extend it to more challenging structural analyses of biological macromolecules. There are two main objectives: 1) statistical treatment of order-disorder (OD) structures and 2) design and use of prior structural knowledge. Both objectives will be achieved in three stages: 1) analyse, extract and classify the relevant information from structures in the PDB; 2) use this information to design probability distributions that best describe their behaviour; and 3) apply the resulting software to challenging crystal structure analyses.
The objectives will be achieved as follows:
1) The PDB will be analysed and entries with 'crystal' imperfections, such as OD structures with and without twinning, will be extracted. These structures will be classified, and likelihood functions will be designed to deal with such problems. The resulting software will be calibrated using all entries in the PDB. Software will be designed to correct the space group of crystals during refinement and to feed the results back into both the data-processing and refinement steps.
2) The PDB will be analysed and information about the internal degrees of freedom of macromolecules will be extracted. Prior probability distributions of interatomic distances will be designed. To take into account the correlated motion of atoms in molecules, differences between atomic displacement parameters will be analysed and appropriate probability distributions will be designed. These will allow structures to be refined at low resolution. Such restraints will allow a smooth transition from rigid-group to all-atom models, extracting the optimal information from the data while retaining the structural integrity of the derived model.
All developed algorithms will be implemented either in the programs REFMAC or MOLREP, or in new software. The software will be made available to the community via CCP4.
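To illustrate why twinned crystals call for a statistical treatment, the sketch below uses the textbook model of hemihedral (two-domain merohedral) twinning: each observed intensity is a mixture of two twin-related true intensities, weighted by the twin fraction alpha. This is not the likelihood function developed in this project; it only shows the forward model and its direct algebraic inversion ("detwinning"), which becomes ill-conditioned as alpha approaches 0.5. The function names and example values are hypothetical.

```python
def twin_intensities(i1: float, i2: float, alpha: float) -> tuple[float, float]:
    """Forward model: observed intensities of a twin-related pair of reflections.

    i1, i2 are the true intensities of reflection h and its twin mate h';
    alpha is the twin fraction (0 <= alpha < 0.5).
    """
    j1 = (1.0 - alpha) * i1 + alpha * i2
    j2 = alpha * i1 + (1.0 - alpha) * i2
    return j1, j2


def detwin(j1: float, j2: float, alpha: float) -> tuple[float, float]:
    """Algebraic inversion of twin_intensities; undefined for a perfect twin (alpha = 0.5)."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("twin fraction must satisfy 0 <= alpha < 0.5")
    denom = 1.0 - 2.0 * alpha  # shrinks to zero as alpha -> 0.5, amplifying errors
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    return i1, i2


# Hypothetical twin-related pair with true intensities 100 and 20.
j1, j2 = twin_intensities(100.0, 20.0, alpha=0.3)
print(detwin(j1, j2, alpha=0.3))  # should recover the true intensities up to rounding
```

With exact data the inversion is perfect, but an error of one unit in j1 propagates into the detwinned i1 multiplied by (1 - alpha)/(1 - 2*alpha), which is 1.75 at alpha = 0.3 and 5.5 at alpha = 0.45. Measurement noise therefore dominates near perfect twinning, which motivates modelling twinned data through likelihood functions rather than by direct inversion.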
Publications
Lebedev AA (2012) JLigand: a graphical tool for the CCP4 template-restraint library. In: Acta crystallographica. Section D, Biological crystallography
Levdikov VM (2009) Structural rearrangement accompanying ligand binding in the GAF domain of CodY from Bacillus subtilis. In: Journal of molecular biology
Murshudov GN (2011) REFMAC5 for the refinement of macromolecular crystal structures. In: Acta crystallographica. Section D, Biological crystallography
Ng CL (2009) Conformational flexibility and molecular interactions of an archaeal homologue of the Shwachman-Bodian-Diamond syndrome protein. In: BMC structural biology
Phan G (2011) Crystal structure of the FimD usher bound to its cognate FimC-FimH substrate. In: Nature
Watson AA (2011) Structural flexibility of the macrophage dengue virus receptor CLEC5A: implications for ligand binding and signaling. In: The Journal of biological chemistry
Description | New statistical tools were developed for the analysis of the noisy and limited data arising from X-ray crystallography |
Exploitation Route | The tools have been implemented in software including refmac5, JLigand and Zanuda, which are used widely by the structural biology community worldwide |
Sectors | Pharmaceuticals and Medical Biotechnology |
Title | ProSmart |
Description | Conformation-independent comparison of macromolecular structures. It can also generate restraints that transfer information from high-resolution macromolecular structures to low-resolution refinement, thus increasing the reliability of atomic models derived from limited and noisy data. |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | Has become an essential tool for low-resolution crystal structure refinement and for fitting models into cryo-EM maps |
URL | http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/ |
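The idea of transferring information from a high-resolution reference structure to a low-resolution refinement can be illustrated with interatomic distance restraints: each distance in the model being refined is pulled towards the corresponding distance in the reference. The sketch below compares a plain least-squares penalty (the negative log of a Gaussian prior) with a bounded robust penalty of the Geman-McClure type, which caps the influence of distances that genuinely differ between the two structures, for example after a local conformational change. The function names, the 0.1 Å sigma and the scale parameter are assumptions for illustration and do not reflect the actual interface or settings of ProSmart or REFMAC.

```python
def gaussian_penalty(d: float, d_ref: float, sigma: float) -> float:
    """Least-squares restraint: negative log of a Gaussian prior, up to a constant."""
    z = (d - d_ref) / sigma
    return z * z


def geman_mcclure_penalty(d: float, d_ref: float, sigma: float, scale: float = 1.0) -> float:
    """Bounded robust restraint: close to z^2 for small z, tends to scale^2 for large z."""
    z = (d - d_ref) / sigma
    return (z * z) / (1.0 + (z * z) / (scale * scale))


# Hypothetical model distances (Å) restrained to a reference distance of 3.8 Å;
# the last one mimics a genuine conformational difference between the structures.
d_ref, sigma = 3.8, 0.1
for d in (3.82, 3.75, 3.90, 6.50):
    print(f"{d:.2f} Å  gaussian={gaussian_penalty(d, d_ref, sigma):8.2f}  "
          f"robust={geman_mcclure_penalty(d, d_ref, sigma):6.3f}")
```

Both penalties agree for small deviations, but the least-squares term for the outlying distance is several hundred times larger than the others and would drag the model towards the reference conformation, whereas the robust term stays below scale^2 and lets the experimental data decide.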
Title | balbes |
Description | Automatic crystal structure solution by molecular replacement, using search models re-designed from existing atomic models in the PDB. |
Type Of Technology | Software |
Impact | More than 500 crystal structures have been analysed using this software |
URL | http://www.ccp4.ac.uk/ccp4online |
Title | refmac5 |
Description | Refinement of macromolecular structures using X-ray crystallographic data and cryo-EM reconstructions. New versions of this software are released every year. |
Type Of Technology | Software |
Year Produced | 2019 |
Impact | More than 60,000 PDB entries have been analysed using this software |
URL | http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/ |