Mining of protein data bank and feedback to X-ray crystal structure solution and analysis

Lead Research Organisation: University of York
Department Name: Chemistry

Abstract

The aim is to improve the computer software used in determining the 3-D structures of the complex biological molecules that make up living organisms. These molecules are coded for by a 'blueprint', in the form of a DNA sequence in the genome, which describes how to make them. Due to enormous advances in molecular biology, a number of projects, including the Human Genome Project, have led to the determination of the complete DNA sequences of a number of organisms. We can therefore read the plans for many of the molecules of life. However, without knowledge of their 3-D structure, it is very hard to understand how they actually work. Understanding this is an important step in fixing them when they go wrong, and therefore in curing a range of illnesses, as well as in using such molecules in a range of applications in biotechnology. The solution to this problem is through Structural Biology, in which the molecules are manufactured on the basis of their blueprints and 3-D models are constructed for them, usually using X-ray crystallography. This allows pictures to be made of biological molecules in which individual atoms can be identified. Crystallography is analogous to the use of a microscope to magnify objects that are smaller than can be seen with the naked eye. However, even the optical microscope has limitations, and we cannot see objects which are smaller than the wavelength of light, roughly 1 millionth of a metre. Atoms are about 1000 times smaller than this, and we need to use X-rays, which have a wavelength about the same as the size of the atoms, in order to see them. Unfortunately, structure solution by X-ray crystallography is complicated because, unlike the optical microscope, there is no lens system for X-rays, and we need to carry out a number of challenging experiments followed by complex computation and problem solving by experienced scientists.
This is especially complex when we look at crystals of proteins, which are actually imperfect and contain about 50% water. The aim of this project is to improve some of the key computational tools of crystallography to make this task easier and more automatic, and to allow the solution of more difficult problems. The basic problem lies in the low observation-to-parameter ratio in protein crystallography (equivalent to a low signal-to-noise ratio in image recognition). Advanced statistical methods are required to make the best use of the limited data available.
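To make the observation-to-parameter ratio concrete, the following is a rough sketch in Python, not part of the project's software. It assumes a primitive unit cell, complete data, four refined parameters per atom (x, y, z and an isotropic displacement parameter), and that the number of unique reflections is the number of reciprocal lattice points inside the resolution sphere divided by the order of the Laue group. The function names and the example numbers are hypothetical.

```python
import math


def unique_reflections(cell_volume_A3, d_min_A, laue_order=2):
    """Approximate number of unique reflections to resolution d_min.

    The resolution sphere has radius 1/d_min in reciprocal space and the
    reciprocal cell has volume 1/V, so the sphere holds about
    (4*pi/3) * V / d_min**3 lattice points. Dividing by the order of the
    Laue group (2 for P1, because of Friedel's law) gives unique reflections.
    Assumption: primitive cell, complete data.
    """
    if cell_volume_A3 <= 0 or d_min_A <= 0 or laue_order < 1:
        raise ValueError("volume and resolution must be positive, laue_order >= 1")
    lattice_points = (4.0 * math.pi / 3.0) * cell_volume_A3 / d_min_A ** 3
    return lattice_points / laue_order


def observation_to_parameter_ratio(cell_volume_A3, d_min_A, n_atoms_asu,
                                   params_per_atom=4, laue_order=2):
    """Ratio of unique reflections to refined parameters.

    n_atoms_asu is the number of atoms in the asymmetric unit.
    Assumption: params_per_atom=4 means x, y, z plus one isotropic B value.
    """
    if n_atoms_asu <= 0 or params_per_atom <= 0:
        raise ValueError("atom and parameter counts must be positive")
    n_obs = unique_reflections(cell_volume_A3, d_min_A, laue_order)
    return n_obs / (n_atoms_asu * params_per_atom)


if __name__ == "__main__":
    # Hypothetical P1 crystal: 100000 cubic angstrom cell, 2000 atoms.
    for d_min in (1.5, 2.0, 2.5, 3.0, 3.5):
        ratio = observation_to_parameter_ratio(100000.0, d_min, 2000)
        print(f"d_min = {d_min:.1f} A  observations/parameters = {ratio:.2f}")
```

Because the reflection count scales with the inverse cube of the resolution limit, the ratio falls quickly as resolution worsens, which is why additional prior information is needed at low resolution.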

Technical Summary

The aim is to advance the application of modern statistical techniques to macromolecular crystallography, and hence extend it to more challenging structural analyses. There are two main objectives: 1) statistical treatment of order-disorder (OD) structures and 2) design and use of prior structural knowledge. Both objectives will be achieved in three stages: 1) analyse, extract and classify the relevant information from structures in the PDB; 2) use this information to design probability distributions that best describe their behaviour; and 3) apply the resulting software to challenging crystal structure analyses. The objectives will be achieved as follows: 1) The PDB will be analysed and entries with crystal imperfections, such as OD structures with and without twinning, will be extracted. These structures will be classified and likelihood functions will be designed to deal with such problems. The resulting software will be calibrated using all entries in the PDB. Software will be designed to correct the space group of crystals during refinement and feed the results back into both the data processing and refinement steps. 2) The PDB will be analysed and information about internal degrees of freedom of macromolecules will be extracted. Prior probability distributions of inter-atomic distances will be designed. To take into account correlated motion of atoms in molecules, differences of atomic displacement parameters will be analysed and appropriate probability distributions will be designed. These will allow structures to be refined at low resolution. Such restraints will allow a smooth transition from rigid-group to all-atom models, extracting the optimal information from the data while retaining the structural integrity of the derived model. All developed algorithms will be implemented either in the programs REFMAC or MOLREP or in new software. The software will be made available to the community via CCP4.
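As an illustration of how a prior probability distribution on inter-atomic distances can act as a restraint, here is a minimal sketch in Python. It assumes the simplest possible choice, an independent Gaussian prior per atom pair, with target distances and standard deviations taken from some reference source. This is an assumption made for the example and is not the functional form used in REFMAC or MOLREP; the function name and the restraint tuple layout are hypothetical.

```python
import math


def distance_prior_penalty(coords, restraints):
    """Negative log of a Gaussian prior on inter-atomic distances.

    coords     : list of (x, y, z) tuples in angstroms
    restraints : list of (i, j, target, sigma) tuples, where i and j index
                 into coords, target is the expected distance and sigma its
                 standard deviation (both in angstroms)

    Returns the sum of 0.5 * ((d - target) / sigma)**2 over all restraints,
    i.e. the negative log prior up to an additive constant.
    """
    total = 0.0
    for i, j, target, sigma in restraints:
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        d = math.dist(coords[i], coords[j])  # Euclidean distance, Python 3.8+
        total += 0.5 * ((d - target) / sigma) ** 2
    return total


if __name__ == "__main__":
    # Hypothetical three-atom fragment and two distance restraints.
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.3, 0.0)]
    priors = [(0, 1, 1.52, 0.02), (1, 2, 1.33, 0.02)]
    print(f"penalty = {distance_prior_penalty(atoms, priors):.3f}")
```

In a refinement target of this kind, such a penalty would be added to the data term (the negative log likelihood of the observed reflections), so that the prior dominates where the data are weak and matters less where the data are informative.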

Publications

 
Description New statistical tools were developed for analysis of noisy and limited data arising from X-ray crystallography
Exploitation Route Tools have been implemented in software including refmac5, JLigand and Zanuda. They are used widely by the structural biology community worldwide.
Sectors Pharmaceuticals and Medical Biotechnology

 
Title ProSmart 
Description Conformation-independent comparison of macromolecular structures. It can also derive information to be transferred from high-resolution macromolecular structures to low-resolution refinement, thus increasing the reliability of atomic models derived from limited and noisy data.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Low resolution crystal structure refinement and fitting into cryo-EM maps cannot be imagined without this tool 
URL http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/
 
Title balbes 
Description Automatic crystal structure solution using re-designed macromolecular structures of existing atomic models. 
Type Of Technology Software 
Impact More than 500 hundred crystal structures have been analysed using this software
URL http://www.ccp4.ac.uk/ccp4online
 
Title refmac5 
Description Refinement of macromolecular structures using X-ray crystallographic data and cryo-EM reconstructions. This software is released every year.
Type Of Technology Software 
Year Produced 2019 
Impact More than 60000 PDB entries have been analysed using this software
URL http://www2.mrc-lmb.cam.ac.uk/groups/murshudov/