Optimal parameter estimation for crystallisation trials

Lead Research Organisation: University of York
Department Name: Chemistry

Abstract

There is currently no a priori method for determining the optimum crystallization strategy for a particular protein, and the process remains highly empirical. Several important variables, which often interact, must be tested in combination. Many different buffers are available to maintain a specific pH, and various salts, polymers or organic solvents may be used as precipitants. Detergents and other additives may also be needed to keep a protein soluble during crystallization, so an exhaustive search of all possible combinations is impossible. Because protein is often scarce, suitable conditions must be found in the fewest possible experiments, and various sampling techniques have been proposed to reduce the number of trials [1].

Using specific information about the macromolecule in question, together with prior experience, allows a more systematic approach to crystallization, and a number of laboratories have set up in-house databases to develop crystallization strategies. On a global scale, the Biological Macromolecule Crystallization Database (BMCD) provides information on proteins and the conditions under which they have been crystallized. This information has been used to build prior distributions based on the relative frequencies of success for different combinations of conditions, and the results show a correlation between families of macromolecules and the conditions under which they crystallize [2].

The aim of this project is to use advanced data mining methods to exploit information about the variables involved in successful crystallization trials, together with prior knowledge of a particular protein's characteristics. Certain parameters can be estimated before crystallization trials begin. For example, pH-dependent properties such as solubility and stability can be established during the purification protocol.
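
The idea of building prior distributions from relative frequencies of success [2] can be illustrated with a minimal sketch. It is not the method of the cited work or of this project: the record format, the family and condition labels, the function name `condition_prior` and the additive smoothing constant `alpha` are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def condition_prior(records, alpha=1.0):
    """Estimate P(condition | family) from successful trials.

    records: iterable of (family, condition) pairs, one per reported success.
    alpha:   additive smoothing constant (an assumption of this sketch), so
             that conditions never yet seen for a family keep a small
             non-zero probability.
    """
    counts = defaultdict(Counter)
    conditions = set()
    for family, condition in records:
        counts[family][condition] += 1
        conditions.add(condition)

    prior = {}
    for family, family_counts in counts.items():
        total = sum(family_counts.values()) + alpha * len(conditions)
        prior[family] = {
            cond: (family_counts[cond] + alpha) / total
            for cond in sorted(conditions)
        }
    return prior


# Hypothetical successful-trial records (invented labels, not BMCD data).
records = [
    ("family A", "NaCl, pH 4.5"),
    ("family A", "NaCl, pH 4.5"),
    ("family A", "PEG 4000, pH 7.0"),
    ("family B", "PEG 4000, pH 7.0"),
]

prior = condition_prior(records)

# Rank candidate conditions for a new protein assumed to belong to family A.
ranked = sorted(prior["family A"].items(), key=lambda item: item[1], reverse=True)
for condition, probability in ranked:
    print(f"{condition}: {probability:.2f}")
```

Ranking conditions by such a prior only suggests which trials to attempt first when protein is scarce; it does not capture interactions between variables.
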
Crystallographers routinely use a number of techniques for protein characterization to confirm the identity of the protein and to ensure that it is pure, folded and stable. PAGE gels, IEF gels and dynamic light scattering all provide insight into protein characteristics. Mass spectrometry, ultracentrifugation and NMR are further techniques that are readily available. The BMCD reports as many as 53 parameters per crystallization entry, although submission of such data is not yet routine for crystallographers and the database is still patchy.

The first stage of this project will be to collate information from databases at AstraZeneca, the BMCD and the York Structural Biology Laboratory. The cooperation of a number of other laboratories will be sought for the rationalization of crystal growth screening.

Given all the available information about a protein, the challenge is to determine the relationship between the known factors and the optimal conditions for crystallization. This relationship is likely to be complex and non-linear. Statistical analysis of the data may well indicate its form, but extracting the maximum information from the available data will require non-linear modelling. This project will investigate the use of neural networks [3] and statistical pattern processing methods to design a systematic procedure for choosing the initial conditions for crystallization trials. The most appropriate neural network architecture will depend on the characteristics of the data, so detailed specification of the network will follow exploratory analysis of the data.

[1] Carter, C.W., Jr. and Carter, C.W. (1979). J. Biol. Chem., 254, 12219-12223.
[2] Hennessy, D., Buchanan, B., Subramanian, D., Wilkosz, P. and Rosenberg, J. (2000). Acta Cryst., D56, 817-827.
[3] Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
