Large-scale first principles calculations applied to biomolecular simulations

Lead Research Organisation: University of Southampton
Department Name: Sch of Chemistry

Abstract

The search for new medicinal entities within the pharmaceutical industry is a critical problem. Early stage lead identification and optimization often exploits knowledge of the interaction between a putative drug and a target enzyme or other macromolecular system. Consequently, in recent years there has been exponential growth in the number of protein structures (e.g. 53 thousand structures in the RCSB Protein Data Bank), and similar growth has been observed in proprietary structure data within the pharmaceutical industry. In parallel with the growth of available structure data, significant research effort has been directed towards understanding the possible interactions between small molecules and protein structures. Most commonly this has been done through empirical docking schemes, whose main focus is fast identification of an approximate binding pose. Speed is important because of the enormous size of the compound collections that need to be virtually 'screened'. These schemes provide a very reasonable estimate of the binding pose; however, they almost universally fail when used to estimate differences in binding affinity. This failure is due to the approximations made in their construction. Specifically, even when the physical interactions between protein and ligand are considered in an atomistic way, they are described by classical potentials, so molecules are represented as balls (atoms) connected by springs (bonds). In general, this precludes the inclusion of any form of polarization of the ligand or protein, or of electronic charge transfer between the moieties. An accurate representation of such interactions requires a full Quantum Mechanical (QM) treatment of the system.
Until recently, applying QM methods from 'first principles' to systems as large as protein-ligand complexes has been beyond reach because of their prohibitive computational cost, which scales asymptotically as ~N^3 or worse, where N is the number of atoms. Even on supercomputers, these methods are limited to no more than a few hundred atoms, while most proteins of interest contain thousands of atoms. The recently developed ONETEP first principles method has a computational cost that scales only as ~N (linear scaling) and is capable of calculations on thousands of atoms. The ONETEP method was originally developed and validated in the context of Materials Science simulations. The main focus of the proposed research will be the application of ONETEP to a number of biological systems. The work will address several questions, such as the ability of the method to correctly predict the relative binding affinities of a series of potential drugs, and the analysis of binding mechanisms through visual and numerical processing of the electronic structure (density, molecular orbitals) of the complex in a full QM framework. Reliable modelling of mechanisms such as charge transfer is key to a correct understanding of interactions, for example in the heme group of the cytochrome P450 (CYP) enzymes, which play a central role in human metabolism. Calculations on the CYP enzymes, and on any other metal-containing systems, cannot be performed accurately without some consideration of full QM effects. Previous efforts have included QM/MM approaches, in which a small portion of the biomolecule is treated quantum mechanically and the rest is described with classical potentials. Major concerns with these approaches are the quality of the coupling between the classical and quantum regions and the size of the quantum region, which is usually too small for computational reasons.
ONETEP will therefore also provide absolute benchmarks for QM/MM approaches by treating the entire biomolecule quantum mechanically. This proposal will provide vital validation of linear-scaling first principles QM methodology in biomolecular simulations, which is critical before any widespread adoption of these methods within the pharmaceutical industry.
