Novel force fields devised using machine learning

Lead Research Organisation: University of Manchester
Department Name: Chemistry

Abstract

Proteins are recognised as a very important class of molecules because of their versatile functionality in living systems. This proposal addresses the paramount problem of oligopeptide (and ultimately protein) structure prediction from a theoretical and computational point of view. A recent and authoritative review by Ponder and Case (one of the authors of the popular computer program AMBER) argues that biomolecular modelling will grind to a halt unless the accuracy of current force fields is substantially increased. We believe that the best way forward is to start afresh rather than to tweak existing force fields. The design of force fields such as AMBER was based on the computing power available in the 1980s. Since that time computing power has increased by a factor of 10,000, which means that a novel force field design philosophy can be adopted, avoiding from the outset the approximations of force fields such as AMBER. In our previous work we introduced multipole moments to replace point charges. Multipole moments reflect the local electron density of an atom better than point charges do, because point charges wrongly assume that this density is spherical. Multipole moments model the electrostatic interaction between one atom and another more accurately, especially at short range. We use the modern theory of Quantum Chemical Topology (aka 'Atoms in Molecules') to partition the electron density of small molecules into atomic fragments. These fragments then 'dress up' a protein/peptide backbone and provide detailed information on its electron density. Quantum topological atoms have a finite volume of variable shape, which can be readily visualised. These atoms are also widely documented, strongly rooted in quantum mechanics and used for interpretative purposes (e.g. charge transfer, hydrogen bonding).
While designing a force field along these lines we modelled the pivotal electrostatic interaction energy upfront, without the fitting that is used in constructing classical force fields. Our approach drastically reduces the number of fitted parameters. Moreover, fitted charges are not necessarily transferable from small molecules to larger ones. Quantum topological atoms, on the other hand, are transferable to a very large extent. Only the remaining energy contributions then need to be fitted to 'ab initio' energies, forces and vibrational frequencies of training molecules. In this proposal we focus on the polarisation of the electron density, that is, the change in the electron density upon a change in the nuclear positions. The novel element is to use advanced machine learning to capture the relation between fluctuating multipole moments and nuclear positions. The inputs of the Genetic Programming algorithm are the coordinates of the neighbouring atoms of a given central atom, and the output is a given fluctuating multipole moment of the central atom. If successful, the proposed methodology is expected to work for other important classes of biochemical compounds as well, such as nucleotides (DNA, RNA), carbohydrates and lipids, which will be tackled in future projects. Given an increase in computer power of at least two orders of magnitude over the next decade, we aim to secure the future of macromolecular modelling. Given the momentum built up in our group, we are in an ideal position to consolidate all the components of the design, previously researched and published, into a coherent software package that will be freely available to the UK research community.
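
The claim that multipole moments describe short-range electrostatics better than a single point charge can be illustrated with a small numerical sketch. It is not taken from the proposal's software: the toy "atom" below (two point charges offset along z, in atomic units), the constant SITES and all function names are assumptions made purely for illustration. The sketch compares the exact electrostatic potential of this non-spherical charge distribution with a point-charge (monopole) model and with a monopole-plus-dipole model, both centred at the origin.

```python
import math

# Hypothetical toy "atom": a non-spherical charge distribution represented by
# two point charges offset along z (atomic units). Values are illustrative only.
SITES = [(0.6, (0.0, 0.0, 0.1)), (0.4, (0.0, 0.0, -0.1))]


def exact_potential(point):
    """Exact potential of the toy distribution: sum of q / distance."""
    return sum(q / math.dist(point, pos) for q, pos in SITES)


def monopole_potential(point):
    """Point-charge model: the total charge placed at the origin."""
    total_charge = sum(q for q, _ in SITES)
    return total_charge / math.hypot(*point)


def dipole_corrected_potential(point):
    """Monopole plus dipole term, mu . R / |R|^3, both centred at the origin."""
    r = math.hypot(*point)
    mu = [sum(q * pos[k] for q, pos in SITES) for k in range(3)]
    dipole_term = sum(m * x for m, x in zip(mu, point)) / r**3
    return monopole_potential(point) + dipole_term


if __name__ == "__main__":
    # Probe the potential along the z axis at decreasing distance.
    for z in (2.0, 1.0, 0.5):
        point = (0.0, 0.0, z)
        exact = exact_potential(point)
        err_mono = abs(monopole_potential(point) - exact)
        err_dip = abs(dipole_corrected_potential(point) - exact)
        print(f"z={z}: monopole error={err_mono:.4f}, with dipole={err_dip:.4f}")
```

For the probe points on the z axis used here, adding the dipole term reduces the error of the point-charge model, and both errors grow as the probe point approaches the charge distribution. A realistic treatment would use higher-rank multipole moments derived from an actual electron density rather than two hand-placed charges.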

Technical Summary

The potential energy functions used in biomolecular modelling must be improved in order to secure its future predictive power and the trust of the experimental community. In our previous work we replaced point charges by atomic multipole moments in order to improve the accuracy of the (short-range) electrostatic interaction. Quantum Chemical Topology (aka 'Atoms in Molecules') uses the molecular electron density to define atoms as finite volumes shaped by the molecular environment they occur in. In this proposal we focus on atomic polarisation, i.e. how the electron density of a topological atom fluctuates upon a change in the nuclear positions of its neighbouring atoms. We do not follow an existing method (such as the Drude/shell model, electronegativity equalisation, effective polarisation, polarisable point dipoles, etc.). Instead we adopt a modern machine learning method called Genetic Programming to establish directly the link between an atom's electron density and a geometrical change in its immediate environment. To summarise the proposed method, consider a given atom in a molecule. Normal modes generate hundreds of molecular geometries, representing the ever-changing environment of the atom, including bond length and angle variations, and conformational flexibility. Quantum Chemical Topology cuts this atom out of each of the distorted molecular charge densities and calculates the corresponding atomic multipole moments. The latter fluctuate in response to a change in molecular geometry, and it is these data that the Genetic Programming algorithm is trained on. Our proposed method is entirely new, which is why we need this feasibility study.
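
The training idea summarised above (geometries in, a fluctuating multipole moment out) can be sketched in a few lines. This is a deliberately stripped-down illustration and not the proposal's algorithm: the geometry is reduced to a single neighbour distance r, the "reference" multipole moment is a made-up analytic function standing in for values that would come from Quantum Chemical Topology, and the search is a very small genetic-programming-style symbolic regression (expression trees, subtree crossover and mutation, elitist selection). All names and parameter values are hypothetical.

```python
import math
import operator
import random

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
MAX_DEPTH = 6  # cap on tree depth to keep expressions small


def reference_moment(r):
    """Made-up stand-in for a multipole moment computed at neighbour distance r."""
    return 0.3 * math.exp(-r) + 0.05 * r


def make_training_data(n=25):
    """Sample distorted 'geometries' (here just distances between 0.9 and 1.3)."""
    return [(0.9 + 0.4 * i / (n - 1), reference_moment(0.9 + 0.4 * i / (n - 1)))
            for i in range(n)]


def random_tree(depth, rng):
    """Random expression tree over the variable 'r' and float constants."""
    if depth == 0 or rng.random() < 0.3:
        return "r" if rng.random() < 0.5 else round(rng.uniform(-1.0, 1.0), 2)
    return (rng.choice(list(OPS)), random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, r):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, r), evaluate(right, r))
    return r if tree == "r" else tree


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])


def graft(tree, donor, rng):
    """Replace a randomly chosen subtree of `tree` with `donor`."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return donor
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, graft(left, donor, rng), right)
    return (op, left, graft(right, donor, rng))


def mse(tree, data):
    return sum((evaluate(tree, r) - y) ** 2 for r, y in data) / len(data)


def evolve(data, pop_size=60, generations=30, seed=0):
    """Elitist evolution; returns the best tree and the best error per generation."""
    rng = random.Random(seed)
    pop = [random_tree(3, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, data))
        history.append(mse(pop[0], data))
        survivors = pop[: pop_size // 4]  # the best quarter is kept unchanged
        children = []
        while len(children) < pop_size - len(survivors):
            parent = rng.choice(survivors)
            if rng.random() < 0.5:  # crossover: graft a subtree of another survivor
                donor = rng.choice(list(subtrees(rng.choice(survivors))))
            else:  # mutation: graft a fresh random subtree
                donor = random_tree(2, rng)
            child = graft(parent, donor, rng)
            children.append(child if depth(child) <= MAX_DEPTH else parent)
        pop = survivors + children
    best = min(pop, key=lambda t: mse(t, data))
    return best, history + [mse(best, data)]


if __name__ == "__main__":
    best_tree, errors = evolve(make_training_data())
    print("best expression:", best_tree)
    print("error, first vs last generation:", errors[0], errors[-1])
```

Because the best quarter of each generation is carried over unchanged, the best training error can never increase from one generation to the next. In the setting described above, the single distance r would be replaced by the coordinates of all neighbouring atoms, and one such model would be trained per multipole moment of the central atom.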
