Reliable computational prediction of molecular assembly

Lead Research Organisation: University of Manchester
Department Name: Chemistry

Abstract

An important modern frontier of research in the physical sciences is the proper understanding and control of molecular assembly. The Grand Challenge of "Directed Assembly of Extended Structures with Targeted Properties" tackles this frontier from various angles. I focus on the angle of chemical computing, which has established itself as an independent source of information, complementary to experiment. There is a need, of ever increasing urgency, for more accurate and hence more reliable prediction of interaction energies between molecules. The structure and dynamics of molecular assemblies depend sensitively on the most subtle energy changes. This is why the scientific challenge of accurate energy prediction is as acute as ever. If energy is correctly predicted then everything else follows: realistic structures, dynamics and properties. I propose a novel approach, drastically different to the current paradigm.

Matter at ambient conditions is governed by a master equation called the Schrödinger equation, which returns the interaction energy of a molecular assembly. Solving this master equation accurately for sizeable molecular aggregates is very expensive or even impossible with current computer power. Force fields, however, can provide this interaction energy, and do so many orders of magnitude faster. A force field is a formula that delivers the energy of a molecular system as a direct function of the system's atomic coordinates. This formula contains many parameters, specific to the system at hand. The challenge is to design a force field that is reliable. The best and only long-term strategy is to map the force field as faithfully as possible onto the solution of the Schrödinger equation, in terms of both the energy and the wave function. With exascale computers around the corner and GPU technology recently overtaking CPUs, it is pivotal and timely to invest in more realistic force fields.
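
As a minimal illustration of what "a formula that delivers the energy of a molecular system as a direct function of the system's atomic coordinates" can look like, the sketch below evaluates a conventional point-charge force field: a pairwise sum of Coulomb and Lennard-Jones terms. This is the traditional functional form, not the force field proposed here. The function names and the use of one shared Lennard-Jones parameter pair for all atoms are assumptions made for brevity.

```python
import math

# Conventional electrostatic conversion factor in kcal mol^-1 Angstrom e^-2.
COULOMB_CONSTANT = 332.0637


def pair_energy(r, qi, qj, epsilon, sigma):
    """Energy of one atom pair: point-charge Coulomb plus Lennard-Jones 12-6."""
    coulomb = COULOMB_CONSTANT * qi * qj / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    return coulomb + lennard_jones


def force_field_energy(coords, charges, epsilon, sigma):
    """Total energy as a direct function of the atomic coordinates.

    coords  : list of (x, y, z) tuples in Angstrom
    charges : list of point charges in units of e
    epsilon, sigma : hypothetical shared Lennard-Jones parameters
    """
    energy = 0.0
    n_atoms = len(coords)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            r = math.dist(coords[i], coords[j])
            energy += pair_energy(r, charges[i], charges[j], epsilon, sigma)
    return energy
```

The parameters (charges, epsilon, sigma) are exactly the kind of system-specific quantities the text refers to: the formula is cheap to evaluate, but its reliability depends entirely on how well they are chosen.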

Here I aim to offer biomolecular modelling a completely new route to designing a force field, with much more truthful electrostatics. We propose to complete the construction of the first ever (high-rank) multipolar force field for flexible molecules, with both intra- and intermolecular polarisation. This is crucial for molecular assembly and recognition, as well as for the realistic modelling of hydrogen bonding. Molecular systems in the presence of the strong and inhomogeneous electric fields caused by ions will also be modelled realistically for the first time.
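
To illustrate why multipole moments carry information that point charges cannot, the following sketch evaluates the electrostatic potential of a single site carrying a charge and a dipole moment. It stops at rank 1, whereas a high-rank expansion would continue with quadrupole and higher terms. The function name and the choice of atomic units are assumptions made for illustration only.

```python
import math


def multipole_potential(point, charge, dipole):
    """Electrostatic potential (atomic units) at `point` due to a site at the origin.

    The site carries a charge (rank 0) and a dipole moment (rank 1):
        V(r) = q / r + (mu . r) / r^3
    Higher-rank terms (quadrupole, octupole, ...) are omitted in this sketch.
    """
    r = math.sqrt(sum(x * x for x in point))
    monopole_term = charge / r
    dipole_term = sum(m * x for m, x in zip(dipole, point)) / r ** 3
    return monopole_term + dipole_term
```

For a neutral site with a unit dipole along z, `multipole_potential((0, 0, 2), 0.0, (0, 0, 1))` gives 0.25 while `multipole_potential((2, 0, 0), 0.0, (0, 0, 1))` gives 0.0. A model restricted to a point charge of zero would predict no potential in either direction, so the directional character of the interaction would be lost.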

The true predictive power of a force field depends on the reliability of the information transfer from small molecules (or molecular clusters) to large molecules. Only if this transferability is high will a force field make reliable predictions. The main idea behind our force field, called QCTFF, is to construct "knowledgeable" atoms. These atoms are drawn from small molecules and made to interact in order to predict properties of large molecules. They are 3D fragments of electron density, with a finite volume. These atoms have sharp boundaries, which endows them with a "malleable" character. Their precise shape responds to the immediate environment of the molecule they are part of. A machine learning method then captures how these atoms change their multipole moments in response to the positions of their neighbours. We have successfully reached the proof-of-concept stage of this novel idea and now I intend to fully exploit it.
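
The idea of learning how an atom's multipole moment responds to the positions of its neighbours can be sketched as a regression problem: geometric features in, moment out. The text above does not name the machine learning method, so the example below substitutes a simple Gaussian-kernel weighted average purely as a stand-in. The feature choice (one neighbour distance), the training values (generated from an arbitrary linear toy function, not from any quantum calculation) and all names are hypothetical.

```python
import math


def gaussian_kernel(a, b, length_scale):
    """Similarity between two feature vectors, e.g. lists of neighbour distances."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * length_scale ** 2))


def predict_moment(features, training_features, training_moments, length_scale=0.05):
    """Predict a multipole moment as a kernel-weighted average of training moments.

    This is a stand-in for the (unspecified) machine learning method:
    geometries similar to `features` contribute most to the prediction.
    """
    weights = [gaussian_kernel(features, f, length_scale) for f in training_features]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query geometry is too far from all training geometries")
    return sum(w * m for w, m in zip(weights, training_moments)) / total


# Synthetic toy data: one neighbour distance per geometry and a made-up
# linear response of an atomic charge (rank-0 moment) to that distance.
toy_geometries = [[0.90], [0.95], [1.00], [1.05], [1.10]]
toy_moments = [-0.5 + 0.1 * (g[0] - 1.0) for g in toy_geometries]
```

Calling `predict_moment([1.02], toy_geometries, toy_moments)` then returns an interpolated moment for a geometry that was not in the training set, which is the sense in which the atom "knows" how to respond to its environment.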

Although QCTFF is generic, its application is biased towards proteins, ions and water. Only a fellowship can deliver the ambitious but feasible goal of creating this transformative enabling technology for life science applications. This technology will also serve as a robust platform from which to develop a novel force field to study reactions in solution and in enzymes, as well as crystal nucleation. The radical and innovative decisions taken at the outset of QCTFF's design are the best guarantee of its long-lasting success.

Planned Impact

(A) Who will benefit from this research?
(A1) Exascale computing, the new frontier.
Exascale computing refers to computing capabilities beyond the currently existing petascale, representing a thousandfold increase over that scale. Last autumn, substantial government funding was allocated directly to Daresbury Laboratories for investment in their own exascale computing facilities. We plan to have a version of QCTFF ready for testing on this facility before the middle of the Fellowship's duration. The higher computational cost of QCTFF will be offset by its enhanced realism and accuracy. The alternative of running current force fields on the exascale infrastructure would just entail that "unreliable results are obtained faster".

(A2) EPSRC itself and Policy-makers
On p.16 of the last International Review of UK Chemistry Research (2009), reviewers complain that "In the area of theory and computation, no provision seems to have been made to encourage the development of new algorithms and new ideas to bring to full fruition the utilisation of the next generation ... machines". The current application will raise the international standing of theoretical and computational chemistry in Britain. This will benefit EPSRC at their next international review.

(A3) Companies using molecular modelling.
Many companies recognise the increasing role of computation in their future because cheaper hardware is guaranteed to emerge during the coming decades. As users of more reliable software, companies will make better predictions, which enhances the trustworthiness of molecular modelling in experimentalist circles.

(A4) UK plc and employment.
In times of economic contraction it is important to invest in novel technologies. The invention of a more reliable atomic modelling tool provides the basis for long-term success and hence sustainable wealth creation. According to a market assessment we commissioned in connection with a different grant, molecular modelling methods are quoted as having become significant tools in nanoscale R&D across a variety of industries, including commodity and specialty chemicals, fuels, polymers, structured materials, electronics and photonics materials, industrial gases, personal care and food products, computer software and hardware, and agricultural chemicals.

(A5) Academic researchers.
For details, see the Je-S box "Academic Beneficiaries".

(B) How will they benefit from this research?
The most important aspect is that we aim for a future-proof force field, that is, one that "works for the right reasons". If scientific work is carefully done then its product puts one in a strong position to have others benefit from it. The development of accurate intermolecular potentials is a strong research theme in Britain with seminal contributions from distinguished researchers. The requested fellowship will aim at strengthening this activity.

The impact of simulation on society in general is increasing at an accelerating pace via the enhanced industrial application of molecular software. The current minister for Universities and Science is fully aware of this. He has announced a £145 million investment for innovation in e-infrastructure (for details, see the National Importance section in the Case for Support). The continuing reduction of hardware cost enables and therefore warrants the use of the more sophisticated models that this proposal provides. Industry widely recognises that the role of computation will increase in the future since cheaper hardware is guaranteed to emerge. However, computers are only as useful as the quality of the algorithms that run on them. The longer-term impact of better software cannot be overestimated since it will enhance confidence in simulation.

Description The only way that current computers can predict the structure and dynamics of proteins in water is via force fields rather than a first-principles solution of the master equation of quantum mechanics. Force fields are actually shortcuts: they can calculate the energy of a large system directly, without explicitly referring to quantum mechanics. This is how force fields can calculate, in a reasonable time, the energy of a system of tens of thousands of atoms millions of times. However, the problem with current popular force fields is that their architecture is stuck in the past and is very hard to improve systematically and permanently. This is why it is best to embrace modern computing power and design a more realistic architecture from scratch, this time in compliance with sound physical principles. This grant has shown that such an overhaul is possible and successful. The resulting novel force field called FFLUX demonstrates proof-of-concept. As a result, the powerful computers of the future will produce more reliable predictions.
Exploitation Route There are at least the following ways:
(1) Publications of a tutorial or review-like nature. Two examples, (a) and (b), have already been published and a third, (c), is planned:
(a) PLA Popelier, Physica Scripta 91 (3), 033007 (2016) (41 citations) and
(b) PLA Popelier, International Journal of Quantum Chemistry 115 (16), 1005-1011 (69 citations).
(c) PLA Popelier, Advanced Theory and Simulations, by invitation, Perspective to be submitted in 2021.
(2) Work with Daresbury Lab (IBM and Hartree, named contacts identified in both consortia) to consolidate a FFLUX-containing version of DL_POLY (i.e. DL_FFLUX).
(3) Within the MIB there are various groups that will soon benefit from DL_FFLUX, as CoIs for imminent grant applications: (i) Prof Saiani (hydrogels), (ii) Dr Green (synthetic biology), (iii) Prof Waltho (fundamental enzymology) and (iv) Prof Scrutton/Dr Johannissen (biocatalysis).
(4) Engage with potential takers outside of the University of Manchester such as Prof Essex in Southampton and Prof Mulholland at the Univ. of Bristol and several others abroad (as identified from seminars delivered in our School and subsequent conversations in my office).
(5) Publish applications of FFLUX ourselves: the vast majority of researchers are interested in applications, and if these are suitably chosen to "make a splash" then researchers will be keener to take FFLUX up.
(6) Prepare a version of DL_FFLUX for the new supercomputer ARCHER 2 (MPI and OpenMP have already been successfully tested in our lab) in collaboration with Dr M Bane.
(7) Because FFLUX is a generic force field its novel technology is bound to make an impact on research fields outside of biomolecular simulations, such as battery design. Partners within the newly established Royce Institute can be sought.
Sectors Chemicals, Energy, Healthcare, Pharmaceuticals and Medical Biotechnology

 
Description This Fellowship has been used to lay the foundation of what is indeed a completely novel way of designing a force field (called FFLUX); there is nothing like it. This type of EPSRC funding has made this fundamental development possible, and I am grateful that computational chemistry has benefitted by having this groundbreaking foundation now in place. We are now in possession of a vital piece of work, fully published in carefully written papers, and en route to being used in a wide variety of applications, starting in the biomolecular area. The architecture underpinning the current version of the force field FFLUX is far ahead of its time. Because of its solid and rigorous foundation, FFLUX will break through the impasse that traditional force fields have unfortunately reached through their lack of reliable predictions. Setting up FFLUX is a huge undertaking, and one that is not finished. However, the vast majority of the promised proofs-of-concept have been realised and it is now a matter of consolidating the beta-releasable in-house software called DL_FFLUX, which has been implemented in the molecular simulation package DL_POLY. The sustained invitations to machine learning and force field workshops prove that the work is gaining increased attention. However, given the manner in which EPSRC operates, the Fellowship Extension unfortunately did not materialise, so progress will inevitably be slower towards the societal and economic benefits that EPSRC so obsessively wants to see. Nevertheless, future grant applications will be written with colleagues who will benefit from the hard work that delivered all the proofs-of-concept. In the long term these efforts will lead to a first-time deep understanding of the nucleation process of peptides in aqueous solution, a process thought to be involved in the onset of Alzheimer's disease and vital to controlling peptide hydrogel formation, which has many industrial applications.
Because the findings are fundamental and represent a step change, it takes a lot of time before the community absorbs them.
First Year Of Impact 2021
Sector Chemicals, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Societal