Simulating catalysis: Multiscale embedding of machine learning potentials

Lead Research Organisation: University of Bristol
Department Name: Biochemistry

Abstract

In recent decades, computer simulations have become an essential part of the molecular scientist's toolbox. However, as with any computational method, molecular simulations require a compromise between speed and precision. The most precise techniques apply the principles of quantum mechanics (QM) to molecular systems and can accurately describe processes involving changes in the electronic structure, such as the breaking and forming of chemical bonds. However, they require tremendous computer resources and are prohibitively costly even for systems containing only a few hundred atoms. At the other extreme are highly simplified "Molecular Mechanics" (MM) methods that ignore the quantum nature of molecules and instead describe the atoms as charged "balls" of a certain size, connected by springs that represent the chemical bonds.

The core limitation of MM is its inability to describe the breaking and forming of chemical bonds, which makes it unsuitable for simulating chemical reactions. This drawback motivated the invention of combined "multiscale" models that rely on precise but expensive QM calculations to describe the part of the simulation system where the chemical reaction takes place, while treating the rest of the system with an efficient MM method. This "Quantum Mechanics/Molecular Mechanics" (QM/MM) approach, honoured by the Nobel Prize in Chemistry in 2013, is now the state-of-the-art simulation technique for reactions in complex environments, such as those happening inside living organisms. Such simulations are important for understanding and designing catalysts, which increase the rate of chemical reactions (and can thereby reduce the amount of energy and resources required to produce molecules). However, QM/MM calculations are still only as fast as the QM method used, which dramatically limits the precision and timescale of the simulations.

A completely different approach is to employ techniques from the rapidly evolving field of machine learning (ML) and construct a method that can learn and then predict the outcome of a QM calculation. Once properly trained, an ML model can provide results of QM quality, but several orders of magnitude faster. However, ML models are still significantly slower than MM ones. Therefore, a multiscale "ML/MM" model would still offer huge savings of computer time compared to pure ML simulations. Unfortunately, existing ML training schemes are only suitable for calculations in the gas phase and cannot take into account the presence of an MM environment.

The goal of the proposed research project is to develop a novel multiscale embedding approach that will allow the use of ML models as part of an ML/MM scheme. This will enable molecular simulations of unprecedented precision on processes with high complexity without limiting the detailed exploration of molecular conformations. To achieve this goal, we will take advantage of recent advances in machine learning and in the understanding of intermolecular interactions to develop a specialised ML workflow that predicts the interaction energy between the molecule described by ML and the MM environment. The workflow will be implemented as an open, publicly available software package that allows users to train ML/MM models and run ML/MM molecular dynamics simulations of complex chemical processes, such as catalysed reactions. We expect this package to be readily adopted by a wide community of computational chemists working on enzymatic reactions, homogeneous and heterogeneous catalysis, and processes in condensed phases more generally, aided by specific training materials and workshops that we will provide. This will allow, for example, the development of efficient computational workflows to understand and help design catalysts for more environmentally friendly production of desired molecules.

Publications

 
Title emle-engine 
Description A simple, versatile interface to run multi-scale MM/ML simulations with electrostatic embedding of machine learning potentials using an ORCA-like interface. 
Type Of Material Technology assay or reagent 
Year Produced 2024 
Provided To Others? Yes  
Impact Initial interest from other research groups and industry. 
URL https://github.com/chemle/emle-engine
 
Title Electrostatic embedding scheme for Machine Learning potentials 
Description Training data was generated based on the QM7 data set, consisting of 7165 molecules with up to 7 heavy atoms (C, N, O, and S, in addition to H). For each molecule, the density and molecular dipolar polarizability were obtained at the B3LYP/cc-pVTZ level of theory without reoptimizing the structures. The material comprises the training procedure together with the properties and parameters required by the embedding scheme, as trained/optimized for ground-state neutral compounds containing the elements H, C, N, O, and S. 
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact None yet. 
URL https://github.com/emedio/embedding
 
Description University of Valencia 
Organisation University of Valencia
Country Spain 
Sector Academic/University 
PI Contribution My research team and I have regular meetings with the team at the University of Valencia (mostly every 2 weeks). We collaborate intensively, coordinating efforts directly related to implementing and applying methods for electrostatic embedding of machine learning potentials. This also involves the sharing of data and code.
Collaborator Contribution The team at the University of Valencia meets with the Bristol team regularly (see above). We share data and code.
Impact Output 1: https://doi.org/10.26434/chemrxiv-2023-6rng3-v2 (preprint, under review) Output 2: https://github.com/chemle/emle-engine
Start Year 2023
 
Title EMLE-engine 
Description A simple interface to allow electrostatic embedding of machine learning potentials using an ORCA-like interface. An example sander (AmberTools) implementation is provided. This works by reusing the existing interface between sander and ORCA, meaning that no modifications to sander are needed. emle-engine supports electrostatic, non-polarisable, and MM embedding. Here, non-polarisable embedding uses the EMLE model to predict charges for the QM region, but ignores the induced component of the potential. MM embedding allows the user to specify fixed MM charges for the QM atoms, with induction once again disabled. The use of different embedding schemes provides a useful reference for determining the benefit of using electrostatic embedding for a given system. 
Type Of Technology Software 
Year Produced 2024 
Open Source License? Yes  
Impact No direct impacts yet. 
URL https://github.com/chemle/emle-engine
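
The difference between the three embedding schemes described for EMLE-engine above can be illustrated with a toy point-charge model. The sketch below is not the emle-engine API or its actual model: the function names, the scheme labels ("electrostatic", "nonpol", "mm"), and the use of non-interacting isotropic atomic polarizabilities for the induced term are all assumptions made purely for illustration. Charges and polarizabilities that a trained model would predict are simply passed in as numbers, and all quantities are in atomic units.

```python
import math


def coulomb_energy(charges, positions, mm_charges, mm_positions):
    """Static Coulomb energy between QM-region charges and MM point charges."""
    energy = 0.0
    for q, r in zip(charges, positions):
        for Q, R in zip(mm_charges, mm_positions):
            energy += q * Q / math.dist(r, R)
    return energy


def induction_energy(polarizabilities, positions, mm_charges, mm_positions):
    """Induced term, -1/2 * alpha * |E|^2 per atom.

    Simplifying assumption: isotropic polarizabilities that do not
    interact with each other.
    """
    energy = 0.0
    for alpha, r in zip(polarizabilities, positions):
        field = [0.0, 0.0, 0.0]
        for Q, R in zip(mm_charges, mm_positions):
            d = math.dist(r, R)
            for k in range(3):
                # Field at r due to a point charge Q located at R
                field[k] += Q * (r[k] - R[k]) / d**3
        energy -= 0.5 * alpha * sum(f * f for f in field)
    return energy


def embedding_energy(scheme, positions, mm_charges, mm_positions,
                     predicted_charges, polarizabilities, fixed_charges=None):
    """Interaction energy with the MM environment under a hypothetical scheme label."""
    if scheme == "electrostatic":
        # Model-predicted charges plus the induced component
        return (coulomb_energy(predicted_charges, positions, mm_charges, mm_positions)
                + induction_energy(polarizabilities, positions, mm_charges, mm_positions))
    if scheme == "nonpol":
        # Model-predicted charges, induced component ignored
        return coulomb_energy(predicted_charges, positions, mm_charges, mm_positions)
    if scheme == "mm":
        # User-specified fixed charges, induction disabled
        if fixed_charges is None:
            raise ValueError("the 'mm' scheme needs fixed_charges")
        return coulomb_energy(fixed_charges, positions, mm_charges, mm_positions)
    raise ValueError(f"unknown scheme: {scheme}")


# One QM-region atom at the origin and one MM charge 2 bohr away.
qm_pos = [(0.0, 0.0, 0.0)]
mm_q, mm_pos = [-1.0], [(2.0, 0.0, 0.0)]
for name in ("electrostatic", "nonpol", "mm"):
    e = embedding_energy(name, qm_pos, mm_q, mm_pos,
                         predicted_charges=[0.5], polarizabilities=[2.0],
                         fixed_charges=[0.4])
    print(name, e)
```

Comparing the three numbers for the same geometry mirrors the use suggested in the description: the gap between the "electrostatic" and "nonpol" results isolates the induced contribution, while the "mm" result shows what fixed charges alone would give.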