Simulating catalysis: Multiscale embedding of machine learning potentials
Lead Research Organisation:
University of Bristol
Department Name: Biochemistry
Abstract
In recent decades, computer simulations have become an essential part of the molecular scientist's toolbox. However, as with any computational method, molecular simulations require a compromise between speed and precision. The most precise techniques apply the principles of quantum mechanics (QM) to molecular systems and can accurately describe processes involving changes in the electronic structure, such as the breaking and forming of chemical bonds. However, they require tremendous computer resources and become prohibitively costly even for systems containing only a few hundred atoms. At the other extreme are highly simplified "Molecular Mechanics" (MM) methods that ignore the quantum nature of molecules and instead describe the atoms as charged "balls" of a certain size connected by springs representing the chemical bonds.
The core limitation of MM is its inability to describe the breaking and forming of chemical bonds, which makes it unsuitable for simulating chemical reactions. This drawback motivated the invention of combined "multiscale" models that rely on precise but expensive QM calculations to describe the part of the simulation system where the chemical reaction takes place, while treating the rest of the system with an efficient MM method. This "Quantum Mechanics/Molecular Mechanics" approach (QM/MM), honoured by the Nobel Prize in Chemistry in 2013, is now the state-of-the-art simulation technique for reactions in complex environments, such as those happening inside living organisms. Such simulations are important for understanding and designing catalysts, which increase the rate of chemical reactions (and can thereby reduce the amount of energy and resources required to produce molecules). However, QM/MM calculations are still only as fast as the QM method used, which dramatically limits the precision and timescale of the simulations.
A completely different approach is to employ techniques from the rapidly evolving field of machine learning (ML) and construct a method that can learn and then predict the outcome of a QM calculation. Once properly trained, an ML model can provide results of QM quality several orders of magnitude faster. However, ML models are still significantly slower than MM ones. Therefore, a multiscale "ML/MM" model would still offer huge savings of computer time compared to pure ML simulations. Unfortunately, existing ML training schemes are only suitable for calculations in the gas phase and cannot take into account the presence of an MM environment.
The goal of the proposed research project is to develop a novel multiscale embedding approach that will allow the use of ML models as part of an ML/MM scheme. This will enable molecular simulations of unprecedented precision on processes of high complexity without limiting the detailed exploration of molecular conformations. To achieve this goal, we will take advantage of recent advances in machine learning and in the understanding of intermolecular interactions to develop a specialised ML workflow that predicts the interaction energy between the molecule described by ML and the MM environment. The workflow will be implemented as an open, publicly available software package that allows users to train ML/MM models and run ML/MM molecular dynamics simulations of complex chemical processes, such as catalysed reactions. We expect this package to be readily adopted by a wide community of computational chemists working on enzymatic reactions, homo/heterogeneous catalysis and, more generally, processes in condensed phases, aided by specific training materials and workshops that we will provide. This will allow, for example, the development of efficient computational workflows to understand and help design catalysts for more environmentally friendly production of desired molecules.
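As an illustrative sketch only (the symbols below are assumptions introduced here, not notation taken from the project), the energy partition implied by such an ML/MM scheme can be written in a simplified point-charge picture, where the ML model supplies the in vacuo energy of the inner region and the coupling term is what the proposed workflow would need to predict:

```latex
\begin{align}
E_{\mathrm{tot}} &= E_{\mathrm{ML}}(\mathbf{R}_{\mathrm{ML}})
                  + E_{\mathrm{MM}}(\mathbf{R}_{\mathrm{MM}})
                  + E_{\mathrm{int}}(\mathbf{R}_{\mathrm{ML}}, \mathbf{R}_{\mathrm{MM}}) \\
E_{\mathrm{int}} &\approx \sum_{i \in \mathrm{ML}} \sum_{j \in \mathrm{MM}}
                    \frac{q_i(\mathbf{R}_{\mathrm{ML}})\, q_j}{\lvert \mathbf{r}_i - \mathbf{r}_j \rvert}
                  + E_{\mathrm{ind}} + E_{\mathrm{vdW}}
\end{align}
```

Here $q_i(\mathbf{R}_{\mathrm{ML}})$ stands for geometry-dependent charges on the ML atoms, $q_j$ for the fixed MM charges, $E_{\mathrm{ind}}$ for the polarisation of the ML region by the environment, and $E_{\mathrm{vdW}}$ for the remaining non-electrostatic terms. The actual functional form used by the project may differ.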
Publications


Deeks HM
(2023)
Free energy along drug-protein binding pathways interactively sampled in virtual reality.
in Scientific Reports

Jabeen H
(2024)
Electric Fields Are a Key Determinant of Carbapenemase Activity in Class A β-Lactamases.
in ACS Catalysis


Jäckering A
(2024)
Influence of Wobbling Tryptophan and Mutations on PET Degradation Explored by QM/MM Free Energy Calculations
in Journal of Chemical Information and Modeling

Woods C
(2024)
Sire: An interoperability engine for prototyping algorithms and exchanging information between molecular simulation programs
in The Journal of Chemical Physics



Zinovjev K
(2023)
Electrostatic Embedding of Machine Learning Potentials.
in Journal of Chemical Theory and Computation
Title | emle-engine |
Description | A simple, versatile interface to run multi-scale MM/ML simulations with electrostatic embedding of machine learning potentials using an ORCA-like interface. |
Type Of Material | Technology assay or reagent |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | Initial interest from other research groups and industry. |
URL | https://github.com/chemle/emle-engine |
Title | Electrostatic embedding scheme for Machine Learning potentials |
Description | Training data was generated based on the QM7 data set, consisting of 7165 molecules with up to 7 heavy atoms (C, N, O, and S, in addition to H). For each molecule, the density and molecular dipolar polarizability were obtained at the B3LYP/cc-pVTZ level of theory without reoptimizing the structures. Included are the training procedure and the properties and parameters required by the embedding scheme, as trained/optimized for ground-state neutral compounds containing H, C, N, O, and S. |
Type Of Material | Computer model/algorithm |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | None yet. |
URL | https://github.com/emedio/embedding |
Description | University of Valencia |
Organisation | University of Valencia |
Country | Spain |
Sector | Academic/University |
PI Contribution | My research team and I have regular meetings with the team at the University of Valencia (mostly every 2 weeks). We collaborate intensively, coordinating efforts directly related to implementing and applying methods for electrostatic embedding of machine learning potentials. This also involves the sharing of data and code. |
Collaborator Contribution | The team at the University of Valencia meets with the Bristol team regularly (see above). We share data and code. |
Impact | Output 1: https://doi.org/10.26434/chemrxiv-2023-6rng3-v2 (preprint, under review) Output 2: https://github.com/chemle/emle-engine |
Start Year | 2023 |
Title | EMLE-engine |
Description | A simple interface to allow electrostatic embedding of machine learning potentials using an ORCA-like interface. An example sander (AmberTools) implementation is provided. This works by reusing the existing interface between sander and ORCA, meaning that no modifications to sander are needed. emle-engine supports electrostatic, non-polarisable, and MM embedding. Here, non-polarisable embedding uses the EMLE model to predict charges for the QM region but ignores the induced component of the potential. MM embedding allows the user to specify fixed MM charges for the QM atoms, with induction once again disabled. The use of different embedding schemes provides a useful reference for determining the benefit of using electrostatic embedding for a given system. |
Type Of Technology | Software |
Year Produced | 2024 |
Open Source License? | Yes |
Impact | No direct impacts yet. |
URL | https://github.com/chemle/emle-engine |
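The difference between the three embedding schemes described above can be illustrated with a minimal sketch. This is not the emle-engine API: the function names, the scheme labels, and the use of plain point charges with uncoupled isotropic atomic polarisabilities are all assumptions made here for illustration, and everything is in atomic units.

```python
import numpy as np


def mm_potential_and_field(ml_xyz, mm_xyz, mm_q):
    """Electrostatic potential and field of the MM point charges at each ML atom."""
    d = ml_xyz[:, None, :] - mm_xyz[None, :, :]      # (n_ml, n_mm, 3) separation vectors
    r = np.linalg.norm(d, axis=-1)                   # (n_ml, n_mm) distances
    phi = (mm_q[None, :] / r).sum(axis=1)            # potential at each ML atom
    field = (mm_q[None, :, None] * d / r[..., None] ** 3).sum(axis=1)
    return phi, field


def embedding_energy(scheme, ml_xyz, mm_xyz, mm_q,
                     predicted_q=None, predicted_alpha=None, fixed_q=None):
    """Hypothetical ML-region/MM-environment interaction energy.

    scheme (labels are illustrative, not emle-engine options):
      "electrostatic" : model-predicted charges plus an induction term
      "nonpol"        : model-predicted charges, induction ignored
      "mm"            : user-supplied fixed charges, induction ignored
    """
    phi, field = mm_potential_and_field(ml_xyz, mm_xyz, mm_q)
    if scheme == "mm":
        return float(np.dot(fixed_q, phi))
    e_static = float(np.dot(predicted_q, phi))
    if scheme == "nonpol":
        return e_static
    if scheme == "electrostatic":
        # Simplification: independent isotropic dipoles, no mutual coupling.
        e_ind = -0.5 * float(np.sum(predicted_alpha * np.sum(field ** 2, axis=1)))
        return e_static + e_ind
    raise ValueError(f"unknown scheme: {scheme}")
```

Running all three schemes on the same geometry gives the kind of reference comparison mentioned in the description: the gap between "nonpol" and "electrostatic" isolates the induction contribution, and the gap between "mm" and "nonpol" shows the effect of replacing fixed charges with geometry-dependent ones.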