Machine Learning and Molecular Modelling: A Synergistic Approach to Rapid Reactivity Prediction

Lead Research Organisation: University of Bath
Department Name: Chemistry

Abstract

The computational design of new chemical reactions is regarded as one of the "Holy Grails" of computational organic chemistry and biochemistry. Accurate and fast computational approaches to predicting chemical reactivity would provide cost-effective alternatives to time-consuming experimental approaches, and in some cases to animal testing, in drug design, toxicology and chemical synthesis. Mechanism-based prediction models are of particular importance because they are much more likely to gain general acceptance than computational "black-box" models, which offer no insight into how and why predictions are made. Providing such insight is especially important for models to gain regulatory acceptance in toxicology and drug design. However, no current computational approach to reactivity prediction, whether molecular modelling or machine learning (ML), combines fast, accurate predictions with clear mechanistic insight; one or more of these desirable characteristics must be sacrificed in pursuit of the others. This project will develop a novel, synergistic molecular modelling and ML approach to rapid, high-accuracy and mechanism-based reactivity prediction for use in toxicology, drug design and chemical synthesis, and thus help realise the "Holy Grail".

We will train and validate, on large datasets (~10,000 compounds), ML models that correct energy barriers obtained from rapid molecular modelling techniques to those derived from prohibitively slow, high-accuracy methods. Our synergistic approach to reaction modelling will thus be to derive mechanistic insight from these rapid molecular modelling techniques and to use our ML models to obtain fast and accurate reaction barriers. Models for C-N bond-forming reactions will be developed for use in covalent drug design (targeting lysine), toxicology (predicting mutagenicity and respiratory sensitisation) and pharmaceutical synthesis planning. To demonstrate the broad utility of our synergistic approach, we will use it to rationalise experimental reactivity data for biologically and synthetically relevant systems for which current modelling approaches would be prohibitively slow. Rather than requiring a supercomputer, predictions will be possible even on a laptop, which will represent a paradigm shift in reaction modelling.
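
The barrier-correction idea described above can be illustrated with a minimal sketch. It is not the project's actual model or data: the descriptors, the synthetic "low-level" and "high-level" barriers, the ridge regression and the helper names `fit_ridge` and `predict_ridge` are all assumptions chosen only to show the general pattern of learning the difference between a cheap and an expensive barrier and adding it back to the cheap one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT real chemistry): 200 "reactions", 5 descriptors each.
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))

# Hypothetical high-accuracy barriers (arbitrary energy units) and cheap barriers
# that carry a systematic, descriptor-dependent error plus a little noise.
error_weights = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
barrier_high = 20.0 + X @ np.array([3.0, 1.0, -2.0, 0.5, 0.0])
barrier_low = (barrier_high - (4.0 + X @ error_weights)
               + rng.normal(scale=0.1, size=n_samples))


def fit_ridge(features, targets, alpha=1e-3):
    """Closed-form ridge regression with an unpenalised intercept column."""
    A = np.hstack([np.ones((features.shape[0], 1)), features])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(A.T @ A + penalty, A.T @ targets)


def predict_ridge(coef, features):
    """Apply the fitted coefficients (intercept first) to new descriptors."""
    A = np.hstack([np.ones((features.shape[0], 1)), features])
    return A @ coef


# Train on the first 150 samples and hold out the last 50 for validation.
train, test = slice(0, 150), slice(150, None)

# The learning target is the correction (high minus low), not the barrier itself.
delta_train = barrier_high[train] - barrier_low[train]
coef = fit_ridge(X[train], delta_train)

# Corrected barrier = cheap barrier + predicted correction.
corrected = barrier_low[test] + predict_ridge(coef, X[test])

mae_raw = np.mean(np.abs(barrier_low[test] - barrier_high[test]))
mae_corrected = np.mean(np.abs(corrected - barrier_high[test]))
print(f"MAE before correction: {mae_raw:.2f}; after: {mae_corrected:.2f}")
```

In this toy setting the cheap calculation is still run for every new system, so whatever mechanistic information it provides is retained, while the learned correction is only a small linear-algebra step that runs comfortably on a laptop. A real application would replace the synthetic arrays with computed barriers and chemically meaningful descriptors, and would likely need a more expressive model.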