Investigation of Transition-Metal Catalysed C-H Activation Using Quantum Chemistry and Machine Learning

Lead Research Organisation: University of Oxford
Department Name: Oxford Chemistry

Abstract

Direct functionalisation of C-H bonds to form new C-C bonds (or C-N, C-O, etc.) has received significant interest over the past two decades, particularly in drug discovery. This broad class of reactions is attractive because it enables shorter and more efficient syntheses by avoiding the need to pre-functionalise molecules with halide or amine groups before coupling them. C-H activation reactions usually require a transition metal (TM) catalyst (e.g., Pd, Pt). The reaction is suggested to proceed via metal insertion into the C-H bond, although alternative mechanisms are possible. However, even with TM catalysts, C-H activation requires harsh reaction conditions because of the high bond dissociation energy of the C-H bond, which reduces tolerance of sensitive functional groups. Additionally, regio- and stereoselectivity are often poor.

The aim of this project is twofold. The first goal is to automate reaction path generation for these reactions using quantum chemical methods. The second goal is to predict reaction products for given reactants using machine learning (ML) approaches.

Computational studies can help elucidate the mechanisms of these reactions and thereby help improve their reactivity and selectivity. Most previous work has relied on manual exploration of potential energy surfaces to find transition states (TS), which is time-consuming and requires expert users. Automating such analyses would reduce the resources spent on modelling and increase their impact on synthetic development. Efforts to automate the study of this type of reaction have only recently been reported, using model systems. This project aims to bring the utility of such automation to C-H activation reactions. We will explore the use of semi-empirical (SE) methods to obtain TS guesses and then augment them with an ML-based correction to obtain a geometry close to the DFT-optimised TS structure. This combined SE-ML approach has been applied to the prediction of solvation energies, heats of formation, etc., and has been shown to converge faster and require less training data than pure ML models. We will also aim to extend the generalisability of this approach to predict C-H activation with any metal catalyst (e.g., Ru, Fe).

The second avenue to explore will be the use of ML methods to predict reaction products, especially the regio- and stereoselectivity of C-H activation reactions, given the reactants. Here we will employ the IBM Rxn platform as a starting point, which has been shown to achieve 90% accuracy on the first prediction against the known reaction outcome. This ML approach, introduced by Schwaller et al., consists of a neural network language-transformer model which "translates" SMILES strings of reactants into the product SMILES. Although ML models for predicting general reactions are available, their efficacy for C-H activation reactions has not been extensively tested. We intend to investigate whether language-transformer-based ML models can predict C-H activation regioselectivity and stereoselectivity. We will also implement 3D structure-based ML models (e.g., based on molecular electrostatic potential) and test whether they are more successful in predicting selectivity.

This project falls within the EPSRC research area "computational and theoretical chemistry", but it also touches upon the areas of reaction mechanisms, medicinal chemistry, transition metal catalysis, DFT and machine learning. It aligns with the goals of EPSRC in that area, including collaboration with the pharmaceutical sector and software development. The latter will be beneficial for improving computational workflows and the study of catalytic mechanisms, which are relevant in the broader scientific context. We are very happy to have AstraZeneca as our industrial collaborator. They assist by suggesting which routes of inquiry might align better with the synthetic and drug discovery goals of the pharmaceutical industry.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/W522211/1                                   01/10/2021  30/09/2027
2759938            Studentship   EP/W522211/1  01/10/2022  30/09/2026  Shoubhik Maiti