A Hybrid Statistical/Mechanistic Approach to Predicting Reaction Conditions

Lead Research Organisation: University of Cambridge
Department Name: Chemistry

Abstract

Year 1: Generic training activities for all first-year student members of the CDT.

Years 2-4: The project will work towards building a hybrid statistical/mechanistic model to predict the conditions necessary for C-H activation of pyridines at the position ortho to the nitrogen atom, alongside two related challenges: defining the molecular context of a reaction, and collaborating with others using the Unified Data Model (UDM). Molecular context can be defined using six unique parameter headers: information, energy, time, space, structure and substance. To better understand how to incorporate the structure of relevant compounds in a model, a review of molecular representation will be conducted. The representation of molecules was thought to be a solved problem; however, advances in Machine Learning (ML) have brought about a resurgence of interest, particularly for the generation of novel compounds. Furthermore, a case study will show how UDM can be used to represent the data necessary for ML-driven solvent selection. UDM is an XML-based data format capable of representing most classes of data relevant to chemistry. Building better prediction models for reactivity can aid retrosynthesis, thus helping chemists in the lab, and is also an enabling factor for building autonomous synthesis platforms. There is precedent for highly accurate reactivity models that are valid only in a small area of chemical space, and developing the concept of 'molecular context' can make it easier to mesh such models together.
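As a rough illustration of the six parameter headers named above, a reaction's molecular context could be held in a small record type. This is a minimal sketch only: the header names come from the abstract, while the class name `MolecularContext`, the helper `populated_headers`, the keys stored under each header and all example values are hypothetical and are not taken from the project or from the UDM schema.

```python
from dataclasses import dataclass, field

# Header names are taken from the abstract; what is stored under each
# header below is an assumption made purely for illustration.
CONTEXT_HEADERS = ("information", "energy", "time", "space", "structure", "substance")


@dataclass
class MolecularContext:
    """Hypothetical container with one free-form mapping per header."""
    information: dict = field(default_factory=dict)
    energy: dict = field(default_factory=dict)
    time: dict = field(default_factory=dict)
    space: dict = field(default_factory=dict)
    structure: dict = field(default_factory=dict)
    substance: dict = field(default_factory=dict)

    def populated_headers(self):
        """Return the headers that carry data, in canonical order."""
        return [name for name in CONTEXT_HEADERS if getattr(self, name)]


# Made-up example values; "c1ccncc1" is the SMILES string for pyridine.
context = MolecularContext(
    energy={"temperature_K": 353.0},
    time={"duration_h": 12.0},
    structure={"substrate_smiles": "c1ccncc1"},
    substance={"solvent": "toluene"},
)

print(context.populated_headers())
```

Keeping each header as a free-form mapping is a deliberate simplification; a real schema would presumably constrain the allowed keys and units under each header.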

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S024220/1 | | | 31/05/2019 | 30/11/2027 |
2276995 | Studentship | EP/S024220/1 | 30/09/2019 | 29/09/2023 | Daniel Stauso Wigh
 
Description When chemists run reactions in the lab, it is often difficult to select the right conditions. I am currently working on computational methods to help chemists with this challenge. To this end, I recently published a literature review explaining how to represent molecules in a way that computers can process, since good molecular representation is foundational to most (if not all) computational methods in chemistry. I am also working on an algorithm for predicting the rate of a particular type of reaction, and I hope that this will help us understand how best to represent chemical reactions for computers.
Exploitation Route An enhanced understanding of how to represent molecules and chemical reactions is foundational to most (if not all) computational problems in chemistry. My review on molecular representation will provide insight that will be particularly valuable for those entering the field of computational chemistry (e.g. early career researchers and researchers moving closer to computational chemistry), and I hope the same will be true of my work on reaction representation. Overall, I believe my work will make it easier and faster for others to build machine learning workflows for their own specific chemistry problems.
Sectors Chemicals, Pharmaceuticals and Medical Biotechnology

URL https://scholar.google.com/citations?user=V0EZ8LMAAAAJ&hl=en&oi=ao