A Hybrid Statistical/Mechanistic Approach to Predicting Reaction Conditions

Lead Research Organisation: University of Cambridge
Department Name: Chemistry

Abstract

Year 1: Generic training activities for all first-year student members of the CDT.

Years 2-4: The project will work towards building a hybrid statistical/mechanistic model to predict the conditions necessary for C-H activation of pyridines at the position ortho to the nitrogen atom, alongside two related challenges: defining the molecular context of a reaction, and collaborating with others using the Unified Data Model (UDM). Molecular context can be defined using six parameter headers: information, energy, time, space, structure and substance. To better understand how to incorporate the structure of relevant compounds in a model, a review of molecular representation will be conducted. The representation of molecules was once thought to be a solved problem; however, advances in machine learning (ML) have brought about a resurgence of interest, particularly for the generation of novel compounds. Furthermore, a case study will show how UDM can be used to represent the data necessary for ML-driven solvent selection. UDM is an XML-based data format capable of representing most classes of data relevant to chemistry. Building better prediction models for reactivity can aid retrosynthesis, thus helping chemists in the lab, and is also an enabling factor for building autonomous synthesis platforms. There is precedent for the use of highly accurate reactivity models that are only valid in a small area of chemical space, and developing the concept of 'molecular context' can ease the meshing of such reactivity models.
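As a rough illustration of the six parameter headers, the sketch below shows one way a molecular context record might be held in code. Only the six header names come from the abstract; the class name `MolecularContext`, the method `filled_headers`, the keys stored under each header, and all example values are assumptions made for illustration and do not describe the project's actual data model.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class MolecularContext:
    """Hypothetical container with one slot per parameter header.

    The six header names are taken from the abstract; the contents of
    each slot are illustrative assumptions.
    """
    information: Dict[str, Any] = field(default_factory=dict)
    energy: Dict[str, Any] = field(default_factory=dict)
    time: Dict[str, Any] = field(default_factory=dict)
    space: Dict[str, Any] = field(default_factory=dict)
    structure: Dict[str, Any] = field(default_factory=dict)
    substance: Dict[str, Any] = field(default_factory=dict)

    def filled_headers(self) -> List[str]:
        """Return the names of headers that hold at least one entry."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# Hypothetical example values; none of them come from the project.
context = MolecularContext(
    structure={"substrate_smiles": "c1ccncc1"},  # pyridine
    energy={"temperature_celsius": 80.0},
    time={"reaction_hours": 12.0},
)
print(context.filled_headers())
```

Keeping every header present but optionally empty is one possible design choice: two reactivity models covering different regions of chemical space could then be compared header by header before they are combined.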

Planned Impact

Who might benefit from this research? How might they benefit from this research?

Students
(a) The major beneficiaries of the CDT will, of course, be the students who train on the programme. They will be equipped with a set of skills that will be highly desirable in the organic molecule making industries. Although the proposal is directed towards a need in the pharmaceutical industry, the training and research skills are fully transferable to the agrochemical sector (an almost seamless transition, as its needs are nearly identical to those of pharma), as well as to the fine chemicals industries and the CROs that serve all of these industries. With some adaptation of the skills accrued, the students will also be able to apply their knowledge to problems in other areas, such as polymers, organic electronics and chemical biology.

(b) Synthesis will also be evolving in academia, and students equipped with skills in digital molecular technologies will be at a significant advantage in being able to apply the skills acquired while training on the CDT. These students could be the rising stars of academia in 10 years' time.

(c) The non-research based training will benefit the students by providing a set of transferable skills that will see them thrive in any chosen career.

(d) The industry contacts that will be generated from the variety of interactions planned in the CDT will give students both experience of and insight into the workings of the industrial sector, helping them to gain a different training experience (from industry-taught courses) and hands-on experience in industrial laboratories.

(e) All students in UCAM will be able to benefit in some way from the CDT. Training courses will not be restricted to CDT students (only courses that require payment will be CDT only, and even then, we will endeavour to make additional places available for non-CDT students). The overall standard of training for all students will be raised by a CDT, meaning that the benefit will be realised across the students of the associated departments. In addition, non-CDT students can also be inspired by the research of the CDT and can bring new techniques into their own groups.

Academic researchers in related fields (PIs)
(a) new research knowledge that results from this programme will benefit PIs in UCAM and across the academic community. All research will be pre-competitive, with any commercial interests managed by Cambridge Enterprise.

(b) a change in mindset about how synthetic research is carried out

(c) new collaborations will be generated within UCAM, but also externally at a national and international level.

(d) better, more closely aligned, interactions with industry as a result of knowledge transfer

(e) access to outstanding students

Broader public
(a) in principle, more potential medicines could be made available by the research of this CDT.

Economy
(a) a new highly skilled workforce literate in disciplines essential to industry needs will be available
(b) higher productivity in industry, faster access to new medicines
(c) spin out opportunities will create jobs and will stimulate the economy
(d) automation will not remove the need for skilled people; it will free researchers to work on the problems we don't yet understand, allowing solutions to be discovered faster

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S024220/1                                   01/06/2019  30/11/2027
2276995            Studentship   EP/S024220/1  01/10/2019  30/09/2023  Daniel Wigh
 
Description When chemists run reactions in the lab, it can often be difficult to select the right conditions. I am currently working on computational methods to help chemists with this challenge. To this end, I recently published a literature review explaining how to represent molecules in a way that is understandable to computers, since good molecular representation is foundational to most (if not all) computational methods in chemistry. I also recently submitted a paper describing an algorithm for predicting the rate of a particular type of reaction (protodeboronation), which will help wet-lab chemists whose work involves this reaction. I am currently working on a novel way of building deep neural networks (a type of machine learning model) for predicting reaction conditions.
Exploitation Route Enhanced understanding of how to represent molecules and chemical reactions is foundational to most (if not all) computational problems in chemistry. My review on molecular representation will provide insight that will be particularly valuable for those entering the field of computational chemistry (e.g. early career researchers and researchers moving closer to computational chemistry), and I hope this will be the case for reaction representation as well. Overall, I believe my work will make it easier and faster for others to build machine learning workflows for their own specific chemistry problems. Furthermore, my algorithm for predicting the rate of protodeboronation will help wet-lab chemists who work with this reaction.
Sectors Chemicals, Pharmaceuticals and Medical Biotechnology

URL https://scholar.google.com/citations?user=V0EZ8LMAAAAJ&hl=en&oi=ao
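
The Description above refers to representing molecules in a way that is understandable to computers. As a minimal sketch of what that can mean, the example below encodes pyridine (SMILES: c1ccncc1) as an explicit heavy-atom graph, derives a simple element-count vector, and locates the carbons ortho to the nitrogen. The atom indexing, the helper names `adjacency` and `element_counts`, and the three-element vocabulary are assumptions chosen for illustration; they are not taken from the review or from the project's models.

```python
from collections import Counter

# Pyridine (SMILES: c1ccncc1) as a heavy-atom graph; hydrogens are implicit.
atoms = ["C", "C", "C", "N", "C", "C"]  # atom index -> element
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]  # ring bonds


def adjacency(n_atoms, bond_list):
    """Build an undirected adjacency list from a list of bonded index pairs."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bond_list:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def element_counts(atom_list, vocabulary=("C", "N", "O")):
    """Fixed-length count vector: one entry per element in the vocabulary."""
    counts = Counter(atom_list)
    return [counts[element] for element in vocabulary]


adj = adjacency(len(atoms), bonds)
nitrogen = atoms.index("N")
# Carbons directly bonded to the ring nitrogen are the ortho positions.
ortho_carbons = sorted(i for i in adj[nitrogen] if atoms[i] == "C")

print(element_counts(atoms))  # [5, 1, 0]
print(ortho_carbons)          # [2, 4]
```

A count vector like this discards connectivity, while the graph keeps it; which representation is suitable depends on the model and the question being asked.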