The representation, and effect, of chemical context on the accuracy of machine learning chemical reaction prediction models
Lead Research Organisation: University of Cambridge
Department Name: Chemistry
Abstract
Prediction of chemical reaction yields using machine learning (ML) models is an active research area. Current state-of-the-art yield prediction models perform well when trained on high-throughput experimentation (HTE) data, but perform poorly when trained on literature data. These models use reaction representations that convey information only about the reactants and products of a reaction, and the effect of including additional reaction information on yield prediction accuracy is unknown. It is proposed that including this information, in the form of an ontology, could improve yield prediction accuracy. So far, extraction of reaction and physical property data from Reaxys and the DDB, natural language processing (NLP) of the associated Reaxys text, and design of a reaction ontology have been completed. The extracted data is used to populate ontologies automatically, with each reaction populating its own ontology that serves as a new reaction representation. The ontology structure is generic enough to accommodate any reaction type and is designed to align with related ontologies describing reactor systems. The remaining tasks in this project are embedding the ontologies to create ML-ready inputs, and training and evaluating a transformer model that uses the ontology embeddings for yield prediction.
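As an illustration of the idea of each reaction populating its own ontology, the following is a minimal Python sketch of how one reaction record might be turned into a small graph of (subject, predicate, object) triples, with and without chemical context. It is a simplified stand-in, not the project's ontology: the `ex:` terms, the `Reaction` class, and the `to_triples` helper are hypothetical, and a real implementation would more likely use RDF/OWL tooling and the project's own class and property names.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object). The "ex:" vocabulary below is
# hypothetical and chosen only for illustration.
Triple = tuple[str, str, str]


@dataclass
class Reaction:
    reaction_id: str
    reactants: list[str]
    products: list[str]
    # Chemical context: information beyond reactants and products,
    # e.g. solvent, catalyst, temperature (hypothetical role names).
    context: dict[str, str] = field(default_factory=dict)


def to_triples(rxn: Reaction, include_context: bool = True) -> set[Triple]:
    """Populate a per-reaction graph of triples for a single reaction."""
    node = f"rxn:{rxn.reaction_id}"
    triples: set[Triple] = {(node, "rdf:type", "ex:ChemicalReaction")}
    for species in rxn.reactants:
        triples.add((node, "ex:hasReactant", species))
    for species in rxn.products:
        triples.add((node, "ex:hasProduct", species))
    if include_context:
        # Context entries become extra edges, e.g. ex:hasSolvent.
        for role, value in rxn.context.items():
            triples.add((node, f"ex:has{role}", value))
    return triples


if __name__ == "__main__":
    rxn = Reaction(
        "r1",
        reactants=["bromobenzene", "phenylboronic acid"],
        products=["biphenyl"],
        context={"Solvent": "ethanol", "Catalyst": "Pd(PPh3)4", "TemperatureC": "80"},
    )
    bare = to_triples(rxn, include_context=False)  # reactants/products only
    full = to_triples(rxn)  # reactants/products plus chemical context
    print(len(bare), len(full))
```

The context-free graph is a strict subset of the full one, which mirrors the comparison the project proposes: a representation limited to reactants and products versus one that also carries chemical context.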
| People | ORCID iD |
|---|---|
| Michael Zhou (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S024220/1 | | | 31/05/2019 | 30/11/2027 | |
| 2895024 | Studentship | EP/S024220/1 | 30/09/2023 | 29/09/2027 | Michael Zhou |