Augmenting Molecular Property Prediction with AI and Chemical Calculation

Lead Research Organisation: University of Strathclyde
Department Name: Pure and Applied Chemistry

Abstract

Aims. We propose to apply cutting-edge AI algorithms to chemical simulation data in order to extract key molecular-level features, which can then be used to build statistical machine learning models for molecular property prediction. The work has direct applications to improving molecular property prediction, which is of critical interest to the chemical industry. Companies such as Unilever, P&G, AkzoNobel, GSK and AstraZeneca all have active research programs that require molecular property prediction via in silico methods.
Background. Intricate organic solutes such as surfactants, macrocycles and biological compounds are becoming increasingly commonplace as industrial agents. These new modalities are in growing demand across a variety of industrial applications, for example bio-surfactants for environmental remediation, novel polymeric materials for carbon capture, and beyond-rule-of-5 drugs to modulate challenging biological targets in the pharmaceutical industry. Unfortunately, many of these new poly-functionalized organics exist in a chemical space where traditional guidelines on physical property optimization are inapplicable, and predicting properties to aid design is currently extremely challenging [Nature Chem. Biol. 2016, 12, 1065-1074]. Computational modelling of poly-functionalized organic compounds is difficult because of their complex physical chemistry. Traditional "single-molecule" descriptors fail because they operate with simplified representations of molecular structure that ignore conformational degrees of freedom and the environment. Physics-based calculation methods provide a more realistic representation of the chemical environment and dynamics, but there are many properties that cannot be predicted with satisfactory accuracy using these methods alone.
New Perspectives. Molecular dynamics (MD) simulations directly probe the dynamics of a chemical system, yielding rich chemical insight. Featurizing an MD trajectory will provide additional information from which AI algorithms can build new predictive models. An alternative approach is classical density functional theory (cDFT), a method that operates with functions describing local variations in the molecular density of a chemical system [Chem. Rev., 2015, 115, 6312-6356]. These density functions provide a wealth of information that is invaluable for interpreting chemical systems, as evidenced by their use in predicting water sites on protein surfaces, fragment-based drug discovery, and molecular engineering. As with MD trajectories, these density functions can be featurized and provided to AI methods to improve property predictions.
Preliminary Evidence. One of us has recently shown accurate predictions of bioconcentration factors using a convolutional neural network (CNN) trained on 3D solute-solvent correlation functions (computed by 3D RISM, a cDFT method) [J. Phys.: Condens. Matter, 2018, 30, 32LT03]. This paper was selected as a research highlight by Physics World and Phys.org. Additionally, work from the DSP group has shown that permeability models performed better than the state-of-the-art tools for "small molecules" when they were built using descriptors computed with the 1D Reference Interaction Site Model [Mol. Pharmaceutics, 2015, 12, 3420-3432]. We have also demonstrated that adding features from chemical calculation can enhance the accuracy of machine learning models that predict the thermodynamics of sublimation and binding affinity [J. Chem. Inf. Model., 2016, 56, 2162-2179; J. Chem. Inf. Model., 2018, 58, 1253-1265].

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/T517665/1                                   01/10/2019  30/09/2024
2318858            Studentship   EP/T517665/1  01/10/2019  30/09/2023  Jonathan Conn