Solubility prediction in organic solvents through a combination of chemometrics and computational chemistry

Lead Research Organisation: University of Leeds
Department Name: Sch of Chemistry

Abstract

Solubility is an essential property in evaluating an Active Pharmaceutical Ingredient's potency. In pharmaceutical syntheses, solubility is vital in assisting purification of intermediates and synthetic route selection by predicting the cost of work-up and purification. Most current prediction models rely on experimental data, either as thermodynamic values, or as parameters for structural fragments. These allow semi-empirical models to adjust and compensate for inherent errors in their assumptions. Problems, however, arise when novel compounds are made with completely different structural patterns to those used to provide the semi-empirical parameters, i.e. outside the known chemical space.

The student will take a leading role in addressing this urgent gap in chemical knowledge. The project will exploit recent advances in Density Functional Theory (DFT) computational techniques to calculate properties of drug-like compounds. Statistical tools will be employed to analyse these properties and link them with experimental solubility. A model for solubility prediction will consequently be developed, together with an understanding of the factors influencing solubility. Importantly, the DFT approach is not restricted to known chemical space. This will allow the project to expand quickly into novel chemical space and provide predictions for experimental verification at a later stage.
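As a rough illustration of the step that links calculated properties to experimental solubility, the sketch below fits a simple model to a small set of descriptors and checks it on held-out compounds. It is a minimal sketch under stated assumptions, not the project's method: ordinary least squares stands in for whichever statistical tools are eventually used, the three descriptors are hypothetical placeholders for DFT-derived properties, and the data are synthetic values generated in the script rather than measured solubilities.

```python
import numpy as np


def fit_linear_model(X, y):
    """Least-squares fit of y ~ X @ w + b. Returns (w, b)."""
    # Append a column of ones so the intercept is fitted with the weights.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def predict(X, w, b):
    """Predicted response for each row of the descriptor matrix X."""
    return X @ w + b


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# ASSUMPTION: synthetic stand-in data, not real measurements.
# Each row is one compound; the 3 columns are hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
true_w = np.array([-0.8, 0.5, 0.3])  # arbitrary weights used to generate y
y = X @ true_w + 1.2 + rng.normal(scale=0.1, size=40)  # mock log-solubility

# Fit on the first 30 compounds and evaluate on the remaining 10.
X_train, X_test = X[:30], X[30:]
y_train, y_test = y[:30], y[30:]
w, b = fit_linear_model(X_train, y_train)
test_rmse = rmse(y_test, predict(X_test, w, b))
print(f"held-out RMSE: {test_rmse:.3f}")
```

Evaluating on compounds that were not used for fitting is what gives a first indication of how such a model might behave outside the data it was built on.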

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N509243/1      |              |              | 01/10/2015 | 31/12/2021 |
1939697           | Studentship  | EP/N509243/1 | 01/10/2017 | 30/09/2021 | Samuel Boobier
 
Description: The physical property of solubility is of interest to a number of audiences in the pharmaceutical, agricultural and engineering sectors.
We aimed to build predictive models by combining a number of established disciplines: statistical modelling; computational chemistry; quantum mechanical modelling; and machine learning.
We created datasets in four solvents, mined from literature sources. We hope to make these publicly available for other groups to use. We have built quantitative cheminformatics models in non-aqueous solvents for the first time. We have demonstrated that a small number of inputs, chosen to describe the physical chemistry of solubility, can be used to accurately predict solubility. We have benchmarked these models against previous methods with internal data, and with independent data from an industry collaborator. We have identified a confidence interval for our predictions and carried out a careful analysis of the models' limitations and physical meaning.
Exploitation Route: Those working in the physical property prediction field (in academia or the chemical industry) can use the datasets to build new models.
The models themselves can be used by a number of fields where solubility is an important factor (e.g. agrichemicals, drug formulation or process design). As the full details of the models themselves will be released, researchers in academia or industry can build on this work to further improve/extend the models. The methodological advances will be of use to the wider machine learning/predictive modelling community.
Sectors: Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Healthcare; Pharmaceuticals and Medical Biotechnology