Solubility prediction in organic solvents through a combination of chemometrics and computational chemistry
Lead Research Organisation:
University of Leeds
Department Name: Sch of Chemistry
Abstract
Solubility is an essential property in evaluating an Active Pharmaceutical Ingredient's potency. In pharmaceutical syntheses, solubility is vital in assisting purification of intermediates and synthetic route selection by predicting the cost of work-up and purification. Most current prediction models rely on experimental data, either as thermodynamic values, or as parameters for structural fragments. These allow semi-empirical models to adjust and compensate for inherent errors in their assumptions. Problems, however, arise when novel compounds are made with completely different structural patterns to those used to provide the semi-empirical parameters, i.e. outside the known chemical space.
The student will take a leading role in addressing this urgent gap in chemical knowledge. The project will exploit recent advances in Density Functional Theory (DFT) computational techniques to calculate properties of drug-like compounds. Statistical tools will be employed to analyse these properties and link them with experimental solubility. A model for solubility prediction will consequently be developed, together with an understanding of the factors influencing solubility. Importantly, the DFT approach is not restricted to known chemical space. This will allow the project to expand quickly into novel chemical space and provide predictions for experimental verification at a later stage.
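As a rough illustration of the approach described in the abstract, the sketch below links computed descriptors to experimental solubility by ordinary least squares. It is not the project's model: the two descriptors (a solvation energy and a dipole moment), the function names `fit_ols` and `predict`, and all numbers are assumptions chosen for illustration, and the targets are generated from a known linear rule so that the fit can be checked.

```python
# Illustrative sketch only. Descriptor choices and every number below are
# hypothetical; none of them come from the project.

def fit_ols(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented system [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for k in range(p):
            if k != col:
                f = aug[k][col]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(coeffs, x):
    """Evaluate the fitted linear model for one descriptor vector."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))


# Hypothetical descriptors per compound: (solvation energy, dipole moment).
descriptors = [(-8.0, 2.1), (-5.5, 3.4), (-10.2, 1.0), (-6.7, 4.8), (-9.1, 2.9)]

# Synthetic "log solubility" targets generated from a known linear rule,
# standing in for experimental measurements.
log_s = [0.5 - 0.2 * e - 0.3 * d for e, d in descriptors]

coeffs = fit_ols(descriptors, log_s)
new_compound_log_s = predict(coeffs, (-7.0, 3.0))
print("fitted coefficients:", coeffs)
print("predicted log S for a new compound:", new_compound_log_s)
```

Because the descriptors are computed rather than looked up from fragment tables, the same `predict` call could in principle be applied to a compound with no experimental precedent, which is the point the abstract makes about novel chemical space. A real model would need measured solubilities and validation on held-out compounds.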
People | ORCID iD |
Andrew Blacker (Primary Supervisor) | |
Samuel Boobier (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509243/1 | | | 01/10/2015 | 31/12/2021 | |
1939697 | Studentship | EP/N509243/1 | 01/10/2017 | 30/09/2021 | Samuel Boobier |
Description | The physical property of solubility is of interest to a number of audiences in the pharmaceutical, agricultural and engineering sectors. We aimed to build predictive models by combining a number of established disciplines: statistical modelling, computational chemistry, quantum mechanical modelling and machine learning. We created datasets in four solvents, mined from literature sources. We hope to make these publicly available for other groups to use. We have built quantitative cheminformatics models in non-aqueous solvents for the first time. We have demonstrated that a small number of inputs, chosen to describe the physical chemistry of solubility, can be used to accurately predict solubility. We have benchmarked these models against previous methods with internal data and with independent data from an industry collaborator. We have identified a confidence interval for our predictions and carried out a careful analysis of the models' limitations and physical meaning. |
Exploitation Route | The datasets can be used by those in the physical property prediction field (academia or the chemical industry) as a basis for building new models. The models themselves can be used in a number of fields where solubility is an important factor (e.g. agrichemicals, drug formulation or process design). As the full details of the models will be released, researchers in academia or industry can build on this work to further improve or extend them. The methodological advances will be of use to the wider machine learning and predictive modelling community. |
Sectors | Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Healthcare; Pharmaceuticals and Medical Biotechnology |
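The Description above mentions benchmarking the models and identifying a confidence interval for the predictions, but does not say how either was done. The sketch below shows one simple way such figures could be computed from held-out predictions: a root-mean-square error, the fraction of predictions within a tolerance, and an empirical error band taken from the sorted absolute errors. The function names and the observed and predicted values are made up for illustration and are not results from the project.

```python
import math

# Illustrative sketch only. The values below are invented, not project data,
# and this is not necessarily how the project derived its confidence interval.


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def fraction_within(observed, predicted, tol):
    """Fraction of predictions whose absolute error is at most `tol`."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tol)
    return hits / len(observed)


def empirical_half_width(observed, predicted, coverage):
    """Smallest absolute error that covers at least `coverage` of the set."""
    errors = sorted(abs(o - p) for o, p in zip(observed, predicted))
    k = max(1, math.ceil(coverage * len(errors)))  # points that must be covered
    return errors[k - 1]


# Hypothetical held-out log-solubility values (observed vs. model output).
observed = [-1.0, -2.0, -0.5, -3.0, -1.5]
predicted = [-1.2, -1.5, -0.5, -4.5, -1.4]

rmse_value = rmse(observed, predicted)
within_one_log_unit = fraction_within(observed, predicted, 1.0)
half_width_80 = empirical_half_width(observed, predicted, 0.8)

print("RMSE:", rmse_value)
print("fraction within 1 log unit:", within_one_log_unit)
print("80% empirical half-width:", half_width_80)
```

An error band of this kind only describes compounds similar to those in the held-out set, so it would need to be reported alongside the limits of the model's applicability.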