Solubility prediction in organic solvents through a combination of chemometrics and computational chemistry

Lead Research Organisation: University of Leeds

Department Name: School of Chemistry

Abstract

Solubility is an essential property in evaluating an Active Pharmaceutical Ingredient's potency. In pharmaceutical syntheses, solubility is vital in assisting purification of intermediates and synthetic route selection by predicting the cost of work-up and purification. Most current prediction models rely on experimental data, either as thermodynamic values or as parameters for structural fragments. These allow semi-empirical models to adjust and compensate for inherent errors in their assumptions. Problems arise, however, when novel compounds have structural patterns completely different from those used to derive the semi-empirical parameters, i.e. when they lie outside the known chemical space.

The student will take a leading role in addressing this urgent gap in chemical knowledge. The project will exploit recent advances in Density Functional Theory (DFT) computational techniques to calculate properties of drug-like compounds. Statistical tools will be employed to analyse these properties and link them with experimental solubility. A model for solubility prediction will consequently be developed, together with an understanding of the factors influencing solubility. Importantly, the DFT approach is not restricted to known chemical space. This will allow the project to expand quickly into novel chemical space and provide predictions for experimental verification at a later stage.
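As a rough illustration of what linking calculated properties to experimental solubility can look like, the sketch below fits an ordinary least-squares model to two computed descriptors. It is an assumption-laden example, not the project's method: the descriptor columns, the numerical values, and the function names `fit_least_squares` and `predict` are hypothetical, and the data are synthetic placeholders generated from a known linear relationship so that the fit can be checked.

```python
# Illustrative sketch only. The descriptors and solubility values are
# synthetic placeholders, not data or models from the project.

def fit_least_squares(rows, targets):
    """Fit targets ~ b0 + b1*x1 + ... by solving the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    n = len(design[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(n)]
        + [sum(d[i] * t for d, t in zip(design, targets))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - rest) / aug[i][i]
    return coeffs


def predict(coeffs, row):
    """Evaluate the fitted linear model for one set of descriptors."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Hypothetical (descriptor_a, descriptor_b) pairs standing in for
# DFT-derived properties of five compounds.
descriptors = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (3.0, 1.0), (4.0, 2.0)]
# Synthetic "log solubility" built from a known linear rule (no noise).
log_solubility = [1.0 - 0.5 * a + 0.25 * b for a, b in descriptors]

coeffs = fit_least_squares(descriptors, log_solubility)
```

Because the synthetic targets are noise-free, the fit should recover the generating intercept and slopes to within floating-point error. With real measurements, one would expect residual scatter and would hold out compounds to estimate prediction error.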

Student:

Samuel Boobier

Period of Study:

Oct 2017 - Sep 2021

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1939697

Research Topic:

Unclassified

Organisations

People	ORCID iD
Andrew Blacker (Primary Supervisor)
Samuel Boobier (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509243/1			01/10/2015	31/12/2021
1939697	Studentship	EP/N509243/1	01/10/2017	30/09/2021	Samuel Boobier

Key Findings


Description	The physical property of solubility is of interest to a number of audiences in the pharmaceutical, agricultural and engineering sectors. We aimed to build predictive models by combining a number of established disciplines: statistical modelling; computational chemistry; quantum mechanical modelling; and machine learning. We created datasets in four solvents, mined from literature sources. We hope to make these publicly available for other groups to use. We have built quantitative cheminformatics models in non-aqueous solvents for the first time. We have demonstrated that a small number of inputs, chosen to describe the physical chemistry of solubility, can be used to accurately predict solubility. We have benchmarked these models against previous methods with internal data and with independent data from an industry collaborator. We have identified a confidence interval for our predictions and carried out a careful analysis of the models' limitations and physical meaning.
Exploitation Route	The datasets can be used by those in the physical property prediction field (academia or chemical industry) as a basis for building new models. The models themselves can be used in a number of fields where solubility is an important factor (e.g. agrichemicals, drug formulation or process design). As the full details of the models will be released, researchers in academia or industry can build on this work to further improve or extend them. The methodological advances will be of use to the wider machine learning and predictive modelling community.
Sectors	Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Healthcare; Pharmaceuticals and Medical Biotechnology
