Predicting the Volume of Distribution of Drugs and Toxicants with Data Mining Methods

Lead Research Organisation: University of Kent
Department Name: Sch of Computing


Paracelsus, a physician in the early 16th century, is credited with the phrase: "All things are poison, and nothing is without poison; only the dose permits something not to be poisonous". Despite significant advances in pharmacology in recent decades, it is still very difficult to answer the questions of how much, how often and for how long a drug should be given to a patient in order to maximize its therapeutic effect and minimize its adverse effects. These problems are the central concern of the related areas of pharmacokinetics and pharmacodynamics. Pharmacokinetics is concerned with how a drug is processed by the body, i.e., the relationship between drug input parameters (e.g. amount of drug in a dose and dose frequency) and the concentration of the drug in the body over time. In contrast, pharmacodynamics is concerned with how a drug affects the body, i.e., the relationship between drug concentration and the therapeutic and adverse effects of the drug over time.

This project focuses on an important pharmacokinetics problem: how to estimate the volume of distribution of a drug, which represents the volume into which a drug is distributed once it has entered systemically into the body. Estimating a drug's volume of distribution is important because it predicts the drug's plasma concentration for a given amount of drug in the body and it influences the drug's half-life, which in turn is very important to determine the correct dosage regimen that clinicians should prescribe to patients.
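As a sketch of why the volume of distribution matters for dosing, the standard one-compartment relationships can be written as follows. The symbols are introduced here for illustration and do not appear in the project description: A is the amount of drug in the body, C_p the plasma concentration, CL the clearance and k the elimination rate constant.

```latex
V_d = \frac{A}{C_p}, \qquad
k = \frac{CL}{V_d}, \qquad
t_{1/2} = \frac{\ln 2}{k} = \frac{\ln 2 \cdot V_d}{CL}
```

With purely illustrative numbers, A = 500 mg and C_p = 10 mg/L give V_d = 50 L; with an assumed clearance of CL = 5 L/h, the half-life is about 0.693 × 50 / 5 ≈ 6.9 h. A larger volume of distribution at the same clearance therefore implies a longer half-life.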

This project aims to develop new computational data mining methods to predict the volume of distribution of drugs. The data mining context for this project is the regression task, where the system is given a set of instances representing a set of objects, and each instance consists of a target (response) attribute (or dependent variable) and a set of predictor attributes (features or independent variables) describing an object. The system then discovers a regression model that predicts the value of the target attribute for an instance based on the values of its predictor attributes. In this project, the objects will be chemical compounds or medical drugs, the target attribute to be predicted will be a drug's volume of distribution and the predictor attributes will refer to several types of molecular and physicochemical properties of drugs. The data mining methods to be developed in the project will be compared against traditional data analysis methods used for predicting a drug's volume of distribution.
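The regression task described above can be illustrated with a minimal sketch. It uses k-nearest-neighbour regression purely as a simple stand-in and is not the method developed in the project. The two descriptor columns and all numeric values below are invented for illustration, and the target is assumed to be a log-transformed volume of distribution.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict the target of `query` as the mean target of its k nearest
    training instances (Euclidean distance over the predictor attributes)."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical training instances: each row holds two made-up descriptors
# per compound (not real measurements).
train_X = [(1.0, 0.2), (1.2, 0.1), (3.5, 0.9), (3.8, 0.8), (2.0, 0.5)]
# Hypothetical target attribute (assumed log-scale volume of distribution).
train_y = [-0.3, -0.2, 0.9, 1.1, 0.2]

# Predict the target for an unseen compound from its two nearest neighbours.
prediction = knn_predict(train_X, train_y, (3.6, 0.85), k=2)
print(round(prediction, 3))
```

Each training row plays the role of an instance, the tuple entries are its predictor attributes, and the matching entry of `train_y` is its target attribute. In practice the descriptors would be scaled before computing distances, which this sketch omits.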

Planned Impact

This is a discipline-hopping proposal, in which the investigator will hop into a discipline entirely new to him, namely pharmaceutical sciences. It is also a relatively short project, involving the equivalent of only one full-time researcher for one year (more precisely, only the investigator working on a part-time (50%) basis for two years). Hence, the pathways to impact naturally have to be interpreted in this context.
Concerning academic impact, in the relatively short term (next two years), the research will benefit mainly researchers and practitioners in the area of pharmacoinformatics, which consists of using computational methods for solving pharmaceutical sciences problems. The benefit for those users will be a novel computational method (developed in this project) for predicting the volume of distribution of a drug, with the applications extendable to other fields of computer-based drug discovery and design problems. The research will also benefit, to a lesser extent, researchers and practitioners in the computer science area of evolutionary algorithms. The benefit for those users will be a novel evolutionary algorithm for attribute selection in regression. In this project that algorithm will be applied only to pharmaceutical sciences data, but, like other algorithms of this type, it will be generic enough to be used in other application domains.
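To make the idea of evolutionary attribute selection for regression concrete, here is a generic sketch of a small genetic algorithm over attribute subsets. It is not the algorithm developed in the project: the function names, the leave-one-out 1-nearest-neighbour error used as fitness, the operators and all parameter values are assumptions chosen for brevity, and the data are synthetic.

```python
import random

def loo_error(X, y, mask):
    """Leave-one-out mean absolute error of 1-nearest-neighbour regression,
    using only the attributes whose mask entry is True."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty attribute subset cannot predict
    total = 0.0
    for i in range(len(X)):
        best_d, best_y = float("inf"), 0.0
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:
                best_d, best_y = d, y[j]
        total += abs(y[i] - best_y)
    return total / len(X)

def evolve_mask(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for a low-error attribute subset; needs at least 2 attributes."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_error(X, y, mask)
        return cache[key]

    # Include the full attribute set so the result is never worse than it.
    pop = [[True] * n]
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = [list(m) for m in pop[:2]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            a = min(rng.sample(pop, 2), key=fitness)  # tournament of size 2
            b = min(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    return best, fitness(best)

# Synthetic data: only the first of six attributes determines the target.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(6)] for _ in range(30)]
y = [2.0 * row[0] for row in X]
best_mask, best_err = evolve_mask(X, y)
print(best_mask, best_err)
```

Each individual is a Boolean mask with one entry per predictor attribute, and its fitness is the error of a regression model restricted to the selected attributes. Because the full attribute set is seeded into the initial population and the best individuals are carried over unchanged, the returned subset is never worse than using all attributes on this error estimate.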
The socio-economic impact refers mainly to engagement with pharmaceutical scientists in industry, but since the applicant will be hopping into a new discipline, engagement with academic pharmaceutical scientists is also an important part of the pathway to impact. Hence, as part of this project, the applicant will establish contact with pharmaceutical scientists working in both academia and the pharmaceutical industry, in order to do follow-up research in collaboration with both types of partner. In terms of medium-term and long-term socio-economic impact, the main purpose of the project is to provide the applicant with enough knowledge and skills in the pharmaceutical sciences to allow him to carry out research in that area in collaboration with pharmaceutical scientists working in industry. Once the project is completed, the applicant will be in a good position to do follow-up research in collaboration with the pharmaceutical industry. That research will have a potentially significant socio-economic impact, since it will be aimed at providing the pharmaceutical industry with a more cost-effective approach to drug design, which in turn would tend to provide better healthcare for the general population (in the UK and abroad), given the strategic importance of the pharmaceutical industry in the improvement of patients' health. A more detailed discussion of how the applicant will engage with pharmaceutical scientists in academia and industry, in order to achieve the desired impact, is provided in the document "Pathways to Impact".


Description We have run experiments with several types of data mining methods to predict the volume of distribution of a chemical compound or drug, analysing data on the distribution of compounds across tissues in humans. We have identified the types of data mining methods and the types of descriptors (features describing compounds) that led to the most accurate predictions of the volume of distribution.
Exploitation Route We have published our findings, particularly the identification of the descriptors of chemical compounds that are most relevant for predicting the volume of distribution. These findings can be used by researchers and practitioners (e.g. in pharmaceutical companies) to predict the volume of distribution of a chemical compound in the human body. This could help pharmaceutical companies decide in advance whether it is worth investing heavily in developing a drug based on that compound.
Sectors Pharmaceuticals and Medical Biotechnology

Description University of Kent 50th Anniversary Postgraduate Research Studentship
Amount £53,577 (GBP)
Organisation University of Kent 
Sector Academic/University
Country United Kingdom
Start 09/2014 
End 09/2017
Description Collaboration with Dr. Andreas Bender, University of Cambridge 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution We are beginning a collaboration on a PhD project to improve the computational prediction of drug distribution in the body by incorporating the role of transporter proteins. I am the co-supervisor of the PhD student in this project. The principal supervisor is Dr. Ghafourian, Lecturer at the Medway School of Pharmacy. The project started very recently, in Oct. 2014.
Collaborator Contribution Dr. Andreas Bender is collaborating in this project. Initially (the project started very recently, in Oct. 2014) he is helping our PhD student to prepare a dataset of transporters and substrates, based on the Metrabase database of transporters created by his research team at Cambridge.
Impact As mentioned above, the collaboration started very recently, so there are no outputs yet.
Start Year 2014