Machine Learning Methods for Predicting Phospholipidosis

Lead Research Organisation: University of Cambridge

Department Name: Chemistry

Abstract

Phospholipidosis is the accumulation of excessive quantities of fatty material (specifically phospholipids) within cells, which can occur in many different organs and cell types. Effects have been noted in the nervous system, lymphatic system, liver, kidneys, eyes and lungs. Phospholipidosis is of great concern to the pharmaceutical industry, especially in the context of the nervous system, where phospholipidosis in neurons can disrupt cell signalling.

Since the development of medicines is such an enormously expensive process, it is extremely important to be able to predict adverse effects from chemical structure in advance of synthesis. Ideally, predictions of toxicity should be made at a very early stage in the design of new medicines, hence minimising the expense and time wasted on medicines that turn out to be unsafe or ineffective.

In this project, we will produce predictive computer models of the phospholipidosis-inducing potential of substances that might possibly be developed into medicines. These models will be substantially more sophisticated and accurate than the models that have previously appeared in the scientific literature. The main method we will use is called Random Forest. The forest is a set of several hundred decision trees, each of which is basically a flow diagram. We will train them to learn patterns in the known properties of existing medicines, and failed candidates, and their tendencies to induce phospholipidosis. However, the way in which we will generate the trees involves computer-simulated dice-rolling. This will ensure that they are all different, though based on the same underlying information. The decision trees then behave like jury members, voting on whether each new substance should be classed as safe or unsafe.

The work proposed here is a cost-effective project with a very high probability of successfully predicting phospholipidosis-inducing potential. It uses state-of-the-art computer-based chemistry and machine learning methods to address a major current problem in designing and developing medicines. More generally, this work is at the cutting edge of the developing field of computational toxicology. For social and political reasons, this is almost certain to become a hot area as concerns about the environmental and health effects of chemicals and medicines mount, at the same time as animal experiments are likely to be increasingly phased out.
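
The Random Forest idea described in the abstract (randomised trees that vote like jury members) can be illustrated with a minimal sketch. This is not the project's code or data: the two numeric descriptors, their values and the labels below are synthetic and purely illustrative, every function name is hypothetical, and each "tree" is reduced to a single split (a decision stump) to keep the example short. The "dice-rolling" appears in two places: each tree is trained on a random resample of the compounds, and each tree is given one randomly chosen descriptor.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature by minimising weighted Gini impurity."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            left_vote = Counter(left).most_common(1)[0][0]
            # The right branch is empty at the largest threshold; fall back to the majority class.
            right_vote = Counter(right).most_common(1)[0][0] if right else majority
            best = (score, threshold, left_vote, right_vote)
    _, threshold, left_vote, right_vote = best
    return feature, threshold, left_vote, right_vote


def fit_forest(rows, labels, n_trees=101, seed=0):
    """Train n_trees stumps, each on a bootstrap resample and one random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample compounds with replacement
        feature = rng.randrange(n_features)             # pick one descriptor at random
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over all trees: 1 = predicted inducer, 0 = predicted non-inducer."""
    votes = Counter(
        left_vote if row[feature] <= threshold else right_vote
        for feature, threshold, left_vote, right_vote in forest
    )
    return votes.most_common(1)[0][0]


# Synthetic toy data: (descriptor_1, descriptor_2) per compound, made up for illustration.
toy_rows = [(1.0, 2.0), (1.5, 1.0), (2.0, 2.5), (0.5, 1.5),
            (8.0, 9.0), (9.0, 8.5), (7.5, 9.5), (8.5, 7.0)]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(toy_rows, toy_labels)
print(predict(forest, (0.2, 0.5)))
print(predict(forest, (9.5, 9.8)))
```

A production model would normally grow full-depth trees over many descriptors and use an established library implementation; the stump-based version above only shows how random resampling and voting fit together.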

Funded Value:

£100,275

Funded Period:

Oct 08 - Sep 11

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/F049102/1

Principal Investigator:

John Mitchell

Research Subject:

Chemical synthesis (75%)

Info. & commun. Technol. (25%)

Research Topic:

Artificial Intelligence (25%)

Biological & Medicinal Chem. (75%)

Organisations

University of Cambridge (Lead Research Organisation)

People	ORCID iD
John Mitchell (Principal Investigator)
Robert Glen (Co-Investigator)

Publications

Author Name Title Publication Date Published

John Mitchell (2010) Evolutionary Algorithms for the Prediction of Phospholipidosis: Different Binary Classification Metrics for Use as a Fitness Function

Lowe R (2010) Predicting phospholipidosis using machine learning. in Molecular pharmaceutics

Lowe R (2011) Classifying molecules using a sparse probabilistic kernel binary classifier. in Journal of chemical information and modeling

Lowe Robert Alexander (2012) Investigating machine learning methods in chemistry

Lowe R (2012) Predicting the mechanism of phospholipidosis. in Journal of cheminformatics

Key Findings

Description	Different phospholipidosis-inducing compounds are predicted to interact with different putative phospholipidosis-relevant targets. This strongly suggests that different compounds induce phospholipidosis via different targets, and therefore also by different mechanisms.
Exploitation Route	Further experimental and computational research could follow up the mechanistic suggestions we have made.
Sectors	Chemicals; Healthcare; Manufacturing, including Industrial Biotechnology

Impact Summary

Description	Relevance vector machine software was made publicly available for use by SMEs and larger companies in sectors such as biotech, chemicals and pharmaceuticals. Datasets relating to phospholipidosis that were developed in the course of the project have also been made available and distributed. Findings and methodologies from the project are being incorporated into university-level teaching material.
First Year Of Impact	2011
Sector	Agriculture, Food and Drink; Chemicals; Education; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
Impact Types	Economic
