Using Heterogeneous Information Sources for Understanding the Mode of Action of Compounds

Lead Research Organisation: University of Cambridge
Department Name: Chemistry

Abstract

The mode of action of a compound in a biological system can be described on different levels, such as the ligand-protein interaction level (e.g. binding to a target, inhibiting an enzyme) and the effect level in a biological system (e.g. genomic or proteomic readouts, cellular morphology readouts). All of these viewpoints are equally valid, so integrating the different types of information into a single view of the mode of action of compounds in a biological system is highly desirable. This is where the current project aims to make a step forward. We will compile a compound data set that links molecular structure to bioactivities and biological effects on different levels, such as (but not limited to) those above. Subsequently, data mining algorithms will be employed to establish links between the different levels of chemical and biological effects, i.e. to understand which interactions with which biological targets lead to a particular downstream (often observable) effect. The next step is then to identify patterns within those links, i.e. to generalize which type of interaction with which protein (or signalling cascade, etc.) leads to a particular effect, which can then be translated to other, related biological systems (such as signalling networks). This work will be performed on both public and proprietary data, which will enable us both to publish our findings in a suitable manner and to perform prospective validations with the company sponsor, AstraZeneca, in particular areas of interest.

Publications

Description The aim of this research project is to better understand the mode of action (MoA) of compounds by using different types of information. We therefore focused on predicting the activity or inactivity of compounds against a range of targets/proteins (a process known as target prediction) using different types of compound information. The two types of information used are: 1. chemical data in the form of Extended Connectivity Fingerprints and 2. image-based features derived from cell images. Chemical data is a standard type of information for target prediction and can easily be calculated. Image-based data is an emerging type of compound information, and we used the largest publicly available dataset, consisting of 30,000 compounds with precalculated image features. Image-based data captures the changes in cell morphology when a compound is applied to the cells. Using a Bayesian matrix factorisation method called Macau, we trained one target prediction model with the chemical data as side information and one with the image-based features, and we compared the two types of information. We found that both types of side information work well for target prediction (model for 224 proteins/targets). However, some targets were predicted better by image-based data and others by chemical data, so the choice of information type matters for individual targets. This finding is interesting because, for some targets, we were able to identify active compounds when we used the image-based features but not when we used the chemical data. This is new knowledge about the targets that are better predicted by image data, and it can inform their bioactivity prediction in the future.
In conclusion, target prediction models trained with Cell Painting data can be used as an alternative or a complement to models trained with chemical data when pharmaceutical companies are selecting/triaging compounds for further experimental validation.
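The per-target comparison described above can be illustrated with a minimal sketch. It is not the project's code: the metric (ROC AUC), the function names, the target names and all scores below are assumptions made up for illustration, and the Macau models themselves are not reproduced here. The sketch only shows how, given predicted scores from a chemical-data model and an image-data model, one might decide per target which type of side information predicts better.

```python
# Illustrative sketch only: toy labels and scores, hypothetical target names.
# ROC AUC is an assumed metric; the source does not state which one was used.

def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if a > i else 0.5 if a == i else 0.0
        for a in actives
        for i in inactives
    )
    return wins / (len(actives) * len(inactives))


def best_side_information(labels_by_target, chem_scores, image_scores):
    """Return, per target, which model ranks actives better, plus both AUCs."""
    result = {}
    for target, labels in labels_by_target.items():
        auc_chem = roc_auc(labels, chem_scores[target])
        auc_image = roc_auc(labels, image_scores[target])
        if auc_chem > auc_image:
            winner = "chemical"
        elif auc_image > auc_chem:
            winner = "image"
        else:
            winner = "tie"
        result[target] = (winner, auc_chem, auc_image)
    return result


# Toy data: 1 = active, 0 = inactive, for four compounds per target.
labels_by_target = {"target_A": [1, 1, 0, 0], "target_B": [1, 1, 0, 0]}
chem_scores = {"target_A": [0.9, 0.8, 0.2, 0.1], "target_B": [0.4, 0.3, 0.5, 0.2]}
image_scores = {"target_A": [0.6, 0.3, 0.5, 0.2], "target_B": [0.9, 0.7, 0.3, 0.4]}

comparison = best_side_information(labels_by_target, chem_scores, image_scores)
for target, (winner, auc_chem, auc_image) in comparison.items():
    print(f"{target}: chemical AUC={auc_chem:.2f}, image AUC={auc_image:.2f}, better={winner}")
```

In a triage setting, a table like this could be used to pick, for each target, the model whose side information ranks known actives best, or to flag targets where combining both models might be worthwhile.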
Exploitation Route The main outcome is that we found one type of compound information (cell image-based data) to be useful in target prediction, and we identified some targets/proteins that were better predicted by image-based features than by chemical data. As a result, target prediction models trained with image data can be used as an alternative or a complement to models trained with chemical data when pharmaceutical companies are selecting/triaging compounds for further experimental validation.
Sectors Pharmaceuticals and Medical Biotechnology