A Synergistic Integration of Natural and Artificial Immunology for the Prediction of Hierarchical Protein Functions

Lead Research Organisation: University of Kent
Department Name: Sch of Computing

Abstract

At present, biologists are producing very large amounts of data about genes as a result of automated experiments. A large part of this data refers to proteins, which are the products made by genes. That is, one can think of the genome (the entire set of genes of an organism) as an encoded text that is decoded to produce proteins. Genes are passive elements, but proteins are active elements, i.e. they perform a variety of functions which are essential to the survival of any organism. The very large amount of data about protein functions currently available is very valuable, because it can potentially lead to a better understanding and treatment of diseases, the design of more effective medical drugs, etc. However, in order to harvest the potential of this data, we need to use intelligent data analysis (or data mining) techniques that mine (analyse) the data and transform it into useful knowledge, e.g. knowledge specifying which kinds of protein functions are more related to a given kind of disease.
This project is inter-disciplinary, because it integrates biology and computer science. From a biology point of view, the project will focus on predicting the functions of a very important kind of protein, which is the target for a large number of medical drugs on the market. From a computer science point of view, the general goal of the project is to automatically discover knowledge from biological data, using intelligent data mining techniques implemented in a computer. In particular, this project will use one kind of intelligent data mining technique called artificial immune systems, which are essentially computer programs that work in a way inspired by the natural immune system. The latter is a very sophisticated system, evolved by nature, that allows our body to identify and fight a number of pathogens and invaders.
It turns out that the natural immune system is very clever at recognising a very large number of harmful body invaders and developing an appropriate immune response for each kind of invader. The immune system exhibits many interesting properties such as learning, adaptation, and memory of invaders recognised in the past (which speeds up the immune response when the same invader is encountered again). The challenge is to identify which of the many properties of the natural immune system are suitable as an inspiration to design an intelligent artificial immune system for the problem of mining protein data. In order to address this challenge, this project will involve collaboration between computer scientists and biologists. The project will develop a computational model (a kind of computer simulation) of some properties of the natural immune system, which will allow us to better understand that complex system. This understanding will be used to develop a novel data mining computer program inspired by the natural immune system. These two developments - the computational model and the data mining program - will be carried out in parallel, with a lot of feedback and interaction between the corresponding research teams, leading to novel contributions to both natural immunology and computer science.

Publications

Davies M (2008) Alignment-Independent Techniques for Protein Classification in Current Proteomics

Davies MN (2008) Optimizing amino acid groupings for GPCR classification. in Bioinformatics (Oxford, England)

Davies MN (2007) On the hierarchical classification of G protein-coupled receptors. in Bioinformatics (Oxford, England)

Secker A (2008) Artificial Immune Systems

 
Description A dataset of GPCR (G-protein-coupled receptor) proteins was created specifically for data mining purposes - more precisely, for the hierarchical prediction of GPCR functional classes. New computational data mining methods for hierarchical classification were then developed and applied to that dataset, and their results were compared with those of other methods proposed in the literature. The computational experiments validated the effectiveness of the proposed methods. In addition, after some modelling of the natural immune system, a new artificial immune system was developed for clustering (grouping) amino acids in a way that improves the classification of GPCRs.
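As a rough illustration of what grouping amino acids means for classification, the Python sketch below maps a protein sequence onto a small set of amino acid groups and computes the fraction of residues in each group, which could then serve as input features for a classifier. The grouping shown is a hypothetical one based on broad physicochemical similarity, chosen only for illustration; it is not the grouping found by the project's artificial immune system, and the function name `group_composition` is likewise an assumption.

```python
from collections import Counter

# Hypothetical grouping of the 20 standard amino acids (illustration only;
# NOT the grouping discovered by the project's artificial immune system).
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}

# Reverse lookup: one-letter amino acid code -> group name.
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def group_composition(sequence):
    """Return the fraction of residues in each group for one sequence.

    Characters that are not standard amino acid codes are ignored.
    """
    counts = Counter(
        AA_TO_GROUP[aa] for aa in sequence.upper() if aa in AA_TO_GROUP
    )
    total = sum(counts.values())
    return {
        name: (counts[name] / total if total else 0.0) for name in GROUPS
    }
```

Each sequence is reduced to one number per group rather than one per amino acid, so a different grouping produces a different, more compact feature vector for the classifier.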
Exploitation Route Although the project focused on the classification of GPCRs, the new hierarchical classification methods developed are generic enough to be applicable to potentially any other hierarchical classification dataset - i.e. any dataset prepared specifically for the hierarchical classification task of data mining - in any other application domain. The proposed computational data mining methods and the results of the computational experiments were published as peer-reviewed papers in several academic journals, so that the project results can be useful to other researchers.
Sectors Digital/Communication/Information Technologies (including Software)