Classifier Ensembles for Changing Environments

Lead Research Organisation: Bangor University
Department Name: Sch of Electronics

Abstract

Pattern recognition deals with assigning class labels to objects. An object could be anything, e.g. a patient (to be diagnosed), a bank loan application (to be approved or rejected), a pronounced vowel (to be matched to one of a set of vowels), or an e-mail message (to be marked as legitimate or spam). In many domains, class descriptions change with time. Take, for example, the classification of e-mail messages into spam and legitimate. Spam e-mail was more easily recognisable in the past because it tended to contain specific common words. Now that e-mail filters can scan for such words and spot the spam easily, spammers have invented different ways to trick the filters, so the description of the class spam has changed. User preferences also change with time. For example, suppose that the user subscribes to a newsletter announcing new software every month. Software ads, considered junk mail in the past, will now be welcome, again changing the description of the class spam. In machine learning, such changing environments are referred to as concept drift.
A classifier is a function, algorithm or device that can assign labels to objects. Classifier models have been developed for coping with concept drift. Recently, classifier ensembles have been employed for this task as well. Classifier ensembles (also called committees of classifiers) operate as a panel of experts. When an object is to be labelled, each member of the ensemble assigns a class label to it. The overall decision of the panel is derived from the individual opinions, for example by taking the majority vote. Classifier ensembles have been found to be more accurate than single classifiers for traditional (static) classification problems. The hope is that they will also be better than single classifiers in the presence of concept drift.
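The majority-vote combination described above can be illustrated with a minimal Python sketch (Python is used here purely for illustration; the project itself plans Matlab code). The three keyword rules, the function names and the tie-breaking rule are hypothetical stand-ins, not classifiers taken from the project.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label that received the most votes."""
    if not labels:
        raise ValueError("at least one vote is required")
    counts = Counter(labels)
    top = max(counts.values())
    # Assumption: ties are broken in favour of the label that was voted first.
    return next(label for label in labels if counts[label] == top)


def ensemble_predict(members, obj):
    """Ask every ensemble member for a label, then combine by majority vote."""
    return majority_vote([member(obj) for member in members])


# Hypothetical ensemble members: each is a very crude spam rule.
members = [
    lambda msg: "spam" if "free" in msg.lower() else "legitimate",
    lambda msg: "spam" if "winner" in msg.lower() else "legitimate",
    lambda msg: "spam" if msg.isupper() else "legitimate",
]

print(ensemble_predict(members, "You are a WINNER, claim your free prize"))
print(ensemble_predict(members, "Meeting moved to 3pm"))
```

In the first message two of the three rules vote spam, so the panel decides spam even though the third member disagrees; in the second message all members agree on legitimate.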
The current literature on classifier ensembles for changing environments can be grouped into three strategies: (1) dynamic combiners, where only the way the individual opinions are combined changes with time; (2) updating of the ensemble members, where each member is assumed to be capable of dealing with concept drift individually, so the way of combining the opinions may stay fixed and the changes are taken care of by each member; and (3) structural changes in the ensemble, where outdated members, or members whose expertise is no longer adequate for the task, are regularly replaced. Classifier ensembles in each of these categories have been proposed in the literature. We plan to study these methods and propose new ones that benefit from merging the three strategies.
In order to compare the new methods with the most successful existing methods, we need a collection of data sets with different types of concept drift. No such benchmark collection exists, although an excellent collection for static environments is maintained at the University of California, Irvine (UCI). One of the objectives of this project is therefore to build a suite of Matlab programmes that generate data with various simulated types of concept drift. We will look into modifying benchmark data sets from UCI to simulate concept drift. We will also look for real data sets that exhibit genuine concept drift, e.g. from robotics or tracking. Having put together a data set collection, we will code in Matlab a selected set of existing classification methods for changing environments, both individual classifiers and classifier ensembles. The code will be provided for free download on the Internet and will be suitable for use as a Matlab toolbox. To design our toolbox we will draw upon PRTools, the Matlab toolbox for pattern recognition developed at Delft University of Technology, The Netherlands.
Apart from being included in the toolbox, the new methods will be reported in a publication submission.
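To make two of the ideas in the abstract concrete, the following Python sketch simulates an abrupt concept drift on a one-dimensional stream and runs a dynamic combiner over fixed ensemble members. The combiner uses the classic Weighted Majority update (a member's weight is multiplied by a factor beta whenever it errs), chosen here only as one well-known example of strategy (1); it is not necessarily a method the project will adopt. The thresholds 0.3, 0.5 and 0.7, the drift point, beta and all function names are assumptions made for illustration.

```python
import random


def drifting_stream(n, drift_at, seed=0):
    """Yield (x, label) pairs whose class boundary jumps at index drift_at."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        # Assumed abrupt drift: the boundary moves from 0.3 to 0.7.
        threshold = 0.3 if t < drift_at else 0.7
        yield x, int(x > threshold)


def make_threshold_classifier(threshold):
    """Fixed ensemble member: predicts class 1 when x exceeds its threshold."""
    return lambda x: int(x > threshold)


def run_weighted_majority(stream, members, beta=0.5):
    """Dynamic combiner: members stay fixed, only their weights change."""
    weights = [1.0 / len(members)] * len(members)
    errors = 0
    for x, y in stream:
        votes = [member(x) for member in members]
        score_1 = sum(w for w, v in zip(weights, votes) if v == 1)
        score_0 = sum(w for w, v in zip(weights, votes) if v == 0)
        prediction = int(score_1 > score_0)
        errors += int(prediction != y)
        # Penalise every member that was wrong on this object.
        weights = [w * beta if v != y else w for w, v in zip(weights, votes)]
        # Renormalise so the weights keep summing to one.
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights, errors


members = [make_threshold_classifier(t) for t in (0.3, 0.5, 0.7)]
weights, errors = run_weighted_majority(drifting_stream(1000, 300), members)
print("final weights:", weights)
print("ensemble errors:", errors)
```

Before the drift the member with threshold 0.3 matches the true boundary; afterwards the member with threshold 0.7 does, and the weight update gradually shifts the panel's trust towards it. The lag in that shift is one reason the abstract also considers updating or replacing the members themselves.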

Publications
