Variable neighborhood search for clustering and data mining

Lead Research Organisation: Brunel University London
Department Name: Information Systems Computing and Maths

Abstract

Data mining is relatively new area in Operations research and Computer sciences. Clustering is one of the most popular data mining techniques. It aims at solving the following very general problem: given a data set describing some objects, find if it has some structure, and if so, determine groups within it. More precisely, cluster analysis aims at finding subsets of the given data sets, called clusters, which are both homogenous and well separated. Homogeneity means that objects in the same cluster should resemble one another; separation means that objects indifferent clusters should differ one from the other. As homogeneity and separation can be made precise in many ways, there are many specific clustering problems, and even more exact or heuristic methods to solve them. Many criteria stress either homogeneity or separation. This is not the case for the sum-of-squares criterion as minimizing the sum if inter-cluster sum-of-squares is tantamount to minimizing the between clusters sum-of-squares. Therefore minimum sum-of-squares clustering is central to cluster analysis.Within this proposal we are planning to develop heuristic method for solving large-scale minimum sum-of-squares clustering problem. In addition, our method will have guaranteed performance since we develop method that in the same time provide solutions with both upper and lower bounds on the objective function. Such an approach is based on Primal-dual variable neighborhood search (PD VNS)metaheuristic (or framework for building heuristics). We first solve primal problem to get solution of good quality by using VNS and then we find corresponding (unfeasible) dual solution. Then we reduce nonlinear function in the dual space that measures unfeasibility. This problem is also solved by VNS. If the problem is not very large, we try to close the integrality gap by using branch-and-bound method in order to get exact solution.

Publications

10 25 50