Dimensionality Reduction for Efficient Similarity Search in High Dimensional Spaces

Lead Research Organisation: Open University
Department Name: Knowledge Media Institute


This overseas travel grant proposal requests EPSRC's support for a two-month visit (January-February 2007) to my research collaborators Prof. Xiaofang Zhou and Dr. Heng Tao Shen at School of Information Technology and Electrical Engineering (ITEE), the University of Queensland, and Prof. Peter Bruza at School of Information Systems, Queensland University of Technology, both in Brisbane, Australia, to conduct joint research in dimensionality reduction for efficient similarity search in high dimensional spaces. There has been a pressing need of dealing with large scale multimedia and complex data to find required information efficiently and effectively. Data objects are represented by automatically extracted features (e.g., colour, texture, and geometry properties for images) as feature vectors, which are points in high dimensional space. Similarity query processing is to find the data objects similar to a query, i.e., the nearest K neighboring (K-NN) points of the query object in the high dimensional space, typically by measuring distances between the points. The similarity search is a common problem fundamental to a wide range of applications, such as multimedia search, molecular biology, digital libraries, medical imaging, video surveillance, and so on. Major challenges in this area are rooted on the curse of dimensionality , which is still an open problem remaining largely unsolved. The K-NN problem in a high dimensional space becomes meaningless with the increase of the number of dimensions, as the difference between the distances for a point in the space to its nearest neighbor and its farthest neighbor approaches to zero when the number of dimensions approaching infinity. Dimensionality reduction aims to reduce the impact of curse of dimensionality by exploring effective dimensionality reduction techniques. In the literature, a number of dimensionality reduction methods have been investigated. However, the suitability of an existing algorithm will largely depend on and have to be adapted to the type of data, similarity measure and query processing strategy. In this research, we will conduct a systematic investigation into how to adapt state-of-the-art linear and non-linear dimensionality reduction techniques to these aspects. We aim at significantly improving performance of similarity query processing. We have already done extensive grounding work related to this research. Some existing dimensionality reduction methods have been tested on large scale protein structure data. In addition, initial work in video sequence search is on-going. We have now reached the hard core of the research. A large amount of multimedia database aspects (such as indexing, query processing, etc.) that I am currently unfamiliar with, will get involved. As a consequence, intensive face-to-face interactions with my collaborators, who are top multimedia database researchers, are essential. It is important to meet and physically work together with them. This will help the research carried out at the Knowledge Media Institute of the Open University.This overseas travel grant will be highly cost-effective. The concrete outcomes will include papers, initial experimental results as proof-of-concept for an EPSRC proposal I am preparing, and a following-on regular student exchange program for my PhD students to visit the multimedia database group of the University of Queensland (fully supported by the host university). It will generate significant benefits to the UK in terms of knowledge transfer, training of high-quality personnel, and international collaboration. It will strengthen my existing link with these internationally recognized researchers. Through the joint research, I will learn advanced technologies on multimedia databases, which will broaden my vision and expertise as an information retrieval researcher. This will further enhance my department's and to my own research profile.