Using Grid technology for commercial scale content based image retrieval

Lead Research Organisation: University of Cambridge
Department Name: Physics

Abstract

Professor Parker's group has extensive expertise in developing and running Grid computing resources. Imense Ltd needs to be able to process hundreds of millions of images in a reliable, scalable and efficient manner. Imense Ltd has already identified Grid computing as an important enabling technology for its future image search products, but has neither the resources nor the time to adopt it without external support. Imense Ltd would like to deploy improved versions of gLite and Ganga on its own infrastructure (we already have a modest 120-CPU Linux cluster), with the possibility of expanding its Grid using virtual servers from third-party providers. This will form the processing backbone supporting our goal of becoming the dominant image search portal on the web. Scalability is key, as there are now numerous photo-hosting websites each holding hundreds of millions of images (www.flickr.com holds over 400 million images, as do www.webshots.com and www.photobucket.com). We would also like to be in a position to use whatever is the 'standard' for enterprise Grid management. The knowledge to be transferred is: 1) How to set up and manage a scalable Grid of computers. 2) How to overcome data bottlenecks. 3) How to interact with other Grids. 4) What uses the emerging field of virtualisation can be put to in Grid computing. Imense Ltd has collaborated with Professor Parker and Dr Karl Harrison in a previous miniPIPPS project (PP/E5089581/1) and found them to be ideal partners. They demonstrated an impressive ability to identify and overcome (or circumvent) any obstacle to the use of Grid computing for large-scale image indexing tasks. Through the previous collaboration, we were able to extend the size of our processed test corpus from roughly 50,000 images to 3,000,000 images.
We are confident that further collaboration would allow Imense Ltd to extend this by another two orders of magnitude, which would make our image search solution commercially viable.
