FISH: Fast Semantic Nearest Neighbour Search

Lead Research Organisation: University of Nottingham
Department Name: School of Computer Science

Abstract

This proposal addresses Theme 1 of the Data Intensive Systems (DaISy) Call "Extracting meaningful information: Deriving meaning from large, heterogeneous, incomplete, contradictory, noisy and dispersed data sets with many different forms and formats (e.g. text, image and sound files)". The project will research and develop novel fast semantic nearest neighbour search (FISH) data structures, algorithms and software utilities for rapidly discovering semantically similar neighbours of complex data objects and automatically assigning semantic class labels to them. It will use large-scale, high-dimensional, heterogeneous, incomplete and noisy internet image labelling datasets as a case study for technology development and evaluation. The FISH solutions and utilities can be directly applied to automatically label security surveillance videos and defence reconnaissance imagery, extracting semantic meaning to facilitate security and defence intelligence gathering and analysis.

Planned Impact

Defence reconnaissance systems such as the RAF's RAPTOR contain imaging sensors (visible and infrared) which can acquire huge volumes of high-resolution imagery of ground targets. Rapidly and accurately interpreting the images, identifying and recognising ground objects, and assessing the battlefield situation are very important. Post-battle analysis of this imagery will also be valuable for future operations. Manually analysing such large volumes of high-resolution imagery is both difficult and labour intensive. Automatic image analysis and interpretation will therefore play a very important role in making good use of defence reconnaissance data such as those acquired by the RAF's RAPTOR. The research results of this project can be useful for these purposes.

The FISH algorithms and solutions can be applied to automatic labelling of the images and videos returned by RAPTOR. There are at least two scenarios where FISH can be useful. It can be implemented onboard the aircraft to interpret and label the images in real time, helping pilots assess the battlefield situation and react to it more rapidly. It can also be implemented in ground stations for post-battle analysis and for archiving - automatically labelling the images of different situations can facilitate search and retrieval of specific and relevant situations.

There are an estimated 1.85 million surveillance video cameras in the UK, constantly collecting billions of images which could contain vital security information for anti-terrorism, for preventing crime and for protecting human lives. Such huge volumes of data are impossible to interpret manually. Automatic tools will be valuable, and FISH can help develop such tools.

The FISH algorithms and solutions can be used to automatically label security surveillance videos and quickly identify dangerous and emergency situations, helping security authorities to gather security intelligence and react to such situations quickly. Such techniques can be employed by the police and other security authorities for real-time situation monitoring or for post-event analysis - semantically labelled video frames can greatly facilitate the retrieval of specific and relevant content.

For defence, the FISH technology can help extract meaning from imagery acquired by reconnaissance systems such as the RAF's RAPTOR, thus facilitating defence intelligence gathering. For security, the FISH technology can help extract meaning from security videos, thus helping to gather security information which may be valuable for anti-terrorism, preventing crime and saving lives.

In summary, the research of the FISH project can contribute state-of-the-art technology for improving defence and security intelligence gathering and analysis, which will lead to improvements in the UK's defence and security capability.

Beyond defence and security, the research of the FISH project will have wider applications in academic research, and in application fields ranging from biomedicine to multimedia and the internet.

In biomedical research, tasks such as disease classification in medical images, and general biomedical data classification and analysis such as matching DNA sequences, involve finding similar data objects. The FISH technology can have direct applications in these areas.
 
Description Nearest neighbour search is fundamental to data analysis. In the "big data" era, advanced data analysis technologies and tools such as nearest neighbour search are fundamentally important. In this project we have made significant advances by developing the Fast Semantic Nearest Neighbour Search (FISH) technologies. In particular, we have developed the following new technologies and applications:

(1) A random forest based fast semantic nearest neighbour search method that tackles two fundamental drawbacks of traditional methods: poor scalability and the semantic gap. The new method is built on a tree data structure, which is intrinsically fast and scalable. It finds neighbours that are not only close in the feature space but also have similar semantic meanings, whereas traditional methods can return neighbours that are close in the feature space yet have very different semantic meanings. The result is a new generic nearest neighbour search technique.

(2) The new FISH method has been successfully applied to automatically label images and videos with what they contain. For example, for an image of a harbor taken during sunset, our new algorithm can label it with meaningful words such as "sunset", "water", "boat", etc. This is very useful for managing large image databases and for searching for relevant images on the Internet.

(3) New methods based on FISH for automatically classifying habitats. This is the first time that image-labeling technology has been applied to automatically label ground photographs with the habitats they contain. This has opened up the new possibility of using computer algorithms to replace the traditional labour-intensive approach to habitat classification.
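
As an illustration of the idea in item (1), the following is a minimal sketch of tree-based semantic neighbour search. It is not the project's implementation: it assumes a generic forest of randomised trees whose splits are chosen using class labels, and it ranks training items by how many trees place them in the same leaf as the query. The toy data, the function names and the parameters (n_trees, n_candidates) are all hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, idx, rng, n_candidates=12):
    """Grow one randomised tree; each leaf stores the training indices it holds."""
    if len({y[i] for i in idx}) == 1:
        return {"leaf": idx}                      # pure node: stop splitting
    best = None
    for _ in range(n_candidates):
        f = rng.randrange(len(X[0]))              # random feature
        a, b = rng.choice(idx), rng.choice(idx)   # threshold: midpoint of two samples
        t = (X[a][f] + X[b][f]) / 2
        left = [i for i in idx if X[i][f] <= t]
        right = [i for i in idx if X[i][f] > t]
        if not left or not right:
            continue                              # skip splits that separate nothing
        # Label-aware score: size-weighted Gini impurity of the two children.
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(idx)
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return {"leaf": idx}                      # no usable split was sampled
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": build_tree(X, y, left, rng, n_candidates),
            "right": build_tree(X, y, right, rng, n_candidates)}

def build_forest(X, y, n_trees=25, seed=0):
    """Build several independently randomised trees over the full training set."""
    rng = random.Random(seed)
    everything = list(range(len(X)))
    return [build_tree(X, y, everything, rng) for _ in range(n_trees)]

def find_leaf(tree, x):
    """Route a query down one tree and return the training indices in its leaf."""
    while "leaf" not in tree:
        side = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

def semantic_neighbours(forest, x, k=3):
    """Rank training items by the number of trees in which they share a leaf with x."""
    votes = Counter()
    for tree in forest:
        votes.update(find_leaf(tree, x))
    return [i for i, _ in votes.most_common(k)]

def euclidean_neighbours(X, x, k=3):
    """Baseline: plain feature-space nearest neighbours (squared Euclidean distance)."""
    def dist(i):
        return sum((p - q) ** 2 for p, q in zip(X[i], x))
    return sorted(range(len(X)), key=dist)[:k]

# Hypothetical toy data: feature 0 carries the class, feature 1 is a
# large-scale nuisance feature that dominates Euclidean distance.
X = [[0.0, 0.0], [0.1, 4.0], [0.0, 8.0], [0.1, 12.0], [0.0, 16.0],
     [1.0, 2.0], [0.9, 6.0], [1.0, 10.0], [0.9, 14.0], [1.0, 18.0]]
y = ["water"] * 5 + ["forest"] * 5
query = [0.05, 6.0]

forest = build_forest(X, y)
print("feature-space neighbours:", [y[i] for i in euclidean_neighbours(X, query)])
print("forest-based neighbours: ", [y[i] for i in semantic_neighbours(forest, query)])
```

Each lookup walks one root-to-leaf path per tree, so query cost grows with tree depth rather than with the size of the dataset, which is the scalability argument made in item (1). Because splits are scored against labels, items that land in the same leaf tend to share a class even when a nuisance feature puts them far apart in the raw feature space.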
Exploitation Route The FISH technologies can be programmed into a data analysis tool for finding semantic nearest neighbours in generic data analysis tasks.

The FISH technologies can be programmed into image analysis tools for automatically labeling images and videos with what they contain, and for searching for semantically similar images on the Internet or in large image repositories.

The FISH technologies can be programmed to build automatic habitat classification tools. Software tools based on FISH can take GPS-referenced ground photographs as input and automatically label them with the habitats they contain, thus automatically labelling the habitats of geographical locations.
Sectors Agriculture, Food and Drink,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Environment,Financial Services, and Management Consultancy,Culture, Heritage, Museums and Collections,Security and Diplomacy,Transport

 
Description The automatic image labeling method developed through this grant has found new applications in automatic habitat classification. Researchers from the British Ordnance Survey have shown great interest in applying the technologies to automatically categorise habitats based on GPS-referenced ground photographs. Using data supplied by Ordnance Survey, we have successfully demonstrated the application of the Fast Semantic Nearest Neighbour Search (FISH) technologies developed in this grant to automatically classify habitats (Ecological Informatics, vol. 23, pp. 126-136, 2014). Habitat classification is important for monitoring the environment and biodiversity. Traditionally, this is done manually by human surveyors, a laborious, expensive and subjective process. Automatic habitat classification has the potential to improve efficiency, reduce cost and remove subjectivity. The grant therefore has the potential to assist environmental protection and improve efficiency, producing positive economic and societal impacts.
First Year Of Impact 2013
Sector Digital/Communication/Information Technologies (including Software),Environment
Impact Types Societal,Economic

 
Description DSTL 
Organisation Defence Science & Technology Laboratory (DSTL)
Country United Kingdom 
Sector Public 
PI Contribution New automatic image labeling methods for assigning semantic meanings (labels) to visual images
Collaborator Contribution Intellectual input, potential applications of the new technologies developed in the project
Impact (1) Random forest for image annotation (2) DOI: 10.1016/j.ecoinf.2013.08.002 (3) DOI: 10.1145/2393347.2396344
Start Year 2012