Exploiting spatial cognition in picture database design

Lead Research Organisation: University of Nottingham
Department Name: Sch of Psychology

Abstract

Picture databases have great commercial and functional potential in applications ranging from medicine to medieval history. However, this potential is proving difficult to deliver, and research activity in the area is intense. There are two principal approaches: (i) extending existing methods that encode pictures by description and keywords; and (ii) computational analysis of images to capture superficial properties such as colour and texture. The second approach aims to remove the effort of entering pictures into a database, and to let users' crude sketches retrieve subsets of pictures that can then be searched by recognition. Yet current opinion suggests that neither approach is likely to deliver cost-effective solutions in the medium term, except in highly specialised areas.
Our research proposes to innovate by approaching the problem from a third, psychological, perspective: human spatial cognition is robust, and people are generally good at inspecting pictures and later recalling their spatial layout. Furthermore, the layout of most images can be described in ways that preserve elements of meaning and visual distinctiveness. Ideally, therefore, databases that encode location information in pictures, and allow users to exploit that information in retrieval, would match human skills with a method applicable to most task domains. To this end, this proposal links two lines of psychological research. First, we are interested in visual attention: how do people look at pictures? For database design, the key question is how picture content relates to attention, as expressed in eye movements. Although eye movements are variable, they do show elements of consistency.
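String-edit comparison is one long-established way of expressing the consistency of eye movements as a number; it is offered here only as an illustration, not as the measure this project will use. The minimal sketch below assumes fixations arrive as (x, y) pixel coordinates, codes each fixation by the cell of a 3-by-3 grid laid over the picture, and scores two observers' region strings by Levenshtein distance normalised to a similarity between 0 and 1 (1.0 means identical sequences). The grid size, input format and function names are hypothetical.

```python
from typing import List, Tuple

Fixation = Tuple[float, float]  # (x, y) in pixels; an assumed input format


def to_region_string(fixations: List[Fixation], width: float, height: float,
                     cols: int = 3, rows: int = 3) -> str:
    """Map each fixation to a grid cell labelled 'A', 'B', ... in row-major order."""
    labels = []
    for x, y in fixations:
        # Clamp so points on the right or bottom edge fall in the last cell.
        c = min(int(x * cols / width), cols - 1)
        r = min(int(y * rows / height), rows - 1)
        labels.append(chr(ord("A") + r * cols + c))
    return "".join(labels)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def scanpath_similarity(s1: str, s2: str) -> float:
    """1.0 for identical region strings, falling towards 0.0 as they diverge."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(s1, s2) / longest


if __name__ == "__main__":
    # Two hypothetical observers viewing the same 640 x 480 picture.
    observer_1 = [(100, 80), (500, 90), (520, 400), (120, 420)]
    observer_2 = [(110, 70), (480, 100), (300, 300), (130, 410)]
    s1 = to_region_string(observer_1, 640, 480)
    s2 = to_region_string(observer_2, 640, 480)
    print(s1, s2, scanpath_similarity(s1, s2))  # prints: ACIG ACEG 0.75
```

Coding fixations by grid region discards fine position but makes sequences from different observers directly comparable; a finer grid, or regions defined by picture content rather than a fixed grid, would change what counts as "consistent".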
We will examine how best to represent and evaluate this consistency as a function of factors such as picture content, differences between observers, the task domain, and the delay between storage and retrieval.
Second, we aim to study how the spatial layout of images is remembered as a consequence of attention. Can our understanding of visual attention processes (and of eye movements in particular) be used to predict spatial recall? How precise is this spatial knowledge, how could it be used, and how well does it discriminate between images when retrieving them from a database? Two issues arise here. (i) Some location knowledge is known to be acquired very quickly during inspection. This is also the stage at which the viewer's eye movements are more predictable by computer, because they are driven more by visual analysis of the image and less by its meaning. It follows that if we can model the relationship between early eye movements and location memory, and if that memory is useful in retrieval, then some of the indexing of pictures into databases can be automated. This research aims to evaluate that potential. (ii) As inspection continues, eye movements become harder to predict as the viewer's understanding of the image content develops. We aim to show how this meaning influences eye movements, and how this in turn affects location memory beyond what is gained in the early stages of viewing. Together, these two complementary questions will tell us how much picture coding can be automated and how task- and user-specific factors will influence design.
As a study of the feasibility of applying this innovation to picture database design, the proposal also considers how adaptable and efficient the approach is in different circumstances. Accordingly, in evaluating the cost-benefits for picture databases, the project will seek to measure the contributions of domain expertise, training, and certain interface design issues.
This will indicate whether the approach is generally applicable to picture databases or is best applied to bespoke, specialist systems in which training and expertise are required.
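The retrieval idea at the heart of the proposal, a database that records where things are in each picture and lets users search by the layout they recall, can be illustrated with a minimal sketch. It is built on illustrative assumptions rather than the project's design: each picture is indexed by labelled elements at normalised (x, y) positions between 0 and 1, a query is a set of recalled element positions, and pictures are ranked by mean Euclidean distance between recalled and stored positions. The record format, element labels, `missing_penalty` value and function names are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # normalised (x, y): 0..1 across and down the picture
Layout = Dict[str, Point]    # hypothetical record: element label -> location


def layout_distance(query: Layout, stored: Layout, missing_penalty: float = 1.0) -> float:
    """Mean distance between recalled and stored element locations.

    A recalled element that the stored picture lacks costs `missing_penalty`,
    an assumed value that would need to be set empirically.
    """
    if not query:
        return 0.0
    total = 0.0
    for label, (qx, qy) in query.items():
        if label in stored:
            sx, sy = stored[label]
            total += math.hypot(qx - sx, qy - sy)
        else:
            total += missing_penalty
    return total / len(query)


def rank_pictures(query: Layout, index: Dict[str, Layout],
                  top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k (picture id, distance) pairs, closest layout first."""
    scored = [(pid, layout_distance(query, layout)) for pid, layout in index.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]


if __name__ == "__main__":
    index = {
        "harbour": {"boat": (0.2, 0.7), "lighthouse": (0.85, 0.3)},
        "market": {"stall": (0.5, 0.6), "church": (0.8, 0.2)},
        "estuary": {"boat": (0.7, 0.6), "lighthouse": (0.15, 0.25)},
    }
    # The user recalls a boat at lower left and a lighthouse at upper right.
    recalled = {"boat": (0.25, 0.75), "lighthouse": (0.8, 0.25)}
    for pid, dist in rank_pictures(recalled, index):
        print(f"{pid}: {dist:.3f}")
    # prints:
    # harbour: 0.071
    # estuary: 0.562
    # market: 1.000
```

Note that "estuary" contains the same elements as "harbour" but in a different arrangement, so only the recalled layout separates them. Mean distance is the simplest possible scoring choice; how precise and how discriminating recalled locations actually are is what the project sets out to measure, so any tolerance of this kind would come from data rather than assumption.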

Publications
