Testing view-based and 3D models of human navigation and spatial perception

Lead Research Organisation: University of Reading
Department Name: Sch of Psychology and Clinical Lang Sci

Abstract

The way that animals use visual information to move around and interact with objects involves a highly complex interaction between visual processing, neural representation and motor control. Understanding the mechanisms involved is of interest not only to neuroscientists but also to engineers who must solve similar problems when designing control systems for autonomous mobile robots and other visually guided devices.

Traditionally, neuroscientists have assumed that the representation delivered by the visual system and used by the motor system is something like a 3D model of the outside world, even if the reconstruction is a distorted version of reality. Recently, evidence against such a hypothesis has been mounting and an alternative type of theory has emerged. 'View-based' models propose that the brain stores and organises a large number of sensory contexts for potential actions. Instead of storing the 3D coordinates of objects, the brain creates a visual representation of a scene using 2D image parameters, such as widths or angles, and information about the way that these change as the observer moves. This project examines the human representation of three-dimensional scenes to help distinguish between these two opposing hypotheses.
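To make the distinction concrete, the sketch below is a hypothetical illustration in Python, not code from the project: the landmark names, coordinates and step size are invented. It contrasts a 3D-reconstruction representation, which stores metric world coordinates of objects, with a view-based entry that stores only a 2D image parameter (here, the visual angle between two landmarks) together with how that parameter changes when the observer takes a small step.

```python
import numpy as np

# Hypothetical illustration (not from the project): two ways a scene
# containing three landmarks might be represented.

# 3D-reconstruction account: store metric world coordinates of each landmark.
scene_3d = {
    "landmark_A": np.array([1.0, 0.0, 4.0]),   # x, y, z in metres (invented values)
    "landmark_B": np.array([-2.0, 0.0, 6.0]),
    "landmark_C": np.array([0.5, 0.0, 9.0]),
}

def visual_angle(p, q, eye):
    """Angle (radians) subtended at the eye between two landmarks."""
    u, v = p - eye, q - eye
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# View-based sketch: store only 2D image parameters (here, an inter-landmark
# visual angle) at the current vantage point, plus how the parameter changes
# when the observer takes a small step -- no world coordinates are kept.
eye = np.array([0.0, 1.6, 0.0])
step = np.array([0.1, 0.0, 0.0])   # small sideways movement (invented)

angle_now = visual_angle(scene_3d["landmark_A"], scene_3d["landmark_B"], eye)
angle_later = visual_angle(scene_3d["landmark_A"], scene_3d["landmark_B"], eye + step)
view_based_entry = {
    "angle_AB": angle_now,
    "d_angle_AB_per_metre": (angle_later - angle_now) / np.linalg.norm(step),
}
print(view_based_entry)
```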

To do this, we will use immersive virtual reality with freely-moving observers to test the predictions of the 3D reconstruction and 'view-based' models. Head-tracked virtual reality allows us to control the scene the observer sees and to track their movements accurately. Certain spatial abilities have been taken as evidence that the observer must create a 3D reconstruction of the scene in the brain. For example, people are able to view a scene, remember where objects are, walk to a new location and then point back to one of the objects they had seen originally even if it is no longer visible (i.e. people can update the visual direction of objects as they move). However, this capacity does not necessarily require that the brain generate a 3D model of the scene; to demonstrate this, we will extend view-based models to cover this pointing task and others like it. We will then test the predictions of both view-based and 3D reconstruction models against the performance of human participants carrying out the same tasks.
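As a worked illustration of the spatial-updating prediction under the 3D reconstruction account, the pointing direction after walking follows from simple vector subtraction, whereas a view-based model must predict the same behaviour without an explicit 3D calculation of this kind. The sketch below is again hypothetical Python with invented positions, not the project's modelling code.

```python
import numpy as np

# Hypothetical sketch of the pointing task under a 3D reconstruction account:
# if the brain stores the object's world position and integrates its own
# displacement, the pointing direction after walking is a vector subtraction.
object_pos = np.array([2.0, 0.0, 5.0])   # remembered object position (metres, invented)
start_pos = np.array([0.0, 0.0, 0.0])    # original viewing position
walked_to = np.array([3.0, 0.0, 2.0])    # new location after walking

pointing_vec = object_pos - walked_to    # direction in which to point back
pointing_azimuth = np.degrees(np.arctan2(pointing_vec[0], pointing_vec[2]))
print(f"Predicted pointing azimuth: {pointing_azimuth:.1f} deg")

# A view-based account must predict the same pointing behaviour from stored
# image parameters and their learned changes with observer movement, without
# this explicit 3D subtraction.
```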

As well as predicting the pattern of errors in simple navigation and pointing tasks, we will also measure the effect of two types of stimulus change. 3D reconstruction relies on 'corresponding points': points in successive images that arise from the same physical object (or part of an object) as a camera or person moves around it. Using a novel stimulus, we will keep all of these 'corresponding points' in a scene constant while changing the scene so that the images alter radically when the observer moves. This manipulation should have a dramatic effect on a view-based scheme but no effect at all on any system based only on corresponding points.

Overall, we will have a tight coupling between experimental observations and quantitative predictions of performance under two types of model. This will allow us to determine which of the two models most accurately reflects human behaviour in a 3D environment. One potential outcome of the project is that view-based models will provide a convincing account of performance in tasks that have previously been considered to require 3D reconstruction, opening up the possibility that a wide range of tasks can be explained within a view-based framework.

Planned Impact

Our experiments aim to deliver a more accurate model of human spatial representation and navigation behaviour than exists at present. There are clear industrial applications for this type of knowledge in several distinct areas. We have two existing collaborations that allow us to have a direct impact. First, we have a long-standing relationship with Microsoft Research Cambridge, who fund a current PhD student with us (since October 2011). Andrew Fitzgibbon, who co-supervises the project, and others at Microsoft (John Winn, Antonio Criminisi) are interested in algorithms that use non-Cartesian, view-based representations for applications that have traditionally relied on 3D metric models.

Second, we have a collaboration with the car manufacturer, Renault. They perform much of their car-interior prototyping in virtual reality, and are keenly interested in perception data coming from our lab in order to determine which types of scene manipulation will have a noticeable perceptual effect and which will not. They also have a strong interest in the calibration methods and high-quality virtual reality that we have available in our laboratory. Renault will fund a new PhD student in our lab starting in 2012.

Our laboratory is involved in a range of outreach activities, including open days for the public and for school children from the Sutton Trust. Dr Glennerster has publicised the work of the lab by giving plenary and other talks at 3DTV conferences, where producers and technologists alike are interested in problems with the way that 3DTV and 3D cinema are interpreted and perceived. Dr Glennerster has given public lectures (e.g. at the Royal College of Surgeons), and our laboratory has engaged the public with a demonstration at the Royal Society of the mutations in the potassium channel affected in neonatal diabetes: children could fly through a model of their own channel as it opened and closed and was 'mended' by the drug that cured their condition.

Publications

 
Description One paper published from this grant shows that sensory adaptation (in this case, people re-calibrating their sense of slant) depends on how they interpret physical interactions between objects (in this case, a ball bouncing differently when it is spinning). This demonstrates that 'low-level cues', such as the range of slants people see, are not the only cause of adaptive changes in the visual system. We have also published a review paper on how to set up a high-fidelity virtual reality lab (at one point the most downloaded paper in the Journal of Vision) and a theoretical paper on the problem of visual stability. A major review paper on the problem of spatial representation in a moving observer was published in Phil Trans B; it sets out a radical alternative to theories that suppose the brain builds 3D 'maps' or reconstructions of the scene. Another paper comparing view-based and reconstruction models as predictors of human navigation in a homing task is now published in the Journal of Vision. A further paper has been published in LNCS (and another is on bioRxiv and under review at PLoS ONE) describing the errors that people make when they point at remembered objects while walking in a maze and when they try to take shortcuts. A paper on spatial updating of objects when an observer moves has been published in Scientific Reports; it discusses the importance of Generative Query Networks as potential models for human navigation and spatial representation.
Exploitation Route We are collaborating with engineers who are interested in adaptive control systems, and these results have implications for the design of such systems. We are now developing a more extensive collaboration with Professor Phil Torr's group in the Department of Engineering Science at the University of Oxford and with Professor Abhinav Gupta's lab in the Robotics Group at Carnegie Mellon University. The VR lab results will be used to inform and adapt machine learning techniques for learning spatial layout, and will inform ideas about spatial representation in humans. Navigation systems in autonomous vehicles may in future use representations that are more like those that have evolved in animals; understanding these may therefore have important economic consequences.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://www.personal.reading.ac.uk/~sxs05ag/
 
Description The action-based brain: a provocation to philosophy, robotics and the cognitive sciences
Amount £30,982 (GBP)
Funding ID AH/N006011/1 
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 03/2016 
End 02/2017
 
Description Understanding Scenes and Events through Joint Parsing, Cognitive Reasoning and Lifelong Learning
Amount £443,434 (GBP)
Funding ID EP/N019423/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2016 
End 01/2019
 
Description Collaboration with Microsoft Research, Cambridge 
Organisation Microsoft Research
Department Microsoft Research Cambridge
Country United Kingdom 
Sector Private 
PI Contribution Co-supervising a PhD student.
Collaborator Contribution Co-supervising the PhD student, visits to Reading and Cambridge, and plans to hold a conference on 1-3 July 2015 at Microsoft Research Cambridge.
Impact Multi-disciplinary (computer vision and human psychophysics). Scientific Reports publication (2019).
Start Year 2011
 
Description Collaboration with Phil Torr's group in Robotics, University of Oxford 
Organisation University of Oxford
Department Department of Engineering Science
Country United Kingdom 
Sector Academic/University 
PI Contribution We have begun a collaboration that will be extended as part of EPSRC grant EP/N019423/1. We will provide access to the Virtual Reality lab in Reading and psychophysical expertise. The aim is to compare human performance on navigation tasks with that of reinforcement learning techniques trained on games that require navigation to obtain rewards. We are currently writing a grant together to submit to EPSRC to continue this collaboration.
Collaborator Contribution The Torr group will carry out the modelling described above.
Impact Multidisciplinary: neuroscience and computer vision/machine learning.
Start Year 2016
 
Description Journal of Neuroscience press release 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact We entered into discussion with a BBC journalist about covering our lab's work on VR.

The journalist has said he will follow this up with a program covering new developments in VR, particularly following the $2billion investment by Facebook in Oculus Rift
Year(s) Of Engagement Activity 2014
URL http://peterscarfe.com/bounceRecalibration.html
 
Description Microsoft Research Cambridge conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact 15 academics from the USA, Europe and the UK and members of Microsoft Research (MSR) Cambridge met at MSR to discuss 'view-based' approaches to spatial representation. This led to a very fruitful exchange of ideas between the computer vision and neuroscience communities and should result in two publications.
Year(s) Of Engagement Activity 2015
URL http://www.glennersterlab.com/MSRMeeting2015/index.html
 
Description CBBC coverage 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact CBBC approached us to ask about virtual reality. Our lab comes up readily in searches for virtual reality. We participated in a programme about the future of VR, which included VR from the haptics group in Systems Engineering at Reading, with whom we collaborate.

Journalists at the BBC say they will contact us again in relation to similar topics.
Year(s) Of Engagement Activity 2014
URL http://bbc.in/1nJ4f1N