Testing view-based and 3D models of human navigation and spatial perception
Lead Research Organisation:
University of Reading
Department Name: Sch of Psychology and Clinical Lang Sci
Abstract
The way that animals use visual information to move around and interact with objects involves a highly complex interaction between visual processing, neural representation and motor control. Understanding the mechanisms involved is of interest not only to neuroscientists but also to engineers who must solve similar problems when designing control systems for autonomous mobile robots and other visually guided devices.
Traditionally, neuroscientists have assumed that the representation delivered by the visual system and used by the motor system is something like a 3D model of the outside world, even if the reconstruction is a distorted version of reality. Recently, evidence against such a hypothesis has been mounting and an alternative type of theory has emerged. 'View-based' models propose that the brain stores and organises a large number of sensory contexts for potential actions. Instead of storing the 3D coordinates of objects, the brain creates a visual representation of a scene using 2D image parameters, such as widths or angles, and information about the way that these change as the observer moves. This project examines the human representation of three-dimensional scenes to help distinguish between these two opposing hypotheses.
To do this, we will use immersive virtual reality with freely moving observers to test the predictions of the 3D reconstruction and 'view-based' models. Head-tracked virtual reality allows us to control the scene the observer sees and to track their movements accurately. Certain spatial abilities have been taken as evidence that the observer must create a 3D reconstruction of the scene in the brain. For example, people are able to view a scene, remember where objects are, walk to a new location and then point back to one of the objects they had seen originally, even if it is no longer visible (i.e. people can update the visual direction of objects as they move). However, this ability does not necessarily require the brain to generate a 3D model of the scene and, to demonstrate this, we will extend view-based models to include this pointing task and others like it. We will then test the predictions of both view-based and 3D reconstruction models against the performance of human participants carrying out the same tasks.
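As a concrete illustration of what a 3D-reconstruction account predicts for this pointing task, the sketch below computes the direction in which an observer with a stable, veridical map of the scene should point to a remembered object after walking to a new location. It is a minimal sketch, not the project's model: it assumes a 2D floor-plan simplification, a remembered target position expressed in the same world frame as the tracked head pose, and hypothetical names (`Pose`, `predicted_pointing_direction`, `signed_pointing_error`).

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical observer pose on a 2D floor plan.

    x, y: position in metres; heading: facing direction in radians
    (0 = world +x axis, positive = anticlockwise).
    """
    x: float
    y: float
    heading: float


def wrap_angle(a):
    """Wrap an angle in radians into the range [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def predicted_pointing_direction(remembered_xy, pose):
    """Egocentric pointing direction predicted by a 3D-reconstruction model.

    Assumes the observer stores the target's world coordinates and updates
    only their own pose. Returns radians relative to the current heading
    (positive = to the left, negative = to the right).
    """
    dx = remembered_xy[0] - pose.x
    dy = remembered_xy[1] - pose.y
    world_bearing = math.atan2(dy, dx)
    return wrap_angle(world_bearing - pose.heading)


def signed_pointing_error(pointed, predicted):
    """Signed angular difference (radians) between a participant's pointing
    direction and a model prediction, both egocentric."""
    return wrap_angle(pointed - predicted)


# Target seen at (2, 0); observer then walks to (0, 2) and faces +x.
new_pose = Pose(x=0.0, y=2.0, heading=0.0)
prediction = predicted_pointing_direction((2.0, 0.0), new_pose)
print(round(math.degrees(prediction), 1))  # -45.0, i.e. 45 degrees to the right
```

A distorted reconstruction could be explored with the same function by passing a transformed `remembered_xy` instead of the true target position.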
As well as predicting the pattern of errors in simple navigation and pointing tasks, we will also measure the effect of two types of stimulus change. 3D reconstruction uses 'corresponding points', which are points in different images that arise, for example, from the same physical object (or part of an object) as a camera or person moves around it. Using a novel stimulus, we will keep all of these 'corresponding points' in a scene constant while, at the same time, changing the scene so that the images alter radically when the observer moves. This manipulation should have a dramatic effect on a view-based scheme but no effect at all on any system based only on corresponding points.
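Reconstruction from corresponding points can be illustrated with textbook two-view triangulation, simplified here to a floor plan. Given two observer positions and the directions from each to the same physical point, the point's location follows from intersecting the two rays. Nothing else in the images enters the calculation, which is why a stimulus that preserves corresponding points while changing everything else should leave such a system unaffected. The function name and the 2D simplification are illustrative assumptions.

```python
import math


def triangulate_2d(p1, bearing1, p2, bearing2, eps=1e-12):
    """Locate a point on a floor plan from one pair of corresponding image points.

    p1, p2: observer positions (x, y); bearing1, bearing2: world-frame
    directions (radians) from each position to the same physical point.
    Returns (x, y), or None if the two rays are (nearly) parallel.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    # Solve p1 + t1*d1 = p2 + t2*d2 using 2D cross products:
    # t1 = cross(p2 - p1, d2) / cross(d1, d2).
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < eps:
        return None
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (rx * d2[1] - ry * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# A point at (1, 1) viewed first from (0, 0) and then from (2, 0).
print(triangulate_2d((0.0, 0.0), math.atan2(1, 1), (2.0, 0.0), math.atan2(1, -1)))
```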
Overall, we will have a tight coupling between experimental observations and quantitative predictions of performance under two types of model. This will allow us to determine which of the two models more accurately reflects human behaviour in a 3D environment. One potential outcome of the project is that view-based models will provide a convincing account of performance in tasks that have previously been considered to require 3D reconstruction, opening up the possibility that a wide range of tasks can be explained within a view-based framework.
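By contrast, a view-based scheme can operate without storing any 3D coordinates. The toy sketch below stores a 'view' as the visual angles subtended between pairs of recognisable landmarks (2D image quantities that do not depend on which way the observer faces) and homes by choosing the candidate location whose current view best matches the stored one. It is a hypothetical illustration in the spirit of snapshot-matching models, not the view-based model developed in this project; landmark identities are assumed to be known, and landmark coordinates are used only to simulate what the observer sees.

```python
import itertools
import math


def subtended_angles(position, landmarks):
    """The stored 'view': visual angle (radians) between every pair of landmarks
    as seen from `position`. Independent of the observer's heading."""
    bearings = [math.atan2(ly - position[1], lx - position[0]) for lx, ly in landmarks]
    angles = []
    for a, b in itertools.combinations(bearings, 2):
        diff = abs(a - b) % (2 * math.pi)
        angles.append(min(diff, 2 * math.pi - diff))  # smaller of the two angles
    return angles


def view_mismatch(stored, current):
    """Sum of squared differences between two views (same landmark order)."""
    return sum((s - c) ** 2 for s, c in zip(stored, current))


def home_by_view_matching(goal_view, landmarks, candidates):
    """Return the candidate location whose view best matches the stored goal view."""
    return min(candidates, key=lambda p: view_mismatch(goal_view, subtended_angles(p, landmarks)))


landmarks = [(0.0, 5.0), (5.0, 5.0), (5.0, 0.0), (-3.0, -2.0)]
goal_view = subtended_angles((1.0, 1.0), landmarks)  # only angles are stored
grid = [(0.5 * i, 0.5 * j) for i in range(-4, 9) for j in range(-4, 9)]
print(home_by_view_matching(goal_view, landmarks, grid))  # (1.0, 1.0)
```

The exhaustive grid search only keeps the example short; a fuller view-based model would use richer image features and a movement strategy driven by the view mismatch.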
Planned Impact
Our experiments aim to deliver a more accurate model of human spatial representation and navigation behaviour than exists at present. There are clear industrial applications for this type of knowledge in several distinct areas. We have two existing collaborations that allow us to have a direct impact. First, we have a long-standing relationship with Microsoft Research Cambridge, who have funded a PhD student with us since October 2011. Andrew Fitzgibbon, who co-supervises the project, and others at Microsoft (John Winn, Antonio Criminisi) are interested in algorithms that use non-Cartesian, view-based representations for applications that have traditionally relied on 3D metric models.
Second, we have a collaboration with the car manufacturer, Renault. They perform much of their car-interior prototyping in virtual reality, and are keenly interested in perception data coming from our lab in order to determine which types of scene manipulation will have a noticeable perceptual effect and which will not. They also have a strong interest in the calibration methods and high-quality virtual reality that we have available in our laboratory. Renault will fund a new PhD student in our lab starting in 2012.
Our laboratory is involved in a range of outreach activities, including open days for the public and for school children from the Sutton Trust. Dr Glennerster has publicised the lab's work in plenary and other talks at 3DTV conferences, where producers and technologists alike are interested in problems with the way that 3DTV and 3D cinema are interpreted and perceived. Dr Glennerster has given public lectures (e.g. at the Royal College of Surgeons), and our laboratory has engaged the public through a demonstration at the Royal Society on the mutations in the potassium channel involved in neonatal diabetes: children could fly through a model of their own channel as it opened and closed and was 'mended' by the drug that cured their condition.
People | Andrew Glennerster (Principal Investigator) |
Publications
Glennerster A
(2018)
A single coordinate framework for optic flow and binocular disparity.
in arXiv e-prints
Glennerster A
(2015)
Visual stability - what is the problem?
in Frontiers in Psychology
Gootjes-Dreesbach L
(2017)
Comparison of view-based and reconstruction-based models of human navigational strategy.
in Journal of Vision
Muryy A
(2017)
Navigation and pointing errors in non-metric environments.
in Journal of Vision
Scarfe P
(2015)
Using high-fidelity virtual reality to study perception in freely moving observers.
in Journal of Vision
Scarfe P
(2021)
Combining cues to judge distance and direction in an immersive virtual reality environment.
in Journal of Vision
Glennerster A
(2016)
A moving observer in a three-dimensional world.
in Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
Muryy A
(2021)
Route selection in non-Euclidean virtual environments.
in PLoS ONE
Vuong J
(2019)
No single, stable 3D representation can explain pointing biases in a spatial updating task.
in Scientific Reports
Scarfe P
(2014)
Humans use predictive kinematic models to calibrate visual cues to three-dimensional surface slant.
in The Journal of Neuroscience
Description | One paper published from this grant shows that sensory adaptation (in this case, people recalibrating their sense of slant) depends on how they interpret the physical interactions of objects (in this case, a ball bouncing differently when it is spinning). This demonstrates that 'low-level cues', such as the range of slants people see, are not the only cause of adaptive changes in the visual system. We have also published a review paper on how to set up a high-fidelity virtual reality lab (at one point the most downloaded paper in Journal of Vision) and a theoretical paper on the problem of visual stability. A major review paper on the problem of spatial representation in a moving observer was published in Phil Trans B. It sets out a radical alternative to theories that suppose the brain builds 3D 'maps' or reconstructions of the scene. A further paper, comparing view-based and reconstruction models as predictors of human navigation in a homing task, has been published in Journal of Vision. Another paper, published in LNCS (with a second on bioRxiv and under review in PLoS ONE), describes the errors that people make when they point at remembered objects while walking in a maze and when they try to take shortcuts. A paper on spatial updating of objects when an observer moves has been published in Scientific Reports; it discusses the importance of Generative Query Networks as potential models for human navigation and spatial representation. |
Exploitation Route | We are collaborating with engineers who are interested in adaptive control systems, and these results have implications for the design of such systems. We are now developing a more extensive collaboration with Professor Phil Torr's group in the Department of Engineering Science in Oxford and with Professor Abhinav Gupta's lab in the Robotics Group at Carnegie Mellon University. The VR lab results will be used to inform and adapt machine learning techniques for learning spatial layout and will inform ideas about spatial representation in humans. Navigation systems in autonomous vehicles may in future use representations that are more like those that have evolved in animals, so understanding these representations may have important economic consequences. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | http://www.personal.reading.ac.uk/~sxs05ag/ |
Description | The action-based brain: a provocation to philosophy, robotics and the cognitive sciences |
Amount | £30,982 (GBP) |
Funding ID | AH/N006011/1 |
Organisation | Arts & Humanities Research Council (AHRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2016 |
End | 02/2017 |
Description | Understanding Scenes and Events through Joint Parsing, Cognitive Reasoning and Lifelong Learning |
Amount | £443,434 (GBP) |
Funding ID | EP/N019423/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 02/2016 |
End | 01/2019 |
Description | Collaboration with Microsoft Research, Cambridge |
Organisation | Microsoft Research |
Department | Microsoft Research Cambridge |
Country | United Kingdom |
Sector | Private |
PI Contribution | co-supervising a PhD student |
Collaborator Contribution | co-supervising a PhD student, trips to Reading and Cambridge, and plans to hold a conference on 1-3 July 2015 at Microsoft Research Cambridge. |
Impact | Multidisciplinary (computer vision and human psychophysics). Scientific Reports publication (2019). |
Start Year | 2011 |
Description | Collaboration with Phil Torr's group in Robotics, University of Oxford |
Organisation | University of Oxford |
Department | Department of Engineering Science |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We have begun a collaboration that will be extended as part of EPSRC grant EP/N019423/1. We will provide access to the Virtual Reality lab in Reading and psychophysical expertise. The aim is to compare human performance on navigation tasks with that of reinforcement learning techniques trained on games that require navigation to obtain rewards. We are currently writing a grant together to submit to EPSRC to continue this collaboration. |
Collaborator Contribution | The Torr group will carry out the modelling described above. |
Impact | Multidisciplinary: neuroscience and computer vision/machine learning. |
Start Year | 2016 |
Description | Journal of Neuroscience press release |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | We entered into discussion with a BBC journalist about covering our lab's work on VR. The journalist said he would follow this up with a programme covering new developments in VR, particularly following the $2 billion investment by Facebook in Oculus Rift. |
Year(s) Of Engagement Activity | 2014 |
URL | http://peterscarfe.com/bounceRecalibration.html |
Description | Microsoft Research Cambridge conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | 15 academics from the USA, Europe and the UK and members of Microsoft Research (MSR) Cambridge met at MSR to discuss 'view-based' approaches to spatial representation. This led to a very fruitful exchange of ideas between the computer vision and neuroscience communities and should result in two publications. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.glennersterlab.com/MSRMeeting2015/index.html |
Description | CBBC coverage |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | CBBC approached us to ask about virtual reality, as our lab comes up readily in searches for virtual reality. We took part in a programme about the future of VR, which also included VR from the haptics group in Systems Engineering at Reading, with whom we collaborate. Journalists at the BBC say they will contact us again about similar topics. |
Year(s) Of Engagement Activity | 2014 |
URL | http://bbc.in/1nJ4f1N |