Understanding Scenes and Events through Joint Parsing, Cognitive Reasoning and Lifelong Learning

Lead Research Organisation: University of Reading
Department Name: Sch of Psychology and Clinical Lang Sci

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Publications

Glennerster A (2016) A moving observer in a three-dimensional world. Philosophical Transactions of the Royal Society B: Biological Sciences.

Glennerster A (2023) Understanding 3D vision as a policy network. Philosophical Transactions of the Royal Society B: Biological Sciences.

Stefanou M (2020) A homing task that could not be done by image matching. Journal of Vision.

 
Description At a Royal Society meeting on 3D vision in November 2021, I presented our group's work in a session with computer vision experts from DeepMind (Eslami) and others, arguing that recent advances in reinforcement learning bring an understanding of biological 3D vision much closer. The paper corresponding to this talk has now been published in Phil Trans B (Glennerster, 2023). Also in November, I submitted a grant application to EPSRC to continue exploring the representations underlying human navigation and to strengthen links with Prof Torr's group in Oxford, which works on reinforcement learning. The aim was to explain the successes and failures of human navigation and 3D perception using reinforcement learning models, but the application was not funded.

During the grant, we developed links with Prof Gupta's group at Carnegie Mellon, Prof Torr's group at the University of Oxford and Prof Zhu's group at UCLA that look likely to lead to new hypotheses about the type of representation people may use when navigating and carrying out other tasks in a 3D environment. We can then design experimental tests of these hypotheses in our VR lab. AG (PI) spent one month at UCLA in 2018 collaborating with Prof Song-Chun Zhu. With Prof Torr's group we have published a paper in Vision Research (and on arXiv). We are in discussion with Google DeepMind about modelling our human navigation data from the VR lab. We have also published a paper in Scientific Reports with collaborators from Microsoft Research. It argues against the idea that the brain builds a 3D model of the scene, and in favour of the idea that the brain learns how images change as we move through the world and simulates this process (imperfectly) when objects are out of sight.
Exploitation Route We hope to start a widespread debate about the extent to which the brain reconstructs models of the outside world and the ways in which findings from reinforcement learning in machine vision provide new hypotheses about representation in the brain (see above). AG has given various keynote talks, including to the British Machine Vision Association and the Royal Society, as well as internationally. Twitter, and now Mastodon, are useful fora for discussion and dissemination with both the SLAM (robotics) and human navigation communities. Our lab collaborates with Professor Torr's lab, which is linked with the spin-out company FiveAI, which recently received $41m for autonomous vehicle research. Autonomous vehicles are likely in future to use representations more like those that animals use, so understanding these representations could lead to significant economic benefit.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare

URL https://research.reading.ac.uk/3d-vision/
 
Description Collaboration with Phil Torr's group in Robotics, University of Oxford 
Organisation University of Oxford
Department Department of Engineering Science
Country United Kingdom 
Sector Academic/University 
PI Contribution We have begun a collaboration that will be extended as part of EPSRC grant EP/N019423/1. We will provide access to the Virtual Reality lab in Reading and psychophysical expertise. The aim is to compare human performance on navigation tasks with that of reinforcement learning agents trained on games that require navigation to obtain rewards. We are currently writing a joint grant application to EPSRC to continue this collaboration.
Collaborator Contribution The Torr group will carry out the modelling described above.
Impact Multidisciplinary: neuroscience and computer vision/machine learning.
Start Year 2016
 
Description Collaboration with Prof Song-Chun Zhu, UCLA 
Organisation University of California, Los Angeles (UCLA)
Country United States 
Sector Academic/University 
PI Contribution I spent one month at UCLA (May-June 2018) working on a joint paper on compositionality in deep neural networks and on a MURI (US Defense) grant application.
Collaborator Contribution I worked particularly with Mark Edmonds in Prof Zhu's group. We have a draft paper comparing model-based and model-free approaches to representation in neural networks and the implications for neuroscience.
Impact Paper in preparation. Multidisciplinary (neuroscience and machine learning).
Start Year 2018
 
Description Collaboration with Professor Abhinav Gupta's group, Carnegie Mellon University 
Organisation Carnegie Mellon University
Department Robotics Institute
Country United States 
Sector Academic/University 
PI Contribution I visited Gupta's group with Alex and Luise (PDRAs on this grant) to design psychophysical experiments that we will carry out in Reading as part of a joint project on 3D representation without 3D coordinates.
Collaborator Contribution Gupta's group will design reinforcement learning routines that use the same Unity environments and tasks as the human participants, to see whether the networks learn in the same way. We will also explore whether apprenticeship learning can improve the reinforcement learning.
Impact The 2020 Vision Research article (link above) came out of this collaboration.
Start Year 2017