DEVA - Autonomous Driving with Emergent Visual Attention

Lead Research Organisation: University of Exeter
Department Name: Engineering Computer Science and Maths

Abstract

How does a racer drive around a track? Approaching a bend, a driver must monitor the road, steer around curves, manage speed and plan a trajectory that avoids collisions with other cars - and all of this, fast and accurately. For robots this remains a challenge: despite decades of progress in computer vision, artificial vision systems remain far from human vision in performance, robustness and speed. As a consequence, current prototypes of self-driving cars rely on a wide variety of sensors to compensate for the limitations of their visual perception. One crucial aspect that distinguishes human from artificial vision is our capacity to focus and shift our attention. This project will propose a new model of visual attention for a robot driver, and investigate how attention focusing can be learnt automatically by trying to improve the robot's driving.

How and where we focus our attention when solving a task such as driving is studied by psychologists, and the numerous models of attention fall into two categories: first, top-down models capture how world knowledge and expectations guide our attention when performing a specific task; second, bottom-up models characterise how properties of the visual signal make specific regions capture our attention, a property often referred to as saliency. Yet, from a robotics perspective, a unified framework describing the interplay of bottom-up and top-down attention is still lacking, especially for a dynamic, time-critical task such as driving. In the racing scenario described above, the driver must take quick and decisive action to steer around bends and avoid obstacles - efficient use of attention is therefore critical.

This project will investigate the hypothesis that our attention mechanisms are learnt on a task-specific basis, in such a way as to provide our visual system with optimal information for performing the task. We will investigate how state-of-the-art computer vision and machine learning approaches can be used to learn attention, perception and action jointly, to allow a robot driver to compete with humans on a racing simulator using visual perception only.

A generic learning framework for task-specific attention will be developed that is applicable across a broad range of visual tasks, and has the potential to narrow the gap with human performance through a critical reduction in current processing times.

Planned Impact

This project will have impact in three communities:
(1) Computer vision and robotics community
(2) Car safety and autonomous cars industry
(3) Psychologists in attention research

The computer vision and robotics community will benefit directly from the new knowledge and techniques developed during this project. By proposing a new approach to reduce the amount of visual data to be processed while solving robotic tasks, the proposed framework could lead to significant improvements in efficiency for vision-based robotics. Additionally, the proposed scenario will offer new insights into the applicability of the embodied cognition paradigm to a wider class of computer vision problems. To ensure maximal impact, in addition to the academic papers, the code will be released within two popular code bases: ROS and OpenCV. The software required to interface with the racing simulator will also be released to facilitate comparison.

Moreover, this project will devise new tools and approaches for the driver-assistance and driverless-car industry. Monitoring the driver's attention is becoming an essential concern as more sophisticated cars also provide more distractions for the driver. This project will provide a better understanding of ideal gaze patterns when driving. In addition, the attention process developed in this work will provide efficient alternatives to current vision-based driving systems, potentially reducing the reliance on additional sensors.

Finally, this project also has the potential to impact the psychological community by providing a new analysis tool for eye gaze in dynamic tasks based on the proposed model. Eye tracking is a popular paradigm for the analysis of human subjects' attention shifts, applied to a broad range of cognitive tasks. The proposed approach will provide a new tool for analysing attentional patterns, by comparing human gaze locations with locations where an optimal information processing system would focus its attention when solving the given task.
 
Description The project has been investigating the importance of attention in visual perception, in particular for active tasks. We have investigated computational models attempting to predict where an observer would look in an image, a property called saliency.
1. Saliency as detection: Our research has investigated the state-of-the-art models, based on deep neural networks, and found that the common formulation as a regression of a complete saliency map is inefficient: fundamentally, either a location is salient or it is not, and therefore the problem can be effectively reframed as detecting high-saliency regions. We demonstrated performance comparable to state-of-the-art models with much reduced training time (under review).
2. We devised a new approach for visualising what is learnt by deep saliency models, and demonstrated that those models have developed receptive fields consistent with psychological theories (published).
3. We also investigated reinforcement learning approaches for learning to steer from visual images, and developed a new RL algorithm to address the poor sample efficiency of state-of-the-art RL, i.e. the large number of training episodes required (under review).
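The "saliency as detection" reformulation in finding 1 can be illustrated with a minimal sketch: rather than regressing a dense per-pixel saliency map, the system commits to a small number of discrete high-saliency locations. The function and parameter names below (`detect_salient_regions`, `suppress_radius`) and the greedy non-maximum-suppression scheme are illustrative assumptions, not the project's under-review method.

```python
import numpy as np

def detect_salient_regions(saliency_map, k=2, suppress_radius=5):
    """Pick the top-k salient locations by greedy non-maximum
    suppression: take the global maximum, blank out its neighbourhood,
    and repeat. Returns (row, col) tuples. Illustrative sketch only."""
    m = saliency_map.astype(float).copy()
    peaks = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(m), m.shape)
        peaks.append((int(y), int(x)))
        # Suppress the neighbourhood so the next pick is a distinct region
        m[max(0, y - suppress_radius):y + suppress_radius + 1,
          max(0, x - suppress_radius):x + suppress_radius + 1] = -np.inf
    return peaks

# A synthetic "predicted" saliency map with two Gaussian blobs
yy, xx = np.mgrid[0:32, 0:32]
smap = np.exp(-((yy - 8) ** 2 + (xx - 8) ** 2) / 8.0) \
     + 0.8 * np.exp(-((yy - 24) ** 2 + (xx - 20) ** 2) / 8.0)
peaks = detect_salient_regions(smap, k=2)  # → [(8, 8), (24, 20)]
```

The appeal of the detection view is that downstream processing (here, two coordinate pairs) scales with the number of salient regions rather than with the image size, which is what makes the reduced training and processing time plausible.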
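The sample-efficiency issue in finding 3 can be made concrete with a toy REINFORCE sketch of learning to steer. Everything here is invented for illustration - a one-dimensional lane-keeping task with a scalar state instead of images, a single-parameter policy, and the textbook policy-gradient update - and is far simpler than the project's under-review algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def episode(w, rng, horizon=30):
    """Toy lane keeping: state x is the lateral offset from the lane
    centre; action a = +1 steers so that x drops by 0.3, a = -1 the
    opposite; reward -|x| favours staying centred."""
    x = rng.uniform(-1.0, 1.0)
    traj = []
    for _ in range(horizon):
        p = sigmoid(w * x)                  # policy: P(a = +1 | x)
        a = 1 if rng.random() < p else -1
        traj.append((x, a))
        x = x - 0.3 * a + rng.normal(0.0, 0.05)
    ret = -sum(abs(xt) for xt, _ in traj)   # episode return
    return traj, ret

def reinforce(episodes=400, lr=0.05, seed=0):
    """Vanilla REINFORCE with a running-mean baseline:
    w <- w + lr * (G - b) * mean_t d/dw log pi(a_t | x_t)."""
    rng = np.random.default_rng(seed)
    w, baseline = 0.0, 0.0
    for _ in range(episodes):
        traj, ret = episode(w, rng)
        adv = ret - baseline
        baseline += 0.05 * (ret - baseline)
        # d/dw log pi(a|x) for a Bernoulli policy with logit w*x
        grad = sum(x * ((a + 1) / 2 - sigmoid(w * x)) for x, a in traj)
        w += lr * adv * grad / len(traj)
    return w
```

Even on this one-parameter problem, vanilla REINFORCE needs hundreds of whole episodes of experience; with image inputs and deep policies the episode count grows dramatically, which is the inefficiency the project's new algorithm targets.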
Exploitation Route The project is still ongoing. We have further findings currently under review and in preparation for publication, and we are in discussions about applying for further funding to continue this line of research.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description Christmas Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact This Christmas lecture was given by the PI to an audience of school children, covering autonomous-car technologies, new developments and remaining challenges, as well as some of the aspects of machine learning for autonomous cars studied in the DEVA project. The presentation raised many interested questions from the students, and led to an interesting discussion on ethical and societal aspects of the technology and the research that underlies it.
Year(s) Of Engagement Activity 2017
URL https://emps.exeter.ac.uk/news-events/events-colloquia/event/?semID=2129&dateID=4747