DEVA - Autonomous Driving with Emergent Visual Attention

Lead Research Organisation: University of Exeter
Department Name: Engineering Computer Science and Maths

Abstract

How does a racer drive around a track? Approaching a bend, a driver needs to monitor the road, steer around curves, manage speed and plan a trajectory that avoids collisions with other cars - and all of this, fast and accurately. For robots this remains a challenge: despite decades of progress in computer vision, artificial vision systems remain far from human vision in performance, robustness and speed. As a consequence, current prototypes of self-driving cars rely on a wide variety of sensors to compensate for the limitations of their visual perception. One crucial aspect that distinguishes human from artificial vision is our capacity to focus and shift our attention. This project will propose a new model of visual attention for a robot driver, and investigate how attention focusing can be learnt automatically by trying to improve the robot's driving.

How and where we focus our attention when solving a task such as driving is studied by psychologists, and the numerous models of attention fall into two categories: first, top-down models capture how world knowledge and expectations guide our attention when performing a specific task; second, bottom-up models characterise how properties of the visual signal make specific regions capture our attention, a property often referred to as saliency. Yet, from a robotics perspective, a unified framework describing the interplay of bottom-up and top-down attention is still lacking, especially for a dynamic, time-critical task such as driving. In the racing scenario described above, the driver must take quick and decisive action to steer around bends and avoid obstacles - efficient use of attention is therefore critical.

This project will investigate the hypothesis that our attention mechanisms are learnt on a task-specific basis, in such a way as to provide our visual system with optimal information for performing the task. We will investigate how state-of-the-art computer vision and machine learning approaches can be used to learn attention, perception and action jointly, allowing a robot driver to compete with humans on a racing simulator using visual perception only.

A generic learning framework for task-specific attention will be developed that is applicable across a broad range of visual tasks, and has the potential to narrow the gap with human performance by substantially reducing current processing times.

Planned Impact

This project will have impact in three communities:
(1) Computer vision and robotics community
(2) Car safety and autonomous cars industry
(3) Psychologists working in attention research

The computer vision and robotics community will benefit directly from the new knowledge and techniques developed during this project. By proposing a new approach to reducing the amount of visual data to be processed while solving robotic tasks, the proposed framework could lead to significant efficiency improvements for vision-based robotics. Additionally, the proposed scenario will offer new insights into the applicability of the embodied cognition paradigm to a wider class of computer vision problems. To ensure maximal impact, in addition to the academic papers, the code will be released within two popular code bases: ROS and OpenCV. The software required to interface with the racing simulator will also be released to facilitate comparison between approaches.

Moreover, this project will provide new tools and approaches for the driver-assistance and driverless-car industry. Monitoring the driver's attention is becoming an essential concern as more sophisticated cars also provide more distractions for the driver. This project will provide a better understanding of ideal gaze patterns when driving. In addition, the attention process developed in this work will provide efficient alternatives to current vision-based driving systems, potentially reducing the reliance on additional sensors.

Finally, this project also has the potential to impact the psychology community by providing a new analysis tool for eye gaze in dynamic tasks based on the proposed model. Eye tracking is a popular paradigm for analysing attention shifts in human subjects, applied to a broad range of cognitive tasks. The proposed approach will provide a new tool for analysing attentional patterns, by comparing human gaze locations with the locations where an optimal information processing system would focus its attention when solving the given task.
 
Description The project investigated the importance of attention in visual perception, in particular for active tasks. We investigated computational models that attempt to predict where an observer will look in an image, a property called saliency.
1. Saliency as detection: Our research investigated the state-of-the-art models, based on deep neural networks, and found that the common formulation as a regression of a complete saliency map is inefficient: fundamentally, a location is either salient or it is not, so the problem can be effectively reframed as detecting high-saliency regions (see the first sketch after this list). We demonstrated that performance comparable to state-of-the-art models could be obtained with much reduced training time.
2. We devised a new approach for visualising what is learnt by deep saliency models, using network dissection to investigate their intermediate representations (see the second sketch below). We demonstrated that some classes of objects and patterns (faces in particular) dominate how these models represent saliency, and that they perform poorly on synthetic stimuli even though their intermediate representations respond to such patterns, indicating some general overfitting to the saliency datasets.
3. We also investigated reinforcement learning approaches for learning to steer from visual images, and developed a new RL algorithm to address the poor sample efficiency of state-of-the-art RL with respect to the number of training episodes (published).
4. We also investigated the dynamics of attention when describing images, showing that existing automatic captioning systems can be improved by enforcing attention explicitly (see the third sketch below).
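As an illustration of finding 1, the sketch below shows how a dense saliency map can be recast as detection-style targets by thresholding it into discrete high-saliency regions. It is a minimal sketch with assumed names and an assumed threshold, not the released code.

```python
# Minimal sketch: turn saliency-map regression targets into detection
# targets by thresholding the map and boxing its connected components.
import numpy as np
from scipy import ndimage

def saliency_map_to_regions(sal_map, threshold=0.5):
    """Convert a dense saliency map (values in [0, 1]) into bounding
    boxes (x_min, y_min, x_max, y_max) of high-saliency regions."""
    binary = sal_map >= threshold              # salient or not
    labels, n_regions = ndimage.label(binary)  # connected components
    boxes = []
    for region in range(1, n_regions + 1):
        ys, xs = np.nonzero(labels == region)
        boxes.append((int(xs.min()), int(ys.min()),
                      int(xs.max()), int(ys.max())))
    return boxes

# Toy example: one bright blob becomes a single detection target.
sal = np.zeros((64, 64))
sal[20:30, 40:55] = 0.9
print(saliency_map_to_regions(sal))  # [(40, 20, 54, 29)]
```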
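For finding 2, the following sketch shows the kind of measurement network dissection relies on: a unit's (upsampled) activation map is thresholded and scored by intersection-over-union against a binary concept mask, such as a face segmentation. The function name and quantile threshold are illustrative assumptions.

```python
# Minimal sketch of a network-dissection style unit/concept score.
import numpy as np

def unit_concept_iou(activation, concept_mask, quantile=0.995):
    """IoU between a unit's top-activation region and a binary concept
    mask (e.g. a face segmentation) of the same spatial size."""
    unit_mask = activation >= np.quantile(activation, quantile)
    inter = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return float(inter) / float(union) if union else 0.0

# A unit that scores a high IoU against face masks across a dataset
# would be counted as an emergent "face" unit of the saliency network.
```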
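For finding 4, one way to enforce attention explicitly is to penalise the divergence between the captioner's attention weights and recorded human fixation maps. The sketch below is a hedged illustration with assumed names and an assumed KL formulation, not the released ICCV 2019 model.

```python
# Minimal sketch: supervise a captioner's attention with human fixations.
import torch

def attention_supervision_loss(model_attn, human_attn, eps=1e-8):
    """KL(human || model) between two attention distributions over image
    regions; both tensors have shape (batch, num_regions) and sum to 1."""
    model_attn = model_attn.clamp_min(eps)
    human_attn = human_attn.clamp_min(eps)
    kl = human_attn * (human_attn.log() - model_attn.log())
    return kl.sum(dim=-1).mean()

# Used as an extra term next to the usual captioning loss, e.g.:
# total_loss = caption_xent + lambda_attn * attention_supervision_loss(attn, fixations)
```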
Exploitation Route The project has provided a wealth of results towards a better understanding of novel deep saliency models, the use of task-driven soft attention, and new approaches for improving the efficiency and usability of deep RL. The project generated new data, which has been released to the community, and the code for all algorithms and approaches has also been released as open source to stimulate further research.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description The findings and methods developed during the DEVA project are currently being applied in a collaborative study with the Met Office, aiming to improve the training of meteorologists and the ergonomics of the computer system that displays information to them. The study will make use of tools developed during the DEVA project; it was due to take place last year but was delayed by the pandemic, and we are now considering running it this year. Moreover, the findings from this project have also been central in developing a new collaboration with Thales at the University of Glasgow.
Sector Environment
Impact Types Policy & public services

 
Title Human Attention in Image Captioning Dataset 
Description This dataset contains eye movements and verbal descriptions recorded synchronously over a database of 4,000 images. This is the largest dataset of its kind.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact This dataset has allowed research in the field of visual attention to depart from passive viewing towards the active task of image description. 
URL https://github.com/SenHe/uavdvsm
 
Title Human Attention in Image Captioning Model (ICCV'2019) 
Description This model proposes a novel way to automatically caption images based on an accurate model of human attention. 
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact This model provides a departure from previous automated captioning approaches by ensuring that the attention model used is consistent with human attention. We show that the generated captions are of better quality than those of previous models.
URL https://github.com/SenHe/Human-Attention-in-Image-Captioning
 
Title Understanding and Visualizing Deep Visual Saliency Models (CVPR'2019) 
Description This is the model discussed in the CVPR 2019 article, which proposes a novel way to analyse what is learnt by deep saliency models. 
Type Of Material Data analysis technique 
Year Produced 2019 
Provided To Others? Yes  
Impact This research has highlighted several aspects of the representations learnt by deep saliency models that differ from human attention. The publication of the model allows future researchers to perform the same analysis on their own models to assess whether they address the article's points.
URL https://github.com/SenHe/uavdvsm
 
Title TRPO-REPLAY 
Description Building upon the recent success of deep reinforcement learning methods, we investigate the possibility of improving on-policy reinforcement learning by reusing the data from several consecutive policies. On-policy methods bring many benefits, such as the ability to evaluate each resulting policy, but they usually discard all information about the policies that came before. In this work, we propose an adaptation of the replay buffer concept, borrowed from the off-policy learning setting, to create a method combining the advantages of on- and off-policy learning. To achieve this, the proposed algorithm generalises the Q-, value and advantage functions to data from multiple policies. The method uses trust region optimisation while avoiding some common problems of algorithms such as TRPO or ACKTR: it uses hyperparameters to replace the trust region selection heuristics, and a trainable covariance matrix instead of a fixed one. In many cases, the method improves the results not only compared to state-of-the-art trust region on-policy learning algorithms such as PPO, ACKTR and TRPO, but also with respect to their off-policy counterpart DDPG. A simplified sketch of the replay-buffer idea is given after this record.
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact The submission provides a new baseline for the popular OpenAI Gym environments, for other researchers to benchmark against and compare.
URL https://github.com/dkangin/baselines/tree/master/baselines/trpo_replay
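The sketch below illustrates the core replay-buffer idea described above: transitions generated by the last few policies are kept together with the log-probabilities of the policy that generated them, and the current policy is updated with an importance-weighted surrogate loss under a KL regulariser standing in for the trust region. This is an illustrative simplification with assumed names, not the released implementation at the URL above.

```python
# Minimal sketch of an importance-weighted, trust-region-regularised
# update over replayed data from several consecutive policies.
import torch

def replay_surrogate_loss(new_logp, behaviour_logp, advantages,
                          kl_to_old, beta=1.0):
    """Surrogate policy loss over replayed transitions.

    new_logp:       log pi_new(a|s) for each replayed (s, a) pair
    behaviour_logp: log-prob of the same pair under whichever past
                    policy generated it (stored in the replay buffer)
    advantages:     advantage estimates generalised across the stored policies
    kl_to_old:      mean KL(pi_old || pi_new); the beta penalty stands in
                    for a hard trust-region constraint
    """
    ratio = torch.exp(new_logp - behaviour_logp)   # importance weights
    return -(ratio * advantages).mean() + beta * kl_to_old
```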
 
Description Christmas Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact This Christmas lecture was given by the PI to an audience of school children, covering autonomous car technologies, new developments and remaining challenges, as well as some of the aspects of machine learning for autonomous cars studied in the DEVA project. The presentation raised many interested questions from the students, and led to an interesting discussion on the ethical and societal aspects of the technology and the research that underlies it.
Year(s) Of Engagement Activity 2017
URL https://emps.exeter.ac.uk/news-events/events-colloquia/event/?semID=2129&dateID=4747