Reflexive robotics using asynchronous perception

Lead Research Organisation: University of Surrey
Department Name: Vision Speech and Signal Proc CVSSP

Abstract

This project will develop a fundamentally different approach to visual perception and autonomy, in which the concept of an image itself is replaced with a stream of independently firing pixels, similar to the unsynchronised biological cells of the retina. Recent advances in computer vision and machine learning have enabled robots which can perceive, understand, and intelligently interact with their environments. However, this "interpretive" behaviour is just one of the fundamental models of autonomy found in nature. The techniques developed in this project will exploit recent breakthroughs in instantaneous, non-image-based visual sensing to enable entirely new types of autonomous system. The corresponding step change in robotic capabilities will impact the manufacturing, space, autonomous vehicle and medical sectors.

If we perceive an object approaching at high speed, we instinctively try to avoid the object without taking the time to interpret the scene. It is not important to understand what the object is or why it's approaching us. This "reflexive" behavioural model is vital to react to time-critical events. In such cases, the situation has often already been resolved by the time we become consciously aware of it. Reflexive behaviour is also a vital component of continuous control problems. We are reluctant to take our eyes off the road while driving, as we know that we will rapidly begin to veer off course without a constant cycle of perception and correction. We also find it far easier to pick up and manipulate objects while looking at them, rather than relying entirely on tactile sensing.
Unfortunately, visual sensing hardware requires enormous bandwidth. Megapixel cameras produce millions of bytes per frame. Thus, the temporal sampling rate is low, reaction times are high, and reflexive adjustments based on visual data become impractical.
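As a rough worked example (the figures below are representative assumptions rather than measurements from this project), even a modest frame-based sensor commits the system to roughly a hundred megabytes per second and a gap of tens of milliseconds between samples:

```python
# Rough bandwidth and latency arithmetic with representative, assumed figures.
width, height = 1280, 1024        # ~1.3 megapixel sensor
bytes_per_pixel = 3               # 8-bit RGB
fps = 30

frame_bytes = width * height * bytes_per_pixel
print(f"{frame_bytes / 1e6:.1f} MB per frame")               # ~3.9 MB
print(f"{frame_bytes * fps / 1e6:.0f} MB/s at {fps} fps")    # ~118 MB/s
print(f"{1000 / fps:.1f} ms between consecutive samples")    # ~33.3 ms
```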

We finally have the opportunity to overturn the paradigm of vision being impractical for low-latency problems, and to facilitate a step change in robotic capabilities, thanks to recent advances in visual sensor technology. Asynchronous visual sensors (also known as event cameras) eschew regular sensor-wide updates (i.e. images). Instead, every pixel independently and asynchronously transmits a packet of information as soon as it detects an intensity change from its previous transmission. This drastically reduces data bandwidth by avoiding the redundant transmission of unchanged pixels. More importantly, because these packets are transmitted immediately, the sensor typically provides a latency reduction of three orders of magnitude (from roughly 30 ms to 30 µs) between an event occurring and it being perceived.
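To make the data model concrete, the sketch below shows a minimal, purely illustrative representation of such a stream; the field names are our own, but the (x, y, timestamp, polarity) tuple mirrors how commercial event cameras typically report intensity changes.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Event:
    """One asynchronous pixel update (illustrative format, not a specific sensor's)."""
    x: int          # pixel column
    y: int          # pixel row
    t: float        # timestamp in seconds (hardware offers microsecond resolution)
    polarity: int   # +1 for a brightness increase, -1 for a decrease

def process_stream(events: Iterable[Event]) -> None:
    """Handle each event as it arrives; there is no frame boundary to wait for."""
    for ev in events:
        # React immediately rather than waiting ~30 ms for the next full image.
        print(f"pixel ({ev.x}, {ev.y}) changed at t={ev.t:.6f}s (polarity {ev.polarity:+d})")
```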

This advancement in visual sensing is dramatic, but we are desperately in need of a commensurate revolution in robotic perception research. Without the concepts of the image or synchronous sampling, decades of computer vision and machine learning research are rendered unusable with these sensors. This project will provide the theoretical foundations for this robot perception revolution by developing novel asynchronous paradigms for both perception and understanding. Mirroring biological systems, this will comprise a hierarchical perception framework encompassing both low-level reflexes and high-level understanding, in a manner reminiscent of modern deep learning. However, unlike deep learning, pixel-update events will occur asynchronously and propagate independently through the system, maintaining extremely low latency.

The sensor technology is still in its early trial phase, and few researchers are exploring its implications for perception. No group, nationally or internationally, is currently making a concerted effort in this area. Hence, this project not only lays the groundwork for a plethora of new biologically inspired "reflexive robotics" applications; it will also support the development of a unique new research team, placing the UK at the forefront of this exciting field.

Planned Impact

This research is a disruptive technology spanning several of the fastest-growing research fields: Robotics, Computer Vision and AI. Consequently, its impact will be felt across a broad range of areas.
For society as a whole, the major impact will be improved productivity, leading to a corresponding improvement in quality of life. There are also likely to be societal health benefits from the increased automation of hazardous jobs and a strengthening of the economy.

In addition to direct economic growth through increased automation, another potential benefit of building a dedicated UK team in a newly emerging area with such promise is a strengthening of our international leverage. Post-Brexit, the ability to bring some truly unique expertise to the negotiating table is invaluable in the crowded Robotics and Autonomous Systems marketplace. The Pathways to Impact outlines a number of techniques which will ensure that the UK has the knowledge, and the people, to become the commercial centre for this technology (beyond this proposal and the PI's research team).

A list of more specific industrial sectors which will be impacted by this research is provided below. For each of the currently identified areas, the PI has already begun building relationships with potential industrial partners (see the Strategic Advisory Board in the Pathways to Impact).
- Autonomous vehicles - driving is an obvious area where the ability to take reflexive, low-latency actions in an emergency is vital. Perceptual techniques which specifically account for such low-latency emergencies will significantly reduce the fatality rates of future autonomous vehicles. Similarly, consumer convenience will be better served by reliable autonomous delivery systems which are able to react appropriately to dangerous situations.
- Manufacturing - as discussed under National Importance, the UK manufacturing sector has some of the worst productivity ratings, and lowest levels of automation, in the EU. This is largely due to the large number of SME manufacturers, whose production runs are too short to warrant expensive robotic integration. Improving the intelligence and reactivity of robotic automation systems, coupled with suitable hardware, will loosen the integration requirements and make automation feasible for a wider range of small-batch manufacturers. Beyond the economic benefits, this will have additional societal impacts: greater consumer choice and more "personalised" manufacturing.
- Space robotics - as with autonomous vehicles, this is a high-stakes environment where accidents come at enormous cost, and the ability to react rapidly to emergencies is invaluable. Reliable space robotics systems will drastically reduce the cost of many space missions by removing the need for life-support systems. In addition to enabling greater exploitation of space for scientific and industrial purposes, this research will reduce the risk to the lives of human astronauts.

In these industrial sectors, the impact of the proposed research is clearly apparent. However, some of the more speculative impact areas may also prove to be the most exciting. Developing and exploring the properties of bio-inspired asynchronous perception systems could have a profound impact on our understanding of the biological systems they mimic. Changes in our understanding of perception could affect the way we approach learning and teaching. It may also inform how we deal with certain disabilities and psychological disorders. As with the industrial areas above, the Strategic Advisory Board has been designed to help ensure these potential impacts are appropriately explored.
 
Description In order for a robotic system to achieve autonomous capabilities, many tasks must be solved; examples include depth perception, motion estimation and obstacle avoidance. Computer vision approaches are key to solving many of these. However, despite their success, there are fundamental differences between traditional computer vision technology and naturally occurring perception in organisms. Digital cameras quantise visual data into image frames which are then processed sequentially. Event cameras bridge this gap by treating pixels asynchronously, offering distinct advantages such as high temporal resolution and the elimination of motion blur.

Obstacle avoidance is an ideal application for event cameras, because there are many situations where a traditional frame-based digital camera would struggle, whether due to motion blur, a slow frame rate or poor dynamic range. Unfortunately, event streams are incompatible with most existing computer vision and machine learning approaches. Due to the nature of the input data, many event-based approaches focus on 2D solutions and rely on heuristics to determine collision risk. In this project, we instead looked at 3D perception using events. We used the events to predict optical flow, avoiding the poor low-light performance and motion blur of frame-based sensing, and showed that combining this flow with depth data enables metric time-to-impact estimation, which is a more reliable measure of urgency. To facilitate training this network, we also released an extensive event-based dataset with six visual streams spanning over 700 scanned scenes.
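The dense time-to-impact maps in this work are predicted by a learned network from events and depth. Purely as a geometric point of reference (and not the project's method), a naive per-pixel estimate can be formed from two consecutive depth maps:

```python
import numpy as np

def naive_time_to_impact(depth_prev: np.ndarray,
                         depth_curr: np.ndarray,
                         dt: float,
                         eps: float = 1e-6) -> np.ndarray:
    """Naive per-pixel time-to-impact in seconds, tau = Z / (-dZ/dt).

    Illustrative baseline only; EVReflex instead learns dense maps from the
    fused event and depth streams. Static or receding pixels are given +inf.
    """
    dz_dt = (depth_curr - depth_prev) / dt            # approaching surfaces have dZ/dt < 0
    tti = np.full_like(depth_curr, np.inf, dtype=np.float64)
    approaching = dz_dt < -eps
    tti[approaching] = depth_curr[approaching] / -dz_dt[approaching]
    return tti
```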

The second step in the project was to tackle the paradigm of event quantisation for training traditional neural networks. Although neural networks for visual tasks are well established, digital 'neurons' are a very loose approximation of biological neurons. Spiking neural networks seek to emulate biological neurons more closely, but are especially difficult to train for tasks which predict a high-resolution output. We developed a new type of neural network, called an EDeNN, which operates closer to the original event data stream and avoids the difficulties of training spiking neural networks. Instead of processing an entire volume of accumulated events at once, our method processes single time slices sequentially. This results in an order-of-magnitude reduction in processing time, while still demonstrating state-of-the-art performance in angular velocity regression and competitive optical flow estimation. We hope that the toolbox we have created for training EDeNNs will become the de facto standard for machine learning with event cameras, and will lead to rapid growth of applications in this area.
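The released toolbox contains the actual EDeNN layer definitions; the fragment below is only a simplified sketch, under our own assumptions, of the general idea of carrying a decaying state across thin event slices instead of re-convolving a full accumulated event volume.

```python
import torch
import torch.nn as nn

class DecayingSliceEncoder(nn.Module):
    """Simplified sketch (not the released EDeNN code): a 2D convolution is
    applied to each thin event slice, and an exponentially decaying running
    state lets older events fade without ever being re-processed."""

    def __init__(self, in_ch: int = 2, out_ch: int = 16, decay: float = 0.9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.decay = decay
        self.state = None

    def forward(self, event_slice: torch.Tensor) -> torch.Tensor:
        # event_slice: (B, 2, H, W) counts of positive/negative events in one time slice.
        update = torch.relu(self.conv(event_slice))
        if self.state is None:
            self.state = torch.zeros_like(update)
        # Decay the previous state and add only the new slice's contribution,
        # so the per-slice cost is constant regardless of how much history exists.
        self.state = self.decay * self.state.detach() + update
        return self.state
```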

Stage 3 of the project utilised and extended this toolbox for reinforcement learning. Building on the EDeNN, we developed a framework that uses event data in simulated environments to train agents with reinforcement learning. By tackling event quantisation in the same manner as for supervised EDeNNs, we are able to treat experience in these environments as continuous-time. This meant we could take samples of varying length, which may depend on the information density of a particular situation. This step also involved creating a simulator to generate event streams from any existing renderable environment.
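The project's simulator is not described in detail here; the sketch below only illustrates, with assumed parameters, the standard principle of converting rendered frames into approximate events by thresholding per-pixel log-intensity changes.

```python
import numpy as np

def frames_to_events(frames: np.ndarray, timestamps: np.ndarray,
                     threshold: float = 0.2) -> list:
    """Convert rendered greyscale frames (N, H, W) into an approximate event list.

    Simplified illustration of the usual simulation principle (not the project's
    simulator): emit an event wherever the log intensity at a pixel has changed
    by more than a contrast threshold since that pixel's last event.
    """
    log_ref = np.log(frames[0].astype(np.float64) + 1e-3)
    events = []  # tuples of (x, y, t, polarity)
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_frame = np.log(frame.astype(np.float64) + 1e-3)
        diff = log_frame - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for x, y in zip(xs, ys):
            events.append((x, y, t, 1 if diff[y, x] > 0 else -1))
            log_ref[y, x] = log_frame[y, x]   # update the reference only where events fired
    return events
```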

The final stage of the project, which is currently ongoing, is the merger of the reactive or "reflexive" reinforcement learning component with the longer-term "conscious" planning elements within a single machine learning system. Initial experiments have been undertaken, with promising results. As such, the project is likely to achieve all of its originally stated goals by the end of the funding period.
Exploitation Route We hope that both academia and industry will take this work forward by building on the techniques and, more importantly, the EDeNN toolbox. This should allow the development of many new computer vision applications for event cameras.
Sectors Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Manufacturing, including Industrial Biotechnology,Security and Diplomacy

 
Description There is a need for new research approaches using this emerging technology. By designing new algorithms, network types and software libraries, we hope that our work can be built upon to bring this camera technology into the mainstream of academia and industry. Deployed applications may be much more computationally and energy efficient, and may not need specialised parallel-computing chips such as GPUs. Potential impact sectors include robotics, manufacturing, autonomous space systems, surveillance and more.
First Year Of Impact 2021
Sector Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Manufacturing, including Industrial Biotechnology,Security and Diplomacy
Impact Types Cultural,Economic

 
Title DeFeat-Net 
Description The released algorithm and trained model DeFeat-Net (Depth & Feature network) provides an approach to simultaneously learn a cross-domain dense feature representation alongside a robust depth-estimation framework based on warped feature consistency. The resulting feature representation is learned in an unsupervised manner, with no explicit ground-truth correspondences required. This approach to monocular depth estimation provides robustness to challenging domains, such as night-time scenes or adverse weather conditions, where the photometric consistency assumptions of traditional approaches break down. The provided technique is comparable to both the current state of the art in monocular depth estimation and supervised feature representation learning. However, by simultaneously learning features, depth and motion, our technique is able to generalise to challenging domains, allowing DeFeat-Net to outperform the current state of the art with around a 10% reduction in all error measures on more challenging sequences such as night-time driving. A simplified sketch of the warped-feature consistency idea is included after this entry. 
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? Yes  
Impact At the time of reporting, this algorithm has been released for several months. In that time, it has garnered a number of endorsements on GitHub and inspired one follow-on publication by another international research team. 
URL https://github.com/jspenmar/DeFeat-Net
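As referenced above, the following is a simplified sketch of a warped-feature consistency loss; it is not the released DeFeat-Net code, and the precomputed sampling grid is an assumption standing in for the reprojection defined by the predicted depth and relative pose.

```python
import torch
import torch.nn.functional as F

def feature_consistency_loss(feat_tgt: torch.Tensor,
                             feat_src: torch.Tensor,
                             sample_grid: torch.Tensor) -> torch.Tensor:
    """Illustrative warped-feature consistency term.

    feat_tgt, feat_src: (B, C, H, W) dense features from the target and source views.
    sample_grid: (B, H, W, 2) normalised source coordinates obtained by reprojecting
    target pixels with the predicted depth and relative pose (assumed precomputed).
    Comparing learned features instead of raw intensities keeps the loss informative
    when photometric consistency breaks down, e.g. at night or in bad weather.
    """
    warped_src = F.grid_sample(feat_src, sample_grid,
                               align_corners=True, padding_mode="border")
    return (feat_tgt - warped_src).abs().mean()
```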
 
Title EVReflex 
Description The released algorithm and trained model EVReflex (Event-based Reflexive) combines an event stream with depth to predict dense time-to-impact maps which can be used directly for dynamic obstacle avoidance. Existing perception techniques for depth cameras are typically too slow for obstacle avoidance, and event cameras struggle with textureless surfaces. However, fusing depth with the event stream overcomes the failure cases of each individual modality and enables perception with high dynamic range and temporal resolution. The metric time-to-impact maps are estimated without prior knowledge of the scene geometry or obstacles, in contrast to many other obstacle avoidance approaches, which make prior assumptions about the number or type of obstacles to be avoided. A toy illustration of how such a map can drive a reflexive response follows this entry. 
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact At the time of reporting this algorithm has been released for several months. In that time, there has been interest and additional questions from other authors, including from members of the Robotics and Perception Group at ETH Zurich, known for its contributions to event camera research. 
URL https://gitlab.surrey.ac.uk/cw0071/EVReflex
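As noted above, the toy fragment below (not part of EVReflex) simply shows how a dense time-to-impact map could drive a reflexive response once it has been predicted.

```python
import numpy as np

def avoidance_trigger(tti_map: np.ndarray, danger_horizon: float = 0.5) -> bool:
    """Toy illustration: trigger an avoidance reflex if any region of the
    predicted time-to-impact map falls below `danger_horizon` seconds.
    The threshold value is an arbitrary assumption for illustration."""
    return bool(np.nanmin(tti_map) < danger_horizon)
```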
 
Title Tools for Explaining Autonomous Agent Behaviour 
Description This is a toolset for explaining the behaviours of autonomous agents trained using reinforcement learning. The approach allows users to understand the "intention" of the agent (i.e. what it hoped to achieve) when taking a particular action. The toolset is useful for allowing members of the public to inspect and understand the behaviours of a complex autonomous agent, and is also invaluable for RL researchers and developers who wish to introspect and debug complex agents. It has been released publicly as open source. 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2020 
Impact The tool was only released a few weeks before this reporting period. However, it has been used internally for several other research projects run by the grant holder. 
URL https://github.com/hmhyau/rl-intention
 
Description ICRA 2022 CRMT Presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The goal of this presentation was to present the work "Generalizing to New Tasks via One-Shot Compositional Subgoals" to researchers working in a similar direction. The outcomes of this presentation include increased visibility and recognition of the research project, dissemination of research findings to a broader audience, and engagement with stakeholders who may be interested in the research.
Year(s) Of Engagement Activity 2022
URL https://idsc.ethz.ch/research-frazzoli/workshops/compositional-robotics-2022.html
 
Description Presentation to Senior Vice President of ExxonMobil 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact a short presentation to the Senior Vice President of ExxonMobil about AI and robotic research. The intended purpose of this presentation may be to introduce the Senior Vice President to the latest advances in AI and robotics research from our research team and to discuss potential applications of these technologies within ExxonMobil's operations. We also talked about the research and work going on at the university, and the invited guest is very happy and showed great interest in following up with further research. The Senior Vice President also shared some insight into the potential benefits and challenges associated with adopting these technologies, and how they may impact the company's long-term strategy. This leads to new opportunities for collaboration and partnerships between ExxonMobil and our university.
Year(s) Of Engagement Activity 2020
 
Description Project presentation for advisory board 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Supporters
Results and Impact (Upcoming)
When the project was originally proposed, a number of people expressed interest in being on the advisory board. The original plan was for annual advisory board meetings, but these did not go ahead as a result of the pandemic and a change in team members.
The purpose of the meeting is to share the research results, to discuss the theoretical foundations that have been laid, and to review the progress and findings of the project as a whole.
It will also be a chance to hear the board partners' thoughts on potential impacts in their respective areas of industry.
Members will be able to attend the University of Surrey or join remotely.
Year(s) Of Engagement Activity 2023
 
Description SPOT video 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact A vlog video about the SPOT robot, which has garnered 10K views. The intended purpose of the video was to provide information about, and showcase the capabilities of, the SPOT robot: a quadruped robot designed by Boston Dynamics for applications such as inspection, data collection, and even first response in emergency situations.

Based on the number of views, the video has generated significant interest and attention from viewers who want to learn more about robotics. It may also have sparked discussions and questions from viewers, which can lead to increased awareness and interest.
Year(s) Of Engagement Activity 2022
URL https://www.bilibili.com/video/BV1CR4y1c7Nm/
 
Description TAROS 2021 Industry and Research Showcase Day 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact As part of the Towards Autonomous Robotic Systems Conference (TAROS), this presentation was an online session with multiple universities in which we introduced and exhibited our innovative products, research and capabilities. The presentation consisted of work and demos from members of this research group. Outcomes include increased visibility and recognition of the research project and its contributors, as well as potential collaborations and partnerships with other researchers and organisations in the field. The presentation may also inspire new research projects and collaborations, leading to further advancements and innovations in the field.
Year(s) Of Engagement Activity 2021
URL https://lcas.lincoln.ac.uk/wp/taros-2021/