REcoVER: Learning algorithms for REsilient and VErsatile Robots

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Robots have the potential to deliver tremendous benefits to our society by assisting us in all aspects of our everyday life. For example, they could increase the quality of life of elderly people by allowing them to stay longer at home on their own, through preparing meals, cleaning the house, and assisting them to get dressed. However, robots such as legged robots are also very complex machines, which are highly prone to damage when they are not operating in the well-controlled environments of factories. Moreover, because of this complexity and the large variety of environments they might encounter, it is impossible for engineers to anticipate all the damage situations that the robot may encounter and to program its reactions accordingly.

A promising approach to overcome this difficulty is to enable robots to learn on their own how to face and how to respond to the different situations they encounter. This approach shares similarities with the way humans and animals react in analogous circumstances. For instance, a child with a sprained ankle learns on their own how to walk on only one foot in order to minimise the pain. The objective of this research project is to develop the algorithmic foundations that allow robots to do the same. In previous work, we developed creative learning algorithms that enable (physical) legged robots to overcome the loss of a leg by learning how to walk forward in less than two minutes. However, in that work, the algorithms were configured to solve a single task (i.e., walking forward), which does not leverage the versatility of legged robots and their capability, for instance, to walk in every direction, to jump, and to crawl.

The ambition of this project is to extend the adaptation capabilities of our algorithms to the entire range of the robots' abilities. This will be achieved by employing recent advances in hierarchical reinforcement learning to transfer knowledge during the adaptation process across the different skills of the robots. The combination of these hierarchical skill repertoires with our online-adaptation algorithms will enable robots to quickly transfer the result of their adaptation on one skill to the other skills. For instance, after finding a new way to walk forward, a robot might have discovered that it cannot rely on its front-left leg. With the proposed project, this information will be automatically used by the robot to speed up the adaptation process when it tries, for instance, to learn to turn, by avoiding the use of the front-left leg there too. In addition to damage recovery, the same algorithm will enable robots to adapt to changes in their environment, for instance by changing their behaviours depending on whether they walk on a flat concrete floor or on sloping grassy ground.

Increasing the adaptation capabilities of versatile robots aims, in the long term, to enable robots to substitute for humans in the most dangerous tasks they have to perform. For instance, thanks to robots with improved adaptation abilities, it would be possible to send robots to search for survivors after an earthquake or to operate in a nuclear plant after a disaster. Improving the ability of robots to overcome unknown situations is one of the key requirements to enable them to be a significant part of our daily life.

This research will be undertaken at Imperial College London, in the Department of Computing. The project will benefit from state-of-the-art robotic facilities, including a quadruped robot, a hexapod robot and a motion capture system, to develop and experiment with a new generation of learning algorithms for resilient robots.
 
Description During the RECOVER project, we invented several novel algorithms that enable robots to rapidly recover from unforeseen mechanical damage. For instance, a 6-legged robot can rapidly recover from losing one leg, without any diagnostics or dedicated sensors, and simultaneously continue with its mission, such as maze navigation.

The main insight behind this is to enable the robot to learn a large collection of different ways to walk. For instance, with a 6-legged robot, this means using all its legs in different ways to move. When damage occurs, our algorithms rapidly search the collection of different gaits to find one that still works well despite the damage. This is similar to building a large collection of backup plans. However, this collection of plans does not assume any specific damage condition. Instead, it covers all the different ways the robot could walk when intact. This is analogous to children: a child will naturally learn a lot of alternative ways to walk, like hopping on one leg or walking on all fours, simply because it is fun. This diversity of walking gaits will however become instrumental if the child sprains an ankle: the child will be able to switch almost instantaneously to one of these alternative gaits that minimises the pain. RECOVER is based on the same principle.
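The search over the gait collection described above can be illustrated with a small Python sketch. It is a deliberately simplified, greedy stand-in and not the algorithm developed in the project: the gait names, the predicted speeds, the leg sets and the simulated damage model are all hypothetical values chosen for illustration.

```python
def adapt(repertoire, try_gait, good_enough, max_trials=20):
    """Greedy trial-and-error search over a pre-learnt gait repertoire.

    repertoire:  dict mapping gait name -> speed predicted on the intact robot
    try_gait:    callable that runs a gait on the (possibly damaged) robot
                 and returns the measured speed
    good_enough: stop as soon as a gait reaches this measured speed
    """
    # Try the most promising gaits first, according to the intact-robot prediction.
    candidates = sorted(repertoire, key=repertoire.get, reverse=True)
    best_gait, best_speed = None, float("-inf")
    for gait in candidates[:max_trials]:
        speed = try_gait(gait)
        if speed > best_speed:
            best_gait, best_speed = gait, speed
        if best_speed >= good_enough:
            break
    return best_gait, best_speed


# Hypothetical repertoire for a 6-legged robot (legs numbered 0 to 5).
LEGS_USED = {
    "tripod": {0, 1, 2, 3, 4, 5},
    "wave": {0, 1, 2, 3, 4, 5},
    "five_leg_a": {1, 2, 3, 4, 5},
    "five_leg_b": {0, 2, 3, 4, 5},
    "hop": {2, 3},
}
PREDICTED_SPEED = {
    "tripod": 0.30, "wave": 0.25, "five_leg_a": 0.20,
    "five_leg_b": 0.19, "hop": 0.05,
}


def make_robot(broken_leg):
    """Toy damage model (assumption): a gait relying on the broken leg
    only reaches 20% of its predicted speed."""
    def try_gait(gait):
        if broken_leg in LEGS_USED[gait]:
            return 0.2 * PREDICTED_SPEED[gait]
        return PREDICTED_SPEED[gait]
    return try_gait


gait, speed = adapt(PREDICTED_SPEED, make_robot(broken_leg=0), good_enough=0.15)
print(gait, speed)
```

Note that the repertoire is built without any knowledge of the damage; the damage only shows up in the measured speeds during the trials.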

There is however a limitation: if we need a large number (like thousands) of alternatives for every single skill that our robot has to execute to achieve its mission, then the total number of alternatives that have to be learnt will become intractable (i.e., millions of alternatives). To solve this challenge, we proposed in RECOVER to decompose every skill into a tree of sub-skills. For instance, walking forward can be decomposed into a succession of steps, each of which can be decomposed into a series of leg movements. Interestingly, these sub-skills can be shared across higher-level skills. For instance, walking backwards also requires a series of leg movements. This hierarchical decomposition of skills can then be used to enforce diversity of alternatives at the sub-skill level and thus keep the number of high-level skills tractable.
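As a rough illustration of this decomposition, the Python sketch below stores alternatives only for shared sub-skills and defines each high-level skill once, as a sequence of sub-skill names. All names and sizes are hypothetical, and the "flat" count assumes that every combination of variants would otherwise be stored separately for each skill.

```python
# Shared sub-skill repertoire: each sub-skill keeps several alternative variants.
SUB_SKILLS = {
    "swing_leg": ["swing_high", "swing_low", "swing_wide"],
    "push_leg": ["push_fast", "push_slow", "push_short"],
}

# High-level skills are stored once each, as sequences of sub-skill names.
SKILLS = {
    "walk_forward": ["swing_leg", "push_leg"],
    "walk_backward": ["push_leg", "swing_leg"],
}


def expand(skill, works):
    """For each sub-skill of `skill`, pick the first variant that still works."""
    plan = []
    for sub in SKILLS[skill]:
        variant = next((v for v in SUB_SKILLS[sub] if works(v)), None)
        if variant is None:
            raise LookupError(f"no working variant for {sub}")
        plan.append(variant)
    return plan


def stored_hierarchical():
    """Items to learn when diversity lives only at the sub-skill level."""
    return len(SKILLS) + sum(len(v) for v in SUB_SKILLS.values())


def stored_flat():
    """Items to learn if each skill stored every combination of variants."""
    total = 0
    for subs in SKILLS.values():
        combos = 1
        for sub in subs:
            combos *= len(SUB_SKILLS[sub])
        total += combos
    return total


# A variant found to be unusable (here "swing_high") is avoided by every
# skill that shares the sub-skill, without re-learning each skill.
def usable(variant):
    return variant != "swing_high"

print(expand("walk_forward", usable))
print(expand("walk_backward", usable))
print(stored_hierarchical(), "items vs", stored_flat(), "items")
```

Even in this tiny example the hierarchical layout stores fewer items, and the gap widens quickly as skills and variants are added, because the flat count multiplies while the hierarchical count adds.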

In practice, we showed during the RECOVER project that this allows a 6-legged robot to autonomously recover from the unexpected loss of one of its legs, while simultaneously performing complex maze navigation tasks, which require locomotion skills to go in every possible direction.
Exploitation Route This new technology can be used in a large variety of applications, such as transport vehicles and critical infrastructure, to enable rapid recovery after unexpected damage or perturbation in the environment. Within follow-up projects, we are currently investigating how the findings of RECOVER can be used to make cars and ground vehicles safer to drive under a large range of perturbations (flat tyres, partial loss of traction, destabilizing payloads).
Sectors Aerospace, Defence and Marine, Transport

 
Description The novel algorithms developed during the RECOVER project demonstrated that artificial intelligence and machine learning can be instrumental in enabling machines and infrastructures to adapt to unforeseen situations. After publishing several works in this direction, we have been invited to participate in international studies to apply our algorithms and findings to new types of robots (similar to cars and boats) to showcase the potential of this technology to make transport and infrastructure safer and more resilient.
First Year Of Impact 2023
Sector Aerospace, Defence and Marine, Transport
Impact Types Societal

 
Description Learning Introspective Control
Amount $1,400,000 (USD)
Organisation Defense Advanced Research Projects Agency (DARPA) 
Sector Public
Country United States
Start 11/2022 
End 11/2025
 
Title QDax: Accelerated Quality-Diversity 
Description QDax is a tool to accelerate Quality-Diversity (QD) and neuro-evolution algorithms through hardware accelerators and massive parallelization. QD algorithms usually take days or weeks to run on large CPU clusters. With QDax, QD algorithms can now be run in minutes. QDax has been developed as a research framework: it is flexible, easy to extend and build on, and can be used for any problem setting. My entire research group now uses this tool every day, as do many other groups around the world. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This software leverages hardware acceleration (such as GPUs) to speed up a new family of algorithms, called Quality-Diversity algorithms, by a factor of 100. This means that instead of waiting days to get our results, we can now achieve the same outcomes in a few minutes. In less than 1 year, the GitHub repository already collected 200 stars and is 
URL https://github.com/adaptive-intelligent-robotics/QDax/
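
For readers unfamiliar with Quality-Diversity algorithms, the sketch below shows a minimal version of MAP-Elites, a commonly used QD algorithm, in plain Python. It does not use the QDax API and runs sequentially on a CPU; the toy fitness and descriptor functions, the grid size and the mutation strength are assumptions chosen for illustration.

```python
import random


def map_elites(fitness, descriptor, n_cells=10, iterations=2000, seed=0):
    """Minimal MAP-Elites over solutions that are lists of 2 floats in [0, 1].

    The archive keeps, for each cell of the descriptor space, the best
    solution found so far (the "elite" of that cell).
    """
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)

    def try_insert(solution):
        cell = min(int(descriptor(solution) * n_cells), n_cells - 1)
        f = fitness(solution)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, solution)

    # Bootstrap the archive with random solutions.
    for _ in range(50):
        try_insert([rng.random(), rng.random()])

    # Main loop: pick an elite, mutate it, keep the child if it improves its cell.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in parent]
        try_insert(child)
    return archive


# Toy problem (hypothetical): the descriptor is the first coordinate and
# the fitness rewards a second coordinate close to 0.5.
archive = map_elites(
    fitness=lambda s: -abs(s[1] - 0.5),
    descriptor=lambda s: s[0],
)
print(len(archive), "cells filled")
```

In this loop each candidate is evaluated one at a time. The speed-up described in the entry above comes from evaluating many candidates in parallel on hardware accelerators.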