Sim-to-Real Deep Reinforcement Learning for legged robot locomotion with vision-based high dimensional data

Lead Research Organisation: University of Bristol
Department Name: Aerospace Engineering

Abstract

This PhD will explore methods that allow legged robots to improve and adapt their gaits to various terrains. For the MSc, the physics simulator was pre-programmed with the environment terrain and robot dimensions, and parameters such as friction coefficients and weight distributions were only roughly estimated. For the PhD, however, the robot will build a model of itself in a 3D environment using vision and depth sensing together with orientation sensing and robot babbling (issuing exploratory motor commands and observing the results). This will allow the robot to adjust the parameters of the simulated environment, which may increase adaptability and reduce the reality gap.

The aim is to contribute a novel method that allows any type of legged robot to manoeuvre from high-dimensional input data. The method aims to address the adaptability problems of explicitly programmed algorithms whilst also addressing the reality gap issues of reinforcement learning with Proximal Policy Optimization (PPO). The input data will come from RGBD, joint parameter, orientation and tactile sensing. Note that PPO performs exceptionally well with high-dimensional inputs, which will be required in order to capture the complexities of the real world.
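For readers unfamiliar with PPO, the following is a minimal sketch of its clipped surrogate objective in plain Python. It is illustrative only and is not taken from the project: the function name, argument names and the default clip range of 0.2 are assumptions made here, and a real implementation would use a deep learning framework with a neural network policy.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximised).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio new/old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any incentive to move the ratio
        # outside the interval [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping limits how far a single update can move the policy away from the one that collected the data, which is the property usually credited for the stability of PPO.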

Aims and objectives:
1. Build a legged robot capable of sensing its environment (i.e. RGBD, orientation and tactile sensors)
2. Self-model Agent - Allow the robot to perform robot babbling and use the orientation, tactile and joint position sensors to self-model the agent
3. Model Environment - Allow the robot to scan the room with RGBD sensors to model its terrain and identify its target location (i.e. a ball in a room)
4. Train the simulated robot with reinforcement learning in the 3D world to determine a policy to reach the target location
5. Deploy the trained policy on the physical robot
6. Measure the 'reality gap' between the robot's performance in simulation and the physical world.
7. Adapt robot babbling and environment modelling accordingly.
8. Deliver a TRL 4 (Technology Readiness Level 4) quadruped robot that can traverse previously unseen terrains/scenarios toward a goal
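Objectives 6 and 7 can be illustrated with a toy sketch of measuring a reality gap and adjusting a simulation parameter to shrink it. Everything here is an assumption made for illustration: the simulator is reduced to a one-line sliding-block model, the reality gap is taken to be a mean absolute error, the "real" measurements are synthetic stand-ins generated from a known friction value, and a simple grid search replaces whatever estimation method the project will actually use.

```python
G = 9.81  # gravitational acceleration in m/s^2


def simulated_stop_distance(v0, mu):
    """Toy simulator: distance a block sliding at v0 travels before friction stops it."""
    return v0 ** 2 / (2.0 * mu * G)


def reality_gap(mu, trials):
    """Mean absolute error between simulated and measured stopping distances."""
    errors = [abs(simulated_stop_distance(v0, mu) - d) for v0, d in trials]
    return sum(errors) / len(errors)


def calibrate_friction(trials, candidates):
    """Pick the candidate friction coefficient with the smallest reality gap."""
    return min(candidates, key=lambda mu: reality_gap(mu, trials))


# Synthetic stand-in for physical measurements (hypothetical, generated with mu = 0.6).
trials = [(v0, simulated_stop_distance(v0, 0.6)) for v0 in (0.5, 1.0, 1.5)]

initial_mu = 0.3                              # rough initial estimate
candidates = [0.1 * k for k in range(1, 11)]  # grid from 0.1 to 1.0
best_mu = calibrate_friction(trials, candidates)

print(f"gap before: {reality_gap(initial_mu, trials):.4f} m")
print(f"gap after:  {reality_gap(best_mu, trials):.4f} m (mu = {best_mu:.1f})")
```

On the physical robot the trials would come from sensor logs rather than from the model itself, and many parameters would be estimated jointly, but the measure-then-adjust loop has the same shape.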

Planned Impact

Rapid growth in the already burgeoning Robotics and Autonomous Systems (RAS) market has been forecast by many sources. This growth is driven by socio-economic needs and enabled by advances in algorithms and technologies converging on robotics. The market potential for applications of robotics and autonomous systems is, therefore, of huge value to the UK. There are four major areas where FARSCOPE will strive to fulfil and deliver on the impact agenda.

1. Training: A coherent strategy for impact must observe the value of the 'innovation pipeline'; from training of world-class researchers to novel products in the 'shop window'. The FARSCOPE training programme described in the Case for Support will produce researchers who will be able to advance knowledge, expertise and skills in the many associated aspects of academic pursuit in the field. Crucially, they will be guided by its industrial partners and BRL's Industrial Advisory Group, so that they are grounded in the real-world context of the many robotics and autonomous systems application domains. This means pursuing research excellence while embracing the challenges set within the context of a range of real-world factors.

2. Economic and Social Exploitation: The elevated position of advanced robotics in the commercial 'value chain' makes it imperative that we create graduates from our Centre who are acutely aware of this potential. BRL is centrally engaged in its regional SME and business ecology, as evidenced by its recent industry workshop and 'open lab' events, which attracted some 60 and 280 industrial delegates respectively. BRL is also a key contributor to regional economic innovation. BRL has engaged two business managers and allocated dedicated space specifically to support work on RAS-related industrial engagement and innovation and, importantly, technology incubation. BRL will be creating an EU-funded Robotics Innovation Facility to help coordinate a EUR 20m programme that specifically promotes and encourages direct links between academia and industry, with a focus on SMEs. All of these high-impact BRL activities will be fed directly into the FARSCOPE programme.

A critical mass of key industrial and end-user partners across a diverse array of sectors has given its support to the FARSCOPE centre. All have indicated their interest in engaging through the FARSCOPE mechanisms identified in the Case for Support. This demonstrates the impact of the FARSCOPE centre in engaging existing, and forming new, strategic partnerships in the RAS field.

3. Fostering links with other Research Institutions and Academic Dissemination: It is essential that FARSCOPE CDT students learn to share best practice with other RAS research centres, both in the UK and beyond. In addition to attendance and presentation at academic conferences nationally and overseas, FARSCOPE will use the following mechanisms to engage with the academic community. BRL has very many strong links with the UK, EU and global RAS research community. We will use these as a basis for cementing existing links and fostering new ones.

4. Engaging the Public: FARSCOPE will train and then encourage its student cohorts to engage with the general public, to educate about the potential of these new technologies, to participate in debates on ethics, safety and legality of autonomous systems, and to enthuse future generations to work in this exciting area. UWE and the University of Bristol, BRL's two supporting institutions, host the National Coordinating Centre for Public Engagement. In addition, UWE's Science Communication Unit is internationally renowned for its diverse and innovative activities, which engage the public with science. FARSCOPE students will receive guidance and training in public engagement in order to act as worthy RAS research 'ambassadors'.

Publications

Description: Experiments are currently being conducted, with papers planned for submission in the next few months:
1) The impact of surface contact forces on the learning of quadrupedal robots from simulation to reality.
2) A symmetry-exploiting, action-based approach to reduce the training time of legged robots using deep reinforcement learning.
3) A new architecture that allows legged robots to simultaneously learn to walk and navigate from random behaviour. This gives researchers high-level control of legged robots in the real world.
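As background to item 2, one common way left-right symmetry is exploited for legged robots is to mirror actions across the body's sagittal plane, so that each experience can also be used as its mirror image. The sketch below shows only that mirroring step and is not necessarily the approach of the planned paper. The 12-joint layout, the leg order (FL, FR, RL, RR), the per-leg joint order (abduction, hip flexion, knee) and the sign convention are all hypothetical.

```python
# Hypothetical layout: 4 legs x 3 joints, ordered FL, FR, RL, RR.
LEGS = ("FL", "FR", "RL", "RR")
MIRROR_LEG = {"FL": "FR", "FR": "FL", "RL": "RR", "RR": "RL"}
# Assumed convention: abduction changes sign under mirroring,
# hip flexion and knee angles do not.
JOINT_SIGN = (-1.0, 1.0, 1.0)


def mirror_action(action):
    """Return the left-right mirrored version of a 12-element joint action."""
    if len(action) != 12:
        raise ValueError("expected 12 joint targets (4 legs x 3 joints)")
    mirrored = [0.0] * 12
    for i, leg in enumerate(LEGS):
        j = LEGS.index(MIRROR_LEG[leg])  # slot of the opposite-side leg
        for k in range(3):
            mirrored[3 * j + k] = JOINT_SIGN[k] * action[3 * i + k]
    return mirrored
```

Mirroring twice returns the original action, which is a quick sanity check that the leg mapping and sign convention are consistent.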

Next step (primary PhD objective)
In addition, the architecture is being expanded to allow the robot to improve its own simulation of the real world from sparse observations. This is expected to allow legged robots to adapt rapidly to damage (such as lost limbs) and environmental changes (such as variations in terrain), even if these have never been seen before in simulation.
Exploitation Route: The research aims to improve the rapid real-world adaptability of legged robots to previously unseen conditions and environments. This could have significant impact in areas such as defence, search and rescue, and assisted living.
Sectors: Aerospace, Defence and Marine; Agriculture, Food and Drink; Construction; Education; Leisure Activities, including Sports, Recreation and Tourism; Transport