Redesigning the ways we learn and express policies for autonomous agents, to increase robustness, resolve simulation-to-real transfer, and enable sh

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

This project falls within the EPSRC Robotics and Artificial Intelligence Technologies research areas (Engineering theme).

As I start my PhD journey, there are two broad ideas, which I believe to be linked, that I am working to develop and contribute to. The first is to develop methods for hierarchical reinforcement learning. Reinforcement learning (RL) is a machine learning/artificial intelligence methodology that has been researched since the mid-20th century, yet has only seen marked success in the last four or so years, perhaps most notably in 2016 when an RL-based agent beat the world Go champion Lee Sedol in a tournament reported around the world. At the Oxford Robotics Institute, we are very interested in applying RL to physical robotic control, an area where it is yet to see truly impactful success. A limitation of RL is that it is immensely slow and data-intensive to train, and it results in very brittle control policies that tend to perform badly on anything other than the exact problem they have learnt to solve. These limitations hold back the application of this exciting methodology in robotics, and my aim is to help tackle them by building a method for hierarchical RL. This would introduce a new structure to RL in which policies and learning operate at multiple timescales, or levels of 'temporal abstraction'. This would draw on ideas from the more established and successful domain of control theory, in which hierarchical control has long been applied, and bring these ideas into our new, learning-based paradigm.
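The idea of temporal abstraction can be made concrete with a small sketch. The corridor environment, the option structure, and the hand-written policies below are purely illustrative assumptions, not a proposed method: a high-level policy chooses among 'options' (sub-policies with their own termination rule), and each option controls the agent for several primitive timesteps before the high level decides again, so the two levels operate on different timescales.

```python
# Toy illustration of temporal abstraction in hierarchical RL.
# The 1-D corridor environment and fixed-duration options are
# illustrative assumptions, not part of any specific framework.

class Option:
    """A sub-policy that runs until its termination condition fires
    (here, simply a fixed horizon of primitive steps)."""
    def __init__(self, name, action, duration):
        self.name = name
        self.action = action        # primitive action this option repeats
        self.duration = duration    # steps before control returns upward

    def run(self, state, step_fn):
        for _ in range(self.duration):
            state = step_fn(state, self.action)
        return state

def step(state, action):
    """1-D corridor: move left (-1) or right (+1), clipped to [0, 10]."""
    return max(0, min(10, state + action))

def high_level_policy(state, options):
    """Pick the option that moves toward the goal at position 10."""
    return options[1] if state < 10 else options[0]

options = [Option("go_left", -1, 3), Option("go_right", +1, 3)]

state = 0
trajectory = [state]
# High-level decisions happen only every 3 primitive steps, i.e. on a
# coarser timescale than the primitive actions themselves.
for _ in range(4):
    opt = high_level_policy(state, options)
    state = opt.run(state, step)
    trajectory.append(state)

print(trajectory)  # states at high-level decision points: [0, 3, 6, 9, 10]
```

In a learning setting, both the high-level choice over options and the options themselves would be learnt rather than hand-written; the sketch only shows the two-timescale structure.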

Secondly, I hope that through work on learning to control at different levels of temporal abstraction, I will be able to devise a robotics-centric approach to encoding optimally computed robot motions in such a way that they can be automatically adapted and reused across different scenarios and different robots. One of my supervisors, Ioannis Havoutis, is involved in a European project called MEMMO (memory of motion), which seeks a way of performing motion generation for complex robots with arms and legs, based on pre-existing, optimally computed motion sequences saved in the 'memory of motion'. Crucially, the method sought should allow these sequences to have been designed for, or executed on, robotic platforms different from the one that will then use the MEMMO system for motion. This requires a solution to the problem of encoding the helpful information from the motions in such a way that it can be drawn on by another system. Current approaches do not adequately consider how best to do this in light of the intended application, instead drawing on conventional approaches such as autoencoding or other methods of dimensionality reduction to 'encode' these motions. This gives no particular advantage when it comes to drawing on, altering and reusing these motions in new situations; rather, these approaches are simply data compression. By contrast, this is a problem I hope to tackle by developing a mature methodology around learning motions and control at multiple timescales, an approach that would consider from the outset the time-varying nature of this sort of data.
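The data-compression point can be illustrated with a minimal sketch in which a motion trajectory is 'encoded' by truncating its DCT coefficients, a simple stand-in for autoencoding or other dimensionality reduction; the toy trajectory and the code size are my own illustrative assumptions. The compressed code reconstructs the motion faithfully, but nothing about it offers a handle for adapting the motion to a different robot or scenario.

```python
import math

def dct(signal):
    """Type-II DCT: one coefficient per frequency index k."""
    N = len(signal)
    return [sum(signal[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]

def idct(coeffs):
    """Matching inverse (Type-III DCT with the usual normalisation)."""
    N = len(coeffs)
    return [(coeffs[0] / 2
             + sum(coeffs[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                   for k in range(1, N))) * 2 / N
            for n in range(N)]

N = 32
# A smooth toy "joint angle" trajectory, chosen so its energy sits
# entirely in the low-frequency coefficients.
trajectory = [math.cos(math.pi * 3 * (2 * n + 1) / (2 * N)) for n in range(N)]

# "Encode": keep only the first 8 of 32 coefficients.
code = dct(trajectory)[:8] + [0.0] * (N - 8)

# "Decode": the reconstruction matches the original motion closely,
# which is exactly what data compression promises -- and all it promises.
recon = idct(code)
err = max(abs(a - b) for a, b in zip(trajectory, recon))
print(f"max reconstruction error: {err:.2e}")
```

The compressed code here is just numbers tied to this one time series; retargeting the motion to a robot with different kinematics would require structure the encoding never captured, which is the gap the multi-timescale approach aims to address.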
