AI and robotics - applying the maximum entropy framework to real-world robotics tasks

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

Context and impact

In recent years, robotics has shown increasing promise in moving outside of laboratories and into real-world tasks. Areas such as car manufacturing that require simple, repetitive motions have felt the impact of robotics for years, but the current challenge is to extend this reach into large, dynamic environments that involve interaction between humans and robots. The economic impact of intelligent autonomous systems, once deployed at scale, will be vast, allowing the work of one person to be leveraged many times over and yielding order-of-magnitude efficiency gains. One of the key components of these systems is control, which is where my research sits.

Aims and objectives

The control tasks relevant to real-world robotics broadly fall into two categories: locomotion, my focus, and manipulation. While locomotion over flat ground is fairly straightforward, the problem becomes much harder over rough terrain, requiring additional sensory modalities such as vision to anticipate obstacles and act accordingly. One of the principal aims of my research is to increase the effective range of operation of state-of-the-art locomotion controllers. More complex forms of movement like climbing and jumping, performed robustly, are currently out of the reach of modern robotics systems. Extending these capabilities would greatly enlarge the domain in which these systems can operate autonomously, furthering real-world deployment.

Novelty of the research methodology

One of the key technologies powering state-of-the-art robotics is deep reinforcement learning (RL). This is the technique my research will primarily focus on, and I aim to extend its capabilities by addressing the following questions.

The first of these is how to make the training of reinforcement learning systems more stable and effective. RL algorithms commonly fall into locally optimal solutions that either partially solve a problem or 'game' the reward function in an unhelpful way - for example, an agent that refuses to move in order to avoid penalties for falling over, thereby sidestepping the task we actually want it to accomplish. Additionally, the search for useful actions often involves highly unstable behaviour which, when deployed on real-world systems, can easily result in damage to the hardware or, more importantly, to nearby people. If we wish to one day have intelligent autonomous systems that can adapt to real-time changes in their environment, these problems must be solved. From my research, a promising solution to both of these issues is the maximum entropy framework (MaxEnt).

Maximum entropy algorithms jointly optimise for the agent's reward and the 'entropy' - which can be thought of as a measure of randomness - of the distribution of actions it takes. The benefits of this approach are as follows. Firstly, it is a simple and robust solution to the exploration-exploitation trade-off. This is one of the key dilemmas in designing an RL system and can be thought of as balancing the exploitation of solutions we already know to be effective against the exploration of alternatives that might prove even better. Secondly, MaxEnt has a unique advantage over other methods in that it allows for multimodal solutions, meaning that if there are multiple equally valid ways of solving a problem, the agent can retain all of them, giving us greater flexibility.
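The entropy-regularised objective described above can be illustrated with a minimal sketch. The toy bandit below is purely illustrative (the reward values and the entropy weight `alpha` are made-up assumptions, not from this project); it shows how the MaxEnt objective trades expected reward against entropy, and how its maximiser retains both of two equally good actions rather than collapsing onto one:

```python
import numpy as np

# Toy 3-armed bandit: two equally rewarding arms -> a multimodal optimum.
# Reward values and alpha (the entropy temperature) are illustrative choices.
rewards = np.array([1.0, 1.0, 0.2])
alpha = 0.1

def maxent_objective(logits):
    """E_pi[r] + alpha * H(pi): expected reward plus an entropy bonus."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return np.dot(probs, rewards) + alpha * entropy

# For this one-step case the maximiser has a closed form: the softmax of
# rewards / alpha. It places (near-)equal probability on both high-reward
# arms instead of committing to a single one - the multimodality property.
optimal_probs = np.exp(rewards / alpha) / np.exp(rewards / alpha).sum()

# A deterministic policy that picks only arm 0 scores lower on the
# MaxEnt objective, despite achieving almost the same expected reward.
greedy_logits = np.array([10.0, 0.0, 0.0])
```

As `alpha` is increased the optimal policy spreads probability more evenly (more exploration); as it shrinks toward zero the objective reduces to plain expected reward (pure exploitation), which is how the framework handles the trade-off with a single knob.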

Alignment to EPSRC's strategies and research areas

I believe this project falls squarely under UKRI's AI and robotics theme, as robust control policies are one of the core challenges that need to be solved for our systems, and thus the UK, to be resilient and effective. Additionally, my research coincides with many of the strategic priorities, such as artificial intelligence and frontiers in engineering and technology, and possibly even more distant areas such as transforming health and healthcare through the addition of robotic manipulators in surgical procedures.

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/T517811/1                                      01/10/2020   30/09/2025
2745856             Studentship    EP/T517811/1   01/10/2022   30/09/2026