Improving Sample Efficiency of Deep Reinforcement Learning

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

This project falls within the EPSRC Artificial Intelligence Technologies research area.

Autonomous, goal-oriented learning through interaction with the world has long been one of the main objectives of artificial intelligence. Mathematically, this problem is formalized through the framework of reinforcement learning (RL), which expresses it as finding a policy for an agent interacting with its environment that leads to the highest overall reward. The significant challenge is reasoning about the long-term consequences of actions and striking the right trade-off between short-term and long-term rewards. The reinforcement learning framework hence naturally describes problems encountered in robotics, economics, operations research and elsewhere.
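
As a minimal illustration of the trade-off between short-term and long-term rewards, the sketch below assumes the common discounted-return formulation, which the abstract itself does not specify. The function name `discounted_return`, the discount factor `gamma`, and both reward sequences are hypothetical and chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences: one pays off immediately, the other only at the end.
immediate = [1.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 2.0]

for gamma in (0.5, 0.9):
    # A larger gamma places more weight on rewards that arrive later.
    print(gamma, discounted_return(immediate, gamma), discounted_return(delayed, gamma))
```

Under these assumptions, which sequence yields the higher return depends on the discount factor: a far-sighted agent (large `gamma`) prefers the delayed reward, while a short-sighted one prefers the immediate reward.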

To scale up RL algorithms, researchers have in the last few years combined them with the powerful representation learning methods of deep learning, resulting in the rapidly growing field of deep reinforcement learning (DRL). DRL's most famous success is AlphaGo, a program that beat human champions at the ancient board game Go, and DRL has also been successfully applied to problems such as managing power consumption and neural architecture search.

One of the main remaining obstacles to more widespread application of DRL is its notoriously high demand for data: DRL agents often require 100-1000 times more experience than a human to reach the same level of performance. This is particularly limiting in domains where interacting with the environment and acquiring experience is slow and costly. Such is the case in robotics and in many other problems involving interaction with a physical system that cannot be easily simulated on a computer.

The aim of our project is to investigate a number of promising algorithmic approaches for increasing the sample efficiency of DRL, leading to significant improvements in the applicability of RL. Particularly promising approaches, which can also be combined, include: (i) application of auxiliary losses and unsupervised learning to RL, (ii) transfer and meta-learning, (iii) few-shot imitation learning, and (iv) model-based RL. In many of these areas, our group has already published at top conferences in the field.

The use of auxiliary losses in RL makes it possible to extract more information from the available interaction with the environment, with the auxiliary losses derived from measures of surprisal, empowerment, and the ability to predict rewards or state transitions. To develop better methods for transfer, we are interested in using the structure available in natural language to provide additional information, building upon existing work on policy sketches. Continuing our previous work on infinitely differentiable Monte Carlo estimators, we will investigate potential uses of higher-order gradients in meta-learning, leading to faster transfer of skills across tasks and reducing the data demand of DRL.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509711/1                                   01/10/2016  30/09/2021
1896649            Studentship   EP/N509711/1  01/01/2018  31/12/2020  Jelena Luketina
 
Description A lecture at University College London 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The lecture was given as part of the ongoing course on Statistical Natural Language Processing at UCL, introducing the students to the main ideas and recent developments in research examining the role of natural language in reinforcement learning. Several students in the audience expressed interest in learning more about the topic, potentially initiating a project in a related area.
Year(s) Of Engagement Activity 2020
URL https://moodle.ucl.ac.uk/course/info.php?id=1441
 
Description A talk at Oxford Disruptive Tech Week 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A lecture introducing MBA students to the strengths and limitations of current machine learning methods.
The goal was to give an insider/expert perspective, so that students could see past the hype and understand when these systems can be used reliably.
Year(s) Of Engagement Activity 2018
URL https://www.odtw.co.uk/speakers
 
Description Seminar at Future of Humanity Institute 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A seminar targeted primarily at the Research Scholars Programme (RSP) at the Future of Humanity Institute, with the goal of informing the scholars about the most important developments in the field of deep learning in the last 2 years. The researchers at RSP are working on a variety of topics related to the risks and societal impact of emerging technologies, including AI policy, and they found the presentation and the discussion afterward very helpful.
Year(s) Of Engagement Activity 2019
URL https://www.fhi.ox.ac.uk/rsp/