Learning Flexibility: Deep Transfer Reinforcement Learning
Lead Research Organisation:
University of Oxford
Department Name: Computer Science
Abstract
Reinforcement learning is a powerful form of learning inspired by the dopamine-controlled processes that many animals use to learn rewarding behaviours in novel environments [1]. Although it can find optimal solutions to simple Markov decision processes on its own, reinforcement learning's potential over complex tasks is only realised when combined with deep neural network architectures; such systems can approach or exceed human-level performance in specific contexts [2][3].
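To make the first point concrete, the following is a minimal sketch of tabular Q-learning solving a toy chain Markov decision process. It is a hypothetical illustration (states, rewards, and hyperparameters are invented for this example); the project itself concerns deep function approximators rather than tables.

```python
import random

# Toy chain MDP: states 0..4; actions 0 = left, 1 = right;
# reward 1.0 on reaching the terminal state 4.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """One environment transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection, breaking ties randomly
            if rng.random() < EPSILON or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, a)
            # temporal-difference (Q-learning) update
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy after training: 1 (move right) in every non-terminal state.
policy = [1 if q[s][1] > q[s][0] else 0 for s in range(GOAL)]
print(policy)  # expected: [1, 1, 1, 1]
```

The tabular update converges here because the state space is tiny; for the high-dimensional tasks this project targets, the Q-table is replaced by a deep neural network, which is what introduces the hierarchical representations discussed below.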
These performance improvements arise from allowing an agent to represent information hierarchically, with varying degrees of abstraction, in ways similar to techniques used in computer vision [4]. How agents achieve this remains a largely unexplored aspect of deep reinforcement learning. In this project we propose to address this question by examining the structures of neural networks trained to perform similar tasks. We hope to identify underlying, higher-level neural structures common to these tasks, and to explore the possibility of transferring them to novel but similar tasks. We anticipate that these underlying structures will also capture the essence of each type of problem, aiding the classification of tasks as well as marking a step towards human-level flexibility.
This project falls within the EPSRC ICT research area.
[1] Sutton RS, Barto AG (1998). Reinforcement Learning: An Introduction. MIT Press.
[2] Tesauro G (1995). 'Temporal Difference Learning and TD-Gammon'. Communications of the ACM 38(3), pp. 58-68.
[3] Mnih V, Kavukcuoglu K, Silver D et al. (2015). 'Human-level Control through Deep Reinforcement Learning'. Nature 518, pp. 529-533.
[4] Zeiler MD, Fergus R (2014). 'Visualizing and Understanding Convolutional Networks'. Lecture Notes in Computer Science 8689, pp. 818-833.
Organisations
People
Shimon Whiteson (Primary Supervisor)
Matthew Fellows (Student)
Publications
Fellows M (2018). 'Fourier Policy Gradients'.
Fellows M G (2019). 'VIREL: A Variational Inference Framework for Reinforcement Learning'.
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name
---|---|---|---|---|---
EP/N509711/1 | | | 30/09/2016 | 29/09/2021 |
1735385 | Studentship | EP/N509711/1 | 02/10/2016 | 30/03/2020 | Matthew Fellows
Description | Several existing reinforcement learning algorithms have been consolidated into a single framework with improved theoretical properties. Most importantly, the work has demonstrated that the optimal policy these algorithms attempt to recover is the true optimal policy for the reinforcement learning objective, a result that had been missing for several empirically successful algorithms such as maximum a posteriori policy optimisation. For widely used algorithms based on the maximum entropy principle (such as soft actor-critic) that do not fit into our framework, we have provided a theoretical demonstration that these algorithms may never recover an optimal policy. Moreover, we have provided evidence that algorithms from our framework match or even outperform those derived from the maximum entropy principle. |
Exploitation Route | Our framework provides state-of-the-art performance in reinforcement learning control environments with the added benefit of theoretical guarantees, allowing others to apply it in any reinforcement learning setting. Going forward, we are investigating the convergence properties of these algorithms. |
Sectors | Digital/Communication/Information Technologies (including Software); Financial Services and Management Consultancy; Healthcare; Transport |
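For context on the findings above, the maximum entropy principle augments the standard expected-return objective with a policy-entropy bonus. A common formulation from the literature (not taken from this entry; the temperature parameter α and symbols are the standard ones) is:

```latex
J_{\text{MaxEnt}}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t, a_t) \;+\; \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big)\right]
```

The key findings state that optimising this entropy-regularised objective, as soft actor-critic does, can bias the recovered policy away from the optimum of the unregularised reinforcement learning objective whenever the entropy bonus is active.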