Learning Flexibility: Deep Transfer Reinforcement Learning

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Reinforcement learning is a powerful form of learning inspired by the dopamine-controlled processes that many animals employ to learn rewarding behaviours in novel environments [1]. Despite successfully finding optimal solutions to simple Markov decision processes, reinforcement learning's potential over complex tasks is only realised when combined with deep neural network architectures; such systems can approach or beat human-level performance when used in specific contexts [2][3].
These improvements in performance are a consequence of allowing an agent to represent information hierarchically, with varying degrees of abstraction, in ways similar to techniques used in computer vision [4]. How these agents achieve this remains a largely unexplored aspect of deep reinforcement learning. In this project we propose to address this problem by examining the structures of neural networks trained to perform across similar tasks. We hope to identify underlying, higher-level neural structures common to these tasks, and to explore the possibility of transferring them to novel but similar tasks. We anticipate that these underlying structures will also capture the essence of each type of problem, aiding the classification of tasks as well as marking a step towards human-level flexibility.
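To make concrete the claim that reinforcement learning finds optimal solutions to simple Markov decision processes, the following is a minimal sketch of tabular Q-learning on a toy chain MDP. The environment, hyperparameters, and reward structure here are illustrative assumptions, not part of the project itself.

```python
import random

# A toy 5-state chain MDP: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, ACTIONS, GAMMA, ALPHA, EPS = 5, (0, 1), 0.9, 0.5, 0.1

def step(state, action):
    """Deterministic transition: move left or right along the chain."""
    next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                      # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # Temporal-difference update toward the bootstrapped target
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# The greedy policy over the learned values moves right in every
# non-terminal state, which is optimal for this MDP.
greedy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)]
print(greedy)
```

On this simple problem the learned action values converge to the optimal discounted returns (roughly 0.9^k for a state k steps from the goal), recovering the optimal policy without any function approximation; deep architectures become necessary only when the state space is too large to enumerate in a table.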
This project falls within the EPSRC ICT research area.

[1] Sutton RS, Barto AG (1998). Reinforcement Learning: An Introduction. MIT Press.
[2] Tesauro G (1995). 'Temporal Difference Learning and TD-Gammon'. Communications of the ACM 38(3), pp. 58-68.
[3] Mnih V, Kavukcuoglu K, Silver D, et al. (2015). 'Human-level Control through Deep Reinforcement Learning'. Nature 518, pp. 529-533.
[4] Zeiler MD, Fergus R (2014). 'Visualizing and Understanding Convolutional Networks'. Lecture Notes in Computer Science 8689, pp. 818-833.

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/N509711/1                                      01/10/2016   30/09/2021
1735385             Studentship    EP/N509711/1   03/10/2016   31/03/2020   Matthew George Fellows
 
Description Several existing reinforcement learning algorithms have been consolidated into a single framework with improved theoretical properties. Most importantly, the work has demonstrated that the optimal policy these algorithms attempt to recover is the true optimal policy for the reinforcement learning objective, a result that had been missing for several empirically successful algorithms such as maximum a posteriori policy optimisation. For widely used algorithms based on the maximum entropy principle (such as soft actor-critic) that do not fit into our framework, we have provided a theoretical demonstration that these algorithms may never recover an optimal policy. Moreover, we have provided evidence that algorithms from our framework have similar performance to, or even outperform, those derived from the maximum entropy principle.
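The distinction drawn above between the standard reinforcement learning objective and maximum-entropy objectives can be sketched in standard notation (this is a generic formulation, not taken from the project's own publications):

```latex
% Standard RL objective: maximise the expected discounted return
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]

% Maximum-entropy objective: add a policy-entropy bonus with temperature \alpha > 0
J_{\mathrm{MaxEnt}}(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
  \bigl(r(s_t, a_t) + \alpha\,\mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr)\bigr)\right]
```

Because the entropy term rewards stochasticity, for any fixed temperature \(\alpha > 0\) the maximiser of \(J_{\mathrm{MaxEnt}}\) is in general a stochastic policy that does not coincide with a maximiser of \(J\); this is the sense in which maximum-entropy algorithms may fail to recover an optimal policy for the unregularised objective.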
Exploitation Route Our framework provides state-of-the-art performance in reinforcement learning control environments with the added benefit of theoretical guarantees, allowing others to use it in any reinforcement learning setting. Going forward, we are investigating the convergence properties of these algorithms.
Sectors Digital/Communication/Information Technologies (including Software), Financial Services and Management Consultancy, Healthcare, Transport