Reinforcement Learning to Increase Machine Learning Speed

Lead Research Organisation: University of York
Department Name: Computer Science

Abstract

Reinforcement learning is an agent-based approach to machine learning in which agents explore state spaces and are equipped with reward functions that measure their performance. The agents then adjust their approach to the problem in light of their previous performance.

This project considers how reward functions in reinforcement learning can be shaped to increase learning speed. It will also consider how making these reward functions dynamic can represent domain knowledge about the problem at hand, and how best to utilise this.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N509802/1      |              |              | 01/10/2016 | 31/03/2022 |
1949684           | Studentship  | EP/N509802/1 | 01/10/2017 | 30/09/2020 | John Burden
 
Description: Reinforcement Learning is a machine learning paradigm in which an agent interacts with its environment and alters its behaviour based on a reward signal that it receives. As a general concept it is not too dissimilar to training a pet by rewarding it with treats for good behaviour.
Reinforcement Learning struggles with large-scale tasks such as playing video games from visual stimuli alone. This is difficult because the agent sees only a mass of pixels and has to learn which objects and entities the pixels correspond to, and then has to learn how to manipulate these entities to maximise the reward it receives.

An existing technique known as Reward Shaping has been shown to reduce the time taken for a Reinforcement Learning agent to learn to perform well. However, this technique often requires an expert in the task to painstakingly encode how "good" each possible interaction with the environment is. For large-scale environments this is simply intractable.
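
As a rough illustration of what shaping involves, the sketch below uses potential-based reward shaping, one common formulation. This is an assumption: the description does not say which form of shaping was used. The function name `shaped_reward`, the hand-written `expert_potential` table and the state names are all hypothetical.

```python
from typing import Callable, Hashable


def shaped_reward(reward: float, state: Hashable, next_state: Hashable,
                  potential: Callable[[Hashable], float],
                  gamma: float = 0.99) -> float:
    """Return the environment reward plus a potential-based shaping term."""
    # Shaping term F(s, s') = gamma * phi(s') - phi(s).
    return reward + gamma * potential(next_state) - potential(state)


# Hypothetical expert knowledge: a hand-assigned score for every state.
# Writing such a table by hand is what becomes intractable at scale.
expert_potential = {"start": 0.0, "corridor": 0.5, "goal": 1.0}

# Moving from "start" to "corridor" earns a bonus even though the
# environment itself gives no reward for that step.
bonus = shaped_reward(0.0, "start", "corridor", expert_potential.get, gamma=0.9)
```

The table has one entry per state, which is manageable for three states but not for an environment whose states are screens of pixels.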

More recent techniques have found that by creating an abstraction of the original task, the abstraction can be solved more easily and the solution then used to apply Reward Shaping effectively. Creating this abstraction, however, often still requires an expert in the task.

The initial key finding for this award has been a simple technique to create these abstractions automatically based on the agent's own initial observations. The key insight is to divide the original problem's states into so-called abstract states of a uniform size, much like the squares of a draughtboard. The agent's observations can then be used to fill in the remaining details of the abstraction, requiring an expert only to decide which draught-square the solution is in.
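
The draughtboard idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: states are taken to be integer (x, y) positions, moves between squares are assumed reversible, and the abstraction is "solved" with a breadth-first search from the goal square. All function names and the example observations are hypothetical.

```python
from collections import defaultdict, deque


def abstract_state(state, cell_size):
    """Map a concrete (x, y) state to the uniform grid square containing it."""
    x, y = state
    return (x // cell_size, y // cell_size)


def build_abstract_graph(transitions, cell_size):
    """Record which squares were observed to be directly connected."""
    graph = defaultdict(set)
    for s, s_next in transitions:
        a = abstract_state(s, cell_size)
        b = abstract_state(s_next, cell_size)
        if a != b:
            graph[a].add(b)
            graph[b].add(a)  # assumption: moves between squares are reversible
    return graph


def distances_to_goal(graph, goal_cell):
    """Breadth-first search outward from the square chosen by the expert."""
    dist = {goal_cell: 0}
    queue = deque([goal_cell])
    while queue:
        cell = queue.popleft()
        for neighbour in graph[cell]:
            if neighbour not in dist:
                dist[neighbour] = dist[cell] + 1
                queue.append(neighbour)
    return dist


# Hypothetical observations gathered while the agent explores a corridor.
observed = [((0, 0), (1, 0)), ((1, 0), (2, 0)), ((2, 0), (3, 0)),
            ((3, 0), (4, 0)), ((4, 0), (5, 0))]
graph = build_abstract_graph(observed, cell_size=2)
dist = distances_to_goal(graph, goal_cell=(2, 0))
```

The only task-specific input here is `goal_cell`; the connectivity between squares comes entirely from the observed transitions.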
We augmented a well-known Reinforcement Learning algorithm, Deep Q-Networks, with this idea: the agent was able to create its own abstraction, solve it, and employ the result for Reward Shaping to speed up its own learning process. Our approach was then able to outperform the Deep Q-Networks algorithm without Reward Shaping on a variety of benchmark tests.
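
To show how a solved abstraction can feed back into learning, the toy sketch below swaps Deep Q-Networks for tabular Q-learning on a small corridor so that it stays self-contained. It is not the method evaluated in this project. The corridor, the constants, the `potential` and `train` functions, and the use of a potential-based shaping term are all assumptions made for illustration.

```python
import random

# Hypothetical corridor: states 0..11, goal at 11, abstract squares 3 wide.
N_STATES, GOAL, CELL = 12, 11, 3
ACTIONS = (-1, +1)


def potential(state):
    """Negative distance, in abstract squares, from this state's square to the goal's."""
    return float(state // CELL - GOAL // CELL)


def train(episodes=200, shaping=True, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the Q-table and the steps taken per episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    steps_per_episode = []
    for _ in range(episodes):
        state, steps = 0, 0
        while state != GOAL and steps < 500:  # step cap guarantees termination
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if nxt == GOAL else 0.0
            if shaping:
                # Shaping term derived from the abstraction's solution.
                reward += gamma * potential(nxt) - potential(state)
            target = reward if nxt == GOAL else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


q_shaped, steps_shaped = train(shaping=True)
q_plain, steps_plain = train(shaping=False)
print("mean steps, first 20 episodes (shaped):", sum(steps_shaped[:20]) / 20)
print("mean steps, first 20 episodes (plain): ", sum(steps_plain[:20]) / 20)
```

Comparing the two printed averages gives a rough sense of whether shaping helps early in training on this toy task; it says nothing about the benchmark results reported above.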
Exploitation Route: I am still working on the technique and hope to extend it to larger and more complex tasks. After this, it will hopefully serve as a general speed-up for many Reinforcement Learning applications, which are increasingly being employed in industry.
Sectors: Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Construction; Digital/Communication/Information Technologies (including Software); Electronics; Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Retail; Security and Diplomacy; Transport; Other