Reinforcement Learning to Increase Machine Learning Speed

Lead Research Organisation: University of York

Department Name: Computer Science

Abstract

Reinforcement learning is an agent-based approach to Machine Learning in which agents are allowed to explore state spaces and are equipped with reward functions to measure their performance. The agents then alter their approach to the problem in accordance with their previous performance.

This project considers reinforcement learning and how the reward functions can be shaped in order to increase learning speed. It will also consider how making these reward functions dynamic can represent domain knowledge about the problem at hand, and how best to utilise this.

Student:

John Burden

Period of Study:

Sep 17 - Sep 20

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1949684

Research Topic:

Unclassified

Organisations

University of York (Lead Research Organisation)

People	ORCID iD
John Burden (Student)

Publications


Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509802/1			30/09/2016	30/03/2022
1949684	Studentship	EP/N509802/1	30/09/2017	29/09/2020	John Burden

Key Findings


Description	Reinforcement Learning is a machine learning paradigm in which an agent interacts with its environment and alters its behaviour based on a reward signal that it receives. As a general concept it is not too dissimilar to training a pet by rewarding it with treats for good behaviour. Reinforcement Learning struggles with large-scale tasks such as playing video games from visual stimuli alone. This is difficult because the agent sees only a mass of pixels and has to learn which objects and entities the pixels correspond to, and then has to learn how to manipulate these entities to maximise the reward it receives. An existing technique known as Reward Shaping has been shown to reduce the time taken for a Reinforcement Learning agent to learn to perform well. However, this technique often requires an expert in the task to painstakingly encode how "good" each possible interaction with the environment is. For large-scale environments this is simply intractable. More recent techniques have found that by creating an abstraction of the original task, the abstraction can be solved more easily and then used to apply Reward Shaping effectively. Creating this abstraction, however, often still requires an expert in the task. The initial key finding for this award has been a simple technique to create these abstractions automatically based on the agent's own initial observations. The key insight is to divide the original problem's states into so-called abstract states of a uniform size, much like creating a draughtboard. The agent's observations can then be used to fill in the remaining details of the abstraction, requiring an expert only to decide which draught-square the solution is in.
We augmented a well-known Reinforcement Learning algorithm, Deep Q-Networks, with this idea: the agent was able to create its own abstraction, solve it, and use the solution for Reward Shaping to speed up its own learning process. Our approach was then able to outperform the Deep Q-Networks algorithm without Reward Shaping on a variety of benchmark tests.
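The draughtboard idea described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes states are 2D integer grid coordinates rather than pixels, it assumes the common potential-based form of Reward Shaping (shaped reward = r + gamma * phi(s') - phi(s)), and every name in it (abstract_state, build_abstract_graph, abstract_potentials, shaped_reward, cell_size) is hypothetical.

```python
from collections import deque


def abstract_state(state, cell_size):
    """Map a concrete (x, y) state to its uniform-size abstract cell."""
    x, y = state
    return (x // cell_size, y // cell_size)


def build_abstract_graph(transitions, cell_size):
    """Link abstract cells that the agent was observed to move between.

    Assumption for this sketch: observed moves are treated as reversible,
    so the abstract graph is undirected.
    """
    graph = {}
    for s, s_next in transitions:
        a = abstract_state(s, cell_size)
        b = abstract_state(s_next, cell_size)
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def abstract_potentials(graph, goal_cell):
    """Solve the abstraction with breadth-first search from the goal cell.

    The potential of a cell is minus its distance (in cells) to the goal,
    so cells nearer the goal have a higher potential.
    """
    dist = {goal_cell: 0}
    queue = deque([goal_cell])
    while queue:
        cell = queue.popleft()
        for neighbour in graph.get(cell, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[cell] + 1
                queue.append(neighbour)
    return {cell: -d for cell, d in dist.items()}


def shaped_reward(reward, s, s_next, potentials, cell_size, gamma=0.99):
    """Add the potential-based shaping term gamma * phi(s') - phi(s)."""
    # Cells never observed fall back to the lowest known potential.
    default = min(potentials.values(), default=0.0)
    phi = potentials.get(abstract_state(s, cell_size), default)
    phi_next = potentials.get(abstract_state(s_next, cell_size), default)
    return reward + gamma * phi_next - phi


# Usage: the agent has walked along a corridor from (0, 0) to (5, 0).
transitions = [((i, 0), (i + 1, 0)) for i in range(5)]
graph = build_abstract_graph(transitions, cell_size=2)
# The expert only names the abstract cell that contains the solution.
potentials = abstract_potentials(graph, goal_cell=(2, 0))
# Crossing into a cell nearer the goal earns a shaping bonus.
print(shaped_reward(0.0, (1, 0), (2, 0), potentials, cell_size=2, gamma=1.0))
```

In this sketch the only expert input is goal_cell; everything else about the abstraction comes from the agent's observed transitions. In a Deep Q-Network setting, the shaped reward would presumably replace the raw reward when computing the learning target.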
Exploitation Route	I am still working on the technique and hope to expand it to larger and more complex tasks. After this, it will hopefully serve as a general speed-up for many Reinforcement Learning applications, which are increasingly being employed in industry.
Sectors	Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Construction; Digital/Communication/Information Technologies (including Software); Electronics; Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Retail; Security and Diplomacy; Transport; Other
