Reinforcement Learning in Large Complex Partially Observable Environments

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

This project falls within the EPSRC Research Area Artificial Intelligence Technologies, under the EPSRC Research Theme Information and Communication Technologies.

This research project is an exploration of the use of Reinforcement Learning for achieving a sophisticated level of control in large partially observable environments which exhibit complex dynamics and long-term dependencies.

Reinforcement Learning (RL) is a branch of Machine Learning that deals with how to act in an environment in order to maximise some notion of cumulative reward. To accomplish this, RL agents must carefully balance exploration and exploitation of that environment, which is difficult when the environment is large and complex. In recent years, model-free approaches have made considerable progress in such environments. Most notably, approaches involving Deep Q Networks have been able to play a range of Atari games with superhuman performance.
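The exploration/exploitation trade-off described above can be illustrated with a minimal tabular sketch. It is a textbook Q-learning loop with epsilon-greedy action selection, not the Deep Q Network setup used on Atari; the chain environment, the function names and all hyperparameter values are assumptions chosen for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore (random action), otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties at random so low action indices are not favoured.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])

def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.
    Reaching the right-most state gives reward 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done

def train(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on the chain; returns the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # step cap guards against long random walks
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = chain_step(state, action, n_states)
            # One-step target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q
```

After training, the greedy policy in every non-terminal state should prefer moving right, since that is the only route to reward in this toy chain.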

We wish to continue this line of research and further investigate the application of Deep Q Networks and their many extensions to environments which require long-term planning. Specifically, we aim to produce an agent that can learn how to play a real-time strategy (RTS) game. To accomplish such a goal, an agent must be adept at many complex tasks. In addition to learning the consequences of its actions, an agent must learn to formulate a long-term goal to build towards, and also learn how to react to changes in its environment. Even humans struggle to play RTS games without some prior training or guidance, which highlights the complexity of the problem. It is our belief that pursuing a complex problem such as this would lead to the development of useful ideas and techniques that would be applicable in a multitude of other areas.

To tackle this problem we will make use of ideas from Hierarchical Reinforcement Learning. We strongly believe that decomposing a problem into simpler sub-problems is a crucial part of being able to tackle complex environments, since the larger problem is often intractable whereas the simpler sub-problems are significantly easier to solve. In addition, we will make use of recent advances in Machine Learning, specifically Deep Learning, to further refine the agent's internal representation of the environment. An accurate representation of the environment is crucial for acting intelligently in partially observable domains, especially in the case of real-time strategy games, where the agent must also learn to predict its opponent's behaviour.

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/N509711/1 | | | 01/10/2016 | 30/09/2021 |
1749045 | Studentship | EP/N509711/1 | 03/10/2016 | 31/03/2020 | Tabish Rashid
 
Description We have primarily been investigating the setting of Multi-Agent Reinforcement Learning, in which a team of agents must learn how to work together to accomplish a shared goal. We have improved upon recently proposed methods in this domain to take better advantage of any useful information that is available during training, and we show large performance gains over other SOTA methods that also take advantage of this extra information.
We have also investigated methods for improved exploration in a single-agent setting with a sparse reward signal. We have found that optimism, particularly when deciding which action to take, is extremely important and can yield significant improvements over current methods.
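One way to picture optimism during action selection is to add a bonus to each action's value estimate that shrinks the more often that action has been tried in a given state. The sketch below is a generic count-based illustration of that idea under stated assumptions; it is not presented as the project's own algorithm, and the class name, the `bonus_scale` parameter and the inverse-square-root bonus are all hypothetical choices.

```python
import math
from collections import defaultdict

class OptimisticSelector:
    """Greedy action selection over Q-values plus a count-based bonus."""

    def __init__(self, n_actions, bonus_scale=1.0):
        self.n_actions = n_actions
        self.bonus_scale = bonus_scale   # hypothetical hyperparameter
        self.counts = defaultdict(int)   # visit counts per (state, action)

    def bonus(self, state, action):
        # The bonus decays as the pair is tried more often. The 1/sqrt form
        # is a common choice and an assumption here, not taken from the source.
        return self.bonus_scale / math.sqrt(self.counts[(state, action)] + 1)

    def select(self, state, q_values):
        # Rank actions by estimated value plus optimism bonus.
        scores = [q_values[a] + self.bonus(state, a) for a in range(self.n_actions)]
        action = max(range(self.n_actions), key=scores.__getitem__)
        self.counts[(state, action)] += 1
        return action
```

With uninformative value estimates (for example, all zeros in a sparse-reward task), plain greedy selection would keep repeating one action, whereas the bonus steers the agent towards actions it has not yet tried in that state.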
Exploitation Route A major contribution of ours in this domain is the release of a standardised benchmark that future researchers can use to compare methods in the Multi-Agent setting more reliably. We have also released our framework for running and experimenting with this benchmark, which provides implementations of SOTA algorithms in this domain for others to use and build upon.
Our findings with regard to exploration in sparse reward environments show the benefits of optimism during action selection, an idea that is applicable to a wide variety of deep reinforcement learning algorithms.
Sectors Other

URL https://github.com/oxwhirl/smac
 
Title StarCraft Multi-Agent Challenge 
Description SMAC is an environment for research in the field of collaborative multi-agent reinforcement learning (MARL) based on Blizzard's StarCraft II RTS game. SMAC provides a standardised benchmark that allows a better comparison of algorithms and methods in this field. SMAC provides a convenient interface for autonomous agents to interact with StarCraft II, getting observations and performing actions. It concentrates on decentralised micromanagement scenarios, where each unit of the game is controlled by an individual RL agent. In addition, we have open-sourced our framework PyMARL, which contains implementations of several SOTA algorithms that can be run on SMAC.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact We have released a pre-print that benchmarks several SOTA algorithms on our domain for the first time. We have also released our framework for running those experiments, which allows other researchers to start experimenting with the platform easily and to build upon the algorithms we have implemented.
URL http://whirl.cs.ox.ac.uk/blog/smac
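The observe-then-act interface described for SMAC above can be sketched as a per-agent loop. The method names used here (`get_env_info`, `reset`, `get_obs`, `get_avail_agent_actions`, `step`) follow the SMAC README as best recalled and should be checked against the repository; `StubEnv` is a hypothetical stand-in so that the loop runs without StarCraft II installed, and the random policy is purely illustrative.

```python
import random

def run_random_episode(env, rng):
    """Play one episode with uniformly random valid actions; return total reward."""
    n_agents = env.get_env_info()["n_agents"]
    env.reset()
    terminated, episode_reward = False, 0.0
    while not terminated:
        obs = env.get_obs()  # one local observation per agent (unused by this random policy)
        actions = []
        for agent_id in range(n_agents):
            # Each agent chooses independently among its currently available actions.
            mask = env.get_avail_agent_actions(agent_id)
            valid = [a for a, ok in enumerate(mask) if ok]
            actions.append(rng.choice(valid))
        reward, terminated, _info = env.step(actions)  # shared team reward
        episode_reward += reward
    return episode_reward

class StubEnv:
    """Hypothetical stand-in exposing the same method names as assumed above."""

    def __init__(self, n_agents=3, n_actions=4, episode_limit=5):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.episode_limit = episode_limit
        self.t = 0

    def get_env_info(self):
        return {"n_agents": self.n_agents, "n_actions": self.n_actions}

    def reset(self):
        self.t = 0

    def get_obs(self):
        return [[0.0] for _ in range(self.n_agents)]

    def get_avail_agent_actions(self, agent_id):
        # Action 0 is masked out to show that availability must be respected.
        return [0] + [1] * (self.n_actions - 1)

    def step(self, actions):
        assert all(a != 0 for a in actions), "unavailable action chosen"
        self.t += 1
        return 1.0, self.t >= self.episode_limit, {}

total = run_random_episode(StubEnv(), random.Random(0))
```

With SMAC installed, the same function should accept a real environment in place of `StubEnv` (the README constructs one via `StarCraft2Env` from `smac.env`), provided the method names above match the installed version.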