Reinforcement Learning in Large Complex Partially Observable Environments

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

This project falls into the EPSRC Research Area "Artificial Intelligence Technologies" and the EPSRC Research Theme "Information and Communication Technologies".

This research project is an exploration of the use of Reinforcement Learning for achieving a sophisticated level of control in large partially observable environments which exhibit complex dynamics and long-term dependencies.

Reinforcement Learning (RL) is a branch of Machine Learning that deals with how to act in an environment in order to maximise some notion of cumulative reward. To accomplish this, RL agents must carefully balance exploration and exploitation of that environment, which is difficult in large, complex environments. In recent years, model-free approaches have been applied to such environments with considerable success. Most notably, approaches based on Deep Q Networks have been able to play a range of Atari games with superhuman performance.
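The balance between exploration and exploitation described above can be illustrated with a minimal sketch. It uses tabular Q-learning with epsilon-greedy action selection on a toy chain environment, not a Deep Q Network; the environment, the function names and all hyperparameters are assumptions chosen for illustration and are not taken from the project.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties at random so that untried actions are not systematically ignored.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def train_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical chain.

    Action 1 moves right and action 0 moves left. A reward of 1 is given
    only on reaching the right-most state, which ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # One-step target: the cumulative reward is discounted by gamma.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


q_table = train_chain()
```

After training, the learned values should prefer moving right in every non-terminal state, since that is the only route to the reward.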

We wish to continue this line of research and further investigate the application of Deep Q Networks and their many extensions to environments which require long-term planning. Specifically, we aim to produce an agent that can learn how to play a Real Time Strategy game. To accomplish such a goal, an agent must be adept at many complex tasks. In addition to learning the consequences of its actions, an agent must learn to formulate a long-term goal to build towards, and also learn how to react to changes in its environment. Even humans struggle to play Real Time Strategy games without some prior training or guidance, which highlights the complexity of the problem. It is our belief that pursuing a complex problem such as this would lead to the development of useful ideas and techniques that would be applicable in a multitude of other areas.

To tackle this problem, we will make use of ideas from Hierarchical Reinforcement Learning. We strongly believe that decomposing a problem into simpler sub-problems is a crucial part of being able to tackle complex environments, since the larger problem is often intractable whereas the simpler sub-problems are significantly easier to solve. In addition, we will make use of recent advances in Machine Learning, specifically Deep Learning, to further refine our internal representation of the environment. An accurate representation of the environment is crucial for acting intelligently in partially observable domains, especially in the case of Real Time Strategy games, where we must also learn to predict our opponent's behaviour.

Student:

Tabish Rashid

Period of Study:

Oct 16 - Mar 20

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1749045

Research Topic:

Unclassified

Organisations

University of Oxford (Lead Research Organisation)

People	ORCID iD
Shimon Whiteson (Primary Supervisor)
Tabish Rashid (Student)

Publications

Author Name

Title Publication Date Published

Mahajan A (2019) MAVEN: Multi-Agent Variational Exploration

Rashid T (2020) Optimistic Exploration even with a Pessimistic Initialisation

Rashid Tabish (2018) QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning in arXiv e-prints

Samvelyan Mikayel (2019) The StarCraft Multi-Agent Challenge in arXiv e-prints

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509711/1			01/10/2016	30/09/2021
1749045	Studentship	EP/N509711/1	03/10/2016	31/03/2020	Tabish Rashid

Key Findings
Software and Technical Products


Description	We have primarily been investigating the setting of Multi-Agent Reinforcement Learning, in which a team of agents must learn how to work together to accomplish a shared goal. We have improved upon recently proposed methods in this domain to take better advantage of any useful information that is available during training, and we show large performance gains over other state-of-the-art (SOTA) methods that also take advantage of this extra information. We have also investigated methods for improved exploration in a single-agent setting with a sparse reward signal. We have found that optimism, particularly when deciding which action to take, is extremely important and can significantly improve on current methods.
Exploitation Route	A large contribution of ours in this domain is the release of a standardised benchmark that future researchers can use to facilitate a better comparison of methods in the Multi-Agent setting. We have also released our framework for running and experimenting with this benchmark, which provides implementations of SOTA algorithms in this domain for others to use and build upon. Our findings with regard to exploration in sparse reward environments show the benefits of optimism during action selection, which is applicable to a wide variety of deep reinforcement learning algorithms.
Sectors	Other
URL	https://github.com/oxwhirl/smac
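As an illustration of what optimism during action selection can look like, the sketch below adds a count-based bonus to each action-value estimate before taking the greedy action. The bonus form c / (N(s, a) + 1)^m, the function name and the values of c and m are assumptions made for this example; this record does not specify the method used in the project.

```python
from collections import defaultdict


def optimistic_action(q_values, counts, state, c=1.0, m=2.0):
    """Return the action maximising Q(s, a) + c / (N(s, a) + 1) ** m.

    The bonus is largest for actions that have rarely been tried in this
    state, so they are preferred even when their estimates start out low.
    """
    scores = [
        q + c / (counts[(state, a)] + 1) ** m
        for a, q in enumerate(q_values)
    ]
    # max() returns the first maximal index, so ties go to the lowest action id.
    return max(range(len(scores)), key=scores.__getitem__)


counts = defaultdict(int)          # visit counts N(s, a), zero by default
q_values = [0.0, 0.0, 0.0]         # pessimistic (zero) estimates for three actions
chosen = []
for _ in range(6):
    a = optimistic_action(q_values, counts, state="s0")
    counts[("s0", a)] += 1
    chosen.append(a)
```

Even though every estimate stays at zero here, the shrinking bonus makes the agent cycle through all three actions rather than repeating one of them.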


Title	StarCraft Multi-Agent Challenge
Description	SMAC is an environment for research in the field of collaborative multi-agent reinforcement learning (MARL) based on Blizzard's StarCraft II RTS game. SMAC provides a standardised benchmark that allows a better comparison of algorithms and methods in this field. It offers a convenient interface for autonomous agents to interact with StarCraft II, getting observations and performing actions. It concentrates on decentralised micromanagement scenarios, where each unit of the game is controlled by an individual RL agent. In addition, we have open-sourced our framework PyMARL, which contains implementations of several SOTA algorithms that can be run on SMAC.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	We have released a pre-print that benchmarks several SOTA algorithms on our domain for the first time. We have also released our framework for running those experiments, which allows other researchers to easily start experimenting with the platform and build upon the algorithms we have implemented.
URL	http://whirl.cs.ox.ac.uk/blog/smac
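The decentralised interaction pattern described for SMAC, where each unit is controlled by its own agent, can be sketched as an episode loop in which every agent picks one of its currently available actions. The method names below (get_env_info, reset, get_avail_agent_actions, step) are modelled on the interface shown in the SMAC repository but should be treated as assumptions here. StubEnv is a hypothetical stand-in so that the example runs without StarCraft II installed.

```python
import random


class StubEnv:
    """Hypothetical stand-in exposing only the methods used by the loop below."""

    def __init__(self, n_agents=3, n_actions=4, episode_limit=5):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.episode_limit = episode_limit
        self.t = 0

    def get_env_info(self):
        return {"n_agents": self.n_agents, "n_actions": self.n_actions}

    def reset(self):
        self.t = 0

    def get_avail_agent_actions(self, agent_id):
        # Action 0 is always available; the others alternate with the timestep.
        return [1] + [(self.t + agent_id + a) % 2 for a in range(1, self.n_actions)]

    def step(self, actions):
        self.t += 1
        # A shared team reward, a termination flag and an (empty) info dict.
        return 1.0, self.t >= self.episode_limit, {}


def run_random_episode(env, rng):
    """Run one episode in which each agent samples uniformly from its available actions."""
    info = env.get_env_info()
    env.reset()
    total, terminated = 0.0, False
    while not terminated:
        actions = []
        for agent_id in range(info["n_agents"]):
            avail = env.get_avail_agent_actions(agent_id)
            actions.append(rng.choice([a for a, ok in enumerate(avail) if ok]))
        reward, terminated, _ = env.step(actions)
        total += reward
    return total


episode_return = run_random_episode(StubEnv(), random.Random(0))
```

A learning agent would replace the uniform sampling with its own policy, while still restricting its choice to the available actions.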
