Model-based Multi-agent Reinforcement Learning

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

This project falls within the EPSRC information and communication technologies (ICT) research area, specifically artificial intelligence technologies.
The goal of this project is to design efficient model-based algorithms for multi-agent reinforcement learning. No company or external collaborator is involved in this project.

Reinforcement learning is a sub-field of machine learning that studies how an agent interacts with an environment to maximize future rewards. A model in reinforcement learning refers to the dynamics of the environment and can either be given or be learned by the agent. Learning with a model is usually called model-based reinforcement learning; learning without one is referred to as model-free reinforcement learning. Classical reinforcement learning considers only a single agent, which is often insufficient for many real problems, e.g., multiple cars sharing a road. Multi-agent reinforcement learning addresses this by considering many agents interacting with each other and with a common environment. These agents can be either cooperative or competitive.
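The model-based idea above can be sketched in a few lines: estimate the environment's dynamics from sampled transitions, then plan on the learned model instead of the real environment. The toy two-state MDP below is purely illustrative (not from the project itself), and the uniform-exploration data collection is a simplifying assumption.

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical, for illustration only).
# True dynamics P[s, a, s'] and deterministic rewards R[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[0.0, 0.0], [0.0, 1.0]])
gamma = 0.9
rng = np.random.default_rng(0)

# --- Model learning: estimate dynamics from sampled transitions ---
counts = np.zeros_like(P)
reward_sum = np.zeros_like(R)
visits = np.zeros_like(R)
s = 0
for _ in range(5000):
    a = rng.integers(2)                  # explore uniformly (an assumption)
    s2 = rng.choice(2, p=P[s, a])        # environment step
    counts[s, a, s2] += 1
    reward_sum[s, a] += R[s, a]
    visits[s, a] += 1
    s = s2
P_hat = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)
R_hat = reward_sum / np.maximum(visits, 1)

# --- Planning: value iteration entirely on the learned model ---
V = np.zeros(2)
for _ in range(200):
    V = np.max(R_hat + gamma * P_hat @ V, axis=1)
print(V)  # state values computed without further environment interaction
```

A model-free method such as Q-learning would instead update value estimates directly from each transition and discard it; the model-based version above can reuse all experience through the learned `P_hat` and `R_hat`.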

Model-based single-agent reinforcement learning has enjoyed great success in various challenging domains, for example, defeating a top human Go player (Silver et al., 2016). Multi-agent reinforcement learning, however, has benefited far less from model-based learning: there has not yet been a successful combination of model-based planning and multi-agent reinforcement learning in challenging domains. The objective of this project is to combine the two. In particular, the project aims to deliver both off-line planning algorithms and on-line planning algorithms for multi-agent reinforcement learning with a learned model.
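To make the on-line planning idea concrete, here is a minimal decision-time rollout planner: given a learned simulator, it estimates each action's value by simulated rollouts and picks the best. All names (`model`, `plan`, the chain environment) are illustrative assumptions, not a specific published algorithm; systems like AlphaGo use far more sophisticated tree search.

```python
# Minimal decision-time (on-line) rollout planner, assuming a learned
# simulator `model(s, a) -> (s', r)` is available.
def rollout_value(model, s, a, policy, depth, gamma=0.95):
    """Estimate Q(s, a) by one simulated rollout under `policy`."""
    s, r = model(s, a)
    total, discount = r, gamma
    for _ in range(depth - 1):
        s, r = model(s, policy(s))
        total += discount * r
        discount *= gamma
    return total

def plan(model, s, actions, policy, depth=10, n_rollouts=50):
    """Pick the action with the best average simulated return."""
    def q(a):
        return sum(rollout_value(model, s, a, policy, depth)
                   for _ in range(n_rollouts)) / n_rollouts
    return max(actions, key=q)

# Toy deterministic chain: action 1 moves right, reward at state 3.
def toy_model(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == 3)

best = plan(toy_model, 0, [0, 1], policy=lambda s: 1)
print(best)  # → 1 (moving right reaches the rewarding state sooner)
```

An off-line planner would instead use the learned model to compute a complete policy before deployment; the decision-time version above plans afresh at each state it visits.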

Many real problems stand to benefit from this project, for example, traffic control and autonomous driving. The project should also have a significant impact on computer games such as StarCraft. The main difficulty in combining model-based and multi-agent reinforcement learning is the uncertainty of the environment. I propose to address this uncertainty with both Bayesian methods and distributional reinforcement learning. Bayesian methods characterize parametric uncertainty (i.e., uncertainty in the estimates due to a lack of data), while distributional reinforcement learning captures intrinsic uncertainty (i.e., the inherent stochasticity of the environment itself). Both kinds of uncertainty can then be exploited in model-based planning: parametric uncertainty to improve the agents' exploration, and intrinsic uncertainty to avoid risk.
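The two kinds of uncertainty can be sketched side by side in a bandit setting. This is a hedged illustration, not the project's actual method: posterior (Thompson) sampling over a Beta-Bernoulli model stands in for Bayesian exploration, and a CVaR criterion over an empirical return distribution stands in for distributional risk avoidance; the arm probabilities and return distributions are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# (1) Parametric uncertainty -> exploration via posterior (Thompson) sampling.
# Beta posterior over each arm's unknown Bernoulli success probability.
true_p = [0.4, 0.6]
alpha, beta = np.ones(2), np.ones(2)
for _ in range(2000):
    theta = rng.beta(alpha, beta)        # sample a model from the posterior
    a = int(np.argmax(theta))            # act greedily w.r.t. the sample
    r = rng.random() < true_p[a]
    alpha[a] += r
    beta[a] += 1 - r
print(alpha / (alpha + beta))  # posterior mean of the better arm nears 0.6

# (2) Intrinsic uncertainty -> risk avoidance via the return distribution.
# Compare options by CVaR (mean of the worst 10% of returns), not the mean.
returns = {"safe": rng.normal(1.0, 0.1, 10000),
           "risky": rng.normal(1.2, 2.0, 10000)}
def cvar(x, q=0.1):
    x = np.sort(x)
    return x[: int(q * len(x))].mean()
print({k: round(cvar(v), 2) for k, v in returns.items()})
# The risky option has the higher mean but a much worse CVaR.
```

Posterior sampling directs exploration toward arms that might be optimal under the remaining parametric uncertainty, while the return distribution lets a risk-averse agent reject the high-mean, high-variance option that a mean-maximizer would pick.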

The project will start with a literature review of related work, especially on Bayesian reinforcement learning and distributional reinforcement learning, after which I will analyze the strengths and weaknesses of state-of-the-art algorithms in model-based and multi-agent reinforcement learning. I will then study how Bayesian and distributional reinforcement learning can be exploited to remedy the weaknesses of these algorithms, implement my methods, and benchmark them in both simulated environments (e.g., a game like StarCraft) and real-world reinforcement learning tasks. Finally, I will compare my methods against the state of the art and report the results.

References
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.


Studentship Projects

Project Reference: EP/R513295/1, Start: 01/10/2018, End: 30/09/2023
Project Reference: 2105085, Relationship: Studentship, Related To: EP/R513295/1, Start: 01/10/2018, End: 31/03/2022, Student Name: Shangtong Zhang