Near-Optimal Scalable Algorithms for Multi-Agent Reinforcement Learning

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Sequential decision making is a central setting in modern statistical theory and applications, in which an agent sequentially interacts with an environment - observing its state, taking actions and receiving rewards - with the objective of maximizing the cumulative reward.
This class of methods has been successfully applied to games, finance, robotics, autonomous driving and computer vision. Many of these applications involve multiple agents, which raises a scalability issue with respect to the state and action spaces. Indeed, while many established algorithms achieve good scaling guarantees in the single-agent setting, they do not perform well when applied directly to multiple agents interacting in the same environment. This is because the simplest approach treats a state or an action as the joint state or action of all the agents, causing the spaces to grow exponentially with the number of agents. Our approach consists in dividing the set of agents into neighbourhoods in order to avoid this exponential scaling.
The aim of the project is to obtain a near-optimal algorithm for the multi-agent setting whose computational complexity depends exponentially only on the cardinality of a neighbourhood.
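The scaling gap described above can be illustrated with a toy computation (a hypothetical sketch for illustration, not the project's algorithm): with n agents each choosing from A actions, the naive joint action space has A^n elements, whereas a neighbourhood decomposition only ever enumerates actions over a neighbourhood of k agents, i.e. A^k elements per neighbourhood.

```python
def joint_action_space_size(n_agents: int, n_actions: int) -> int:
    """Size of the naive joint action space: exponential in the number of agents."""
    return n_actions ** n_agents


def neighbourhood_action_space_size(k: int, n_actions: int) -> int:
    """Size of the action space enumerated per neighbourhood of k agents."""
    return n_actions ** k


# Illustrative numbers (chosen arbitrarily): 20 agents, 3 actions each,
# neighbourhoods of size 3.
n_agents, n_actions, k = 20, 3, 3

# Naive joint approach: 3**20 = 3,486,784,401 joint actions.
print(joint_action_space_size(n_agents, n_actions))

# Neighbourhood approach: at most 20 neighbourhoods of 3**3 = 27 actions each,
# i.e. 540 entries in total - exponential only in k, linear in the number of agents.
print(n_agents * neighbourhood_action_space_size(k, n_actions))
```

The point of the sketch is only the counting argument: the per-neighbourhood cost is exponential in k alone, matching the complexity target stated for the project.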
This project falls within the EPSRC "Statistics and applied probability" and "Artificial intelligence technologies" research areas.

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/V520202/1                                      01/10/2020   31/10/2025
2444539             Studentship    EP/V520202/1   01/10/2020   30/09/2024   Carlo Alfano