Multi-agent deep reinforcement learning: agents predicting other agents

Lead Research Organisation: University of Warwick

Department Name: Statistics

Abstract

An important area of research in modern Artificial Intelligence is the development of autonomous agents that can interact effectively with other agents. This project is concerned with the development of multi-agent deep reinforcement learning (MADRL) algorithms for collaborative sequential decision making. MADRL holds the promise of enabling autonomous systems to learn large repertoires of collaborative and adaptive behavioural skills without human intervention, with applications ranging from autonomous driving and industrial process automation to modelling human learning and cognition.

An important aspect of such agents is the ability to reason about the behaviours, goals and beliefs of the other agents, which can be used to inform their decision making. In this project, we will develop novel MADRL approaches that give each learning agent the ability to reason about other agents' behaviour through probabilistic modelling. In the current body of work, a common assumption is that the modelling agent is provided with the local trajectories of the modelled agents, i.e. their observations of the environmental state and their past actions. Recent developments have started to explore whether effective agent modelling can be achieved using only the information locally available to the controlled agent during execution. So far, this has only been achieved through centralised training, which may be unrealistic in many real-world scenarios such as intelligent transportation systems.

Instead, we will consider fully decentralised learning algorithms. Specifically, we will assume that agents are modelled as nodes of a time-varying communication network in the absence of a central controller; furthermore, we will assume that agents only have partial observability and can communicate only with nearby nodes. We will explore both closed systems, in which the set of agents is fixed, and open systems, in which agents are free to leave and join the system at any time as needed for the task at hand.

Several modelling techniques will be explored, including non-parametric approaches based on recurrent neural networks and parametric models from signal processing such as decentralised Kalman filters. The resulting algorithms will be tested on several simulated multi-agent environments that mimic real-world use cases.
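
To make the decentralised setting described above more concrete, the following is a minimal illustrative sketch and not the project's method. It assumes each agent tracks a single scalar quantity about another agent (for example a position) with a local Kalman filter under a random-walk model, may receive no observation at a given step (partial observability), and then averages its estimate with its current neighbours in the communication graph. The averaging step is a simple consensus rule swapped in for a full decentralised Kalman filter. All names (`LocalFilter`, `consensus_step`, `run`) and noise values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalFilter:
    """Scalar Kalman filter with a random-walk state model (illustrative only)."""
    mean: float = 0.0
    var: float = 1.0
    process_var: float = 0.01  # hypothetical process-noise variance
    obs_var: float = 0.25      # hypothetical observation-noise variance

    def predict(self) -> None:
        # Random-walk model: the mean is unchanged, the uncertainty grows.
        self.var += self.process_var

    def update(self, z) -> None:
        # Partial observability: skip the update when nothing was observed.
        if z is None:
            return
        # Standard Kalman update for a direct, noisy observation z.
        gain = self.var / (self.var + self.obs_var)
        self.mean += gain * (z - self.mean)
        self.var *= 1.0 - gain


def consensus_step(filters, neighbours) -> None:
    """Replace each agent's mean by the average over itself and its neighbours."""
    new_means = []
    for i in range(len(filters)):
        group = [i] + list(neighbours.get(i, []))
        new_means.append(sum(filters[j].mean for j in group) / len(group))
    # Apply all updates at once so every agent uses the pre-consensus values.
    for f, m in zip(filters, new_means):
        f.mean = m


def run(observations, graphs):
    """observations[t][i] is agent i's reading at step t (or None);
    graphs[t] maps each agent index to its neighbours at step t."""
    filters = [LocalFilter() for _ in range(len(observations[0]))]
    for obs_t, graph_t in zip(observations, graphs):
        for f, z in zip(filters, obs_t):
            f.predict()
            f.update(z)
        consensus_step(filters, graph_t)
    return [f.mean for f in filters]
```

In this sketch there is no central controller: an agent that never observes the target directly can still obtain an estimate, but only through messages from its neighbours. Because `graphs` is indexed by time step, the communication network is allowed to change over time. Agents joining or leaving (open systems) are not handled.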

Student:

Ting Zhu

Period of Study:

Oct 2021 - Sep 2025

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2585631

Research Topic:

Unclassified

Organisations

University of Warwick (Lead Research Organisation)

People	ORCID iD
Giovanni Montana (Primary Supervisor)	http://orcid.org/0000-0003-3942-3900
Ting Zhu (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/W523793/1			30/09/2021	29/09/2025
2585631	Studentship	EP/W523793/1	03/10/2021	29/09/2025	Ting Zhu
