Multi-agent deep reinforcement learning: agents predicting other agents

Lead Research Organisation: University of Warwick
Department Name: Statistics

Abstract

An important area of research in modern Artificial Intelligence is the development of autonomous agents that can interact effectively with other agents. This project is concerned with the development of multi-agent deep reinforcement learning (MADRL) algorithms for collaborative sequential decision making. MADRL holds the promise of enabling autonomous systems to learn large repertoires of collaborative and adaptive behavioural skills without human intervention, with applications in a range of settings from autonomous driving to industrial process automation to modelling human learning and cognition. An important aspect of such agents is the ability to reason about the behaviours, goals and beliefs of the other agents, which can be used to inform their decision making. In this project, we will develop novel MADRL approaches that give each learning agent the ability to reason about other agents' behaviour through probabilistic modelling. In the current body of work, a common assumption is that the modelling agent is provided with the local trajectories of the modelled agents, i.e. their observations of the environmental state and their past actions. Recent developments have started to explore whether effective agent modelling can be achieved using only the information locally available to the controlled agent during execution. So far, this has only been achieved through centralised training, which may be unrealistic in many real-world scenarios such as intelligent transportation systems. Instead, we will consider fully decentralised learning algorithms. Specifically, we will assume that agents are modelled as nodes of a time-varying communication network in the absence of a central controller; furthermore, we will assume that agents only have partial observability and can communicate only with nearby nodes. We will explore both closed systems and open systems, i.e. situations in which agents are free to join and leave the system at any time as needed for the task at hand.
Several modelling techniques will be explored, including non-parametric approaches based on recurrent neural networks and parametric models from signal processing such as decentralised Kalman filters. The resulting algorithms will be tested on several simulated multi-agent environments that mimic real-world use cases.
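
As a purely illustrative sketch of the parametric route mentioned above, the following Python snippet shows agents on a time-varying communication network that each run a scalar Kalman filter on their own noisy observations of a hidden quantity and then average their estimates with their current neighbours. Everything here is an assumption made for illustration rather than the project's method: the random-walk target, the noise levels, the alternating ring topology, and the function names kalman_update, consensus_step and simulate are all hypothetical. Averaging the means alone is also a simplification of a full decentralised Kalman filter, which would fuse the uncertainties as well.

```python
import random


def kalman_update(mean, var, obs, obs_var, process_var):
    """One predict-and-correct step of a scalar Kalman filter (random-walk model)."""
    # Predict: a random-walk target keeps its mean while uncertainty grows.
    var_pred = var + process_var
    # Correct: blend prediction and observation using the Kalman gain.
    gain = var_pred / (var_pred + obs_var)
    new_mean = mean + gain * (obs - mean)
    new_var = (1.0 - gain) * var_pred
    return new_mean, new_var


def consensus_step(means, neighbours):
    """Each agent averages its own estimate with those of its current neighbours."""
    return [
        (means[i] + sum(means[j] for j in neighbours[i])) / (1 + len(neighbours[i]))
        for i in range(len(means))
    ]


def simulate(steps=200, n_agents=4, seed=0):
    """Hypothetical toy run: returns the final target value and the agents' estimates."""
    rng = random.Random(seed)
    target = 0.0
    means = [0.0] * n_agents
    variances = [1.0] * n_agents
    for t in range(steps):
        # Hidden quantity follows a random walk (assumed process noise std 0.1).
        target += rng.gauss(0.0, 0.1)
        # Time-varying topology: the ring offset alternates between 1 and 2.
        shift = 1 if t % 2 == 0 else 2
        neighbours = [
            sorted({(i + shift) % n_agents, (i - shift) % n_agents} - {i})
            for i in range(n_agents)
        ]
        # Local filtering: each agent only sees its own noisy observation.
        for i in range(n_agents):
            obs = target + rng.gauss(0.0, 0.5)
            means[i], variances[i] = kalman_update(
                means[i], variances[i], obs, obs_var=0.25, process_var=0.01
            )
        # Communication: exchange estimates with nearby nodes only.
        means = consensus_step(means, neighbours)
    return target, means


if __name__ == "__main__":
    final_target, estimates = simulate()
    print("target:", round(final_target, 3))
    print("estimates:", [round(m, 3) for m in estimates])
```

No central controller appears anywhere in the loop: each agent uses only its own observation and the estimates of whichever nodes it can currently reach, which is the setting the abstract describes.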

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/W523793/1                                   01/10/2021  30/09/2025
2585631            Studentship   EP/W523793/1  04/10/2021  30/09/2025  Ting Zhu