Epistemic Uncertainty Estimation in Multi-Agent Reinforcement Learning

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

This project falls in the EPSRC artificial intelligence (AI) and robotics research area.
Reinforcement Learning (RL) is a technique for training an AI agent through a system of rewards. The agent interacts with an environment by executing actions, and it receives rewards when it successfully completes the intended task.
RL has a multitude of applications; examples include robotics, recommendation systems and healthcare.
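As an illustration of the interaction loop described above, the following minimal sketch (an illustrative example, assuming the Gymnasium library and a placeholder random policy; the environment name is not part of this project) shows an agent acting in an environment and accumulating reward.

    import gymnasium as gym  # assumed dependency for the illustrative environment

    # Illustrative task; any environment with states, actions and rewards would do.
    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)

    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # placeholder policy: act at random
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward              # the signal an RL agent learns to maximise
        done = terminated or truncated

    print(f"Episode return: {total_reward}")

A trained agent would replace the random action choice with a learned policy that maximises the expected episode return.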
Multi-Agent Reinforcement Learning (MARL) focuses on how multiple agents interact with each other in a common environment.
Each agent is motivated by its own rewards and interests, but agents can collaborate to achieve common goals or compete with each other, resulting in complex group dynamics.
The study of MARL is increasingly relevant as AI agents become widespread in many aspects of our daily lives (e.g. self-driving cars).
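To make the multi-agent setting concrete, the toy sketch below (an illustrative example, not taken from the project itself) has two agents repeatedly choosing actions in a shared coordination game, with each agent receiving its own reward from the joint action.

    import numpy as np

    # Illustrative 2x2 coordination game: agents are rewarded when their actions match.
    payoff_a = np.array([[1.0, -1.0], [-1.0, 1.0]])  # agent A's reward table
    payoff_b = np.array([[1.0, -1.0], [-1.0, 1.0]])  # agent B's reward table

    rng = np.random.default_rng(0)
    returns = {"A": 0.0, "B": 0.0}
    for _ in range(100):
        a = rng.integers(2)              # agent A's action (placeholder random policy)
        b = rng.integers(2)              # agent B's action (placeholder random policy)
        returns["A"] += payoff_a[a, b]   # each agent is driven by its own reward
        returns["B"] += payoff_b[a, b]
    print(returns)

Identical payoff tables give a fully cooperative game, opposed (zero-sum) tables give a competitive one, and mixtures of the two produce the complex group dynamics mentioned above.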

Aims and Objectives

In real-world scenarios, agents commonly lack perfect knowledge of the world around them, and modelling uncertainty is fundamental to avoiding catastrophic and dangerous failures. In other words, the agent should know what it does not know.
The goal of this project is to provide an accurate estimate of epistemic uncertainty in the MARL setting.
Epistemic uncertainty refers to uncertainty caused by a lack of knowledge. In the MARL setting, this includes uncertainty about the environment or about other agents' motivations and behaviour. This type of uncertainty can be reduced by taking actions to explore the environment or to interact with other agents.
Obtaining a correct and calibrated uncertainty estimate could lead to safer interactions and collaboration between agents. Crucially, this includes interactions between AI agents and humans.
A relevant application would be self-driving cars interacting with human drivers or with other self-driving cars running different software. Modelling uncertainty over all other agents' behaviour, regardless of which software they run or what their intentions are, is fundamental for effective and safe collaboration.
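As a minimal, self-contained illustration of uncertainty that shrinks with experience, the sketch below (a toy example; the bootstrap ensemble of cubic polynomial fits stands in for an agent's model of some unknown quantity) measures epistemic uncertainty as disagreement between ensemble members and shows it decreasing once more observations have been gathered.

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(3 * x)  # hypothetical unknown quantity the agent is modelling

    def epistemic_std(x_train, y_train, x_query, n_members=20):
        # Fit an ensemble on bootstrap resamples and use the spread of their
        # predictions at x_query as a proxy for epistemic uncertainty.
        preds = []
        for _ in range(n_members):
            idx = rng.integers(0, len(x_train), len(x_train))  # bootstrap resample
            coeffs = np.polyfit(x_train[idx], y_train[idx], deg=3)
            preds.append(np.polyval(coeffs, x_query))
        return float(np.std(preds))

    x_few = rng.uniform(-1, 1, 10)    # little experience
    x_many = rng.uniform(-1, 1, 500)  # after more exploration
    print(epistemic_std(x_few, f(x_few), 0.5))    # larger disagreement
    print(epistemic_std(x_many, f(x_many), 0.5))  # smaller disagreement

In an RL setting, the analogous reduction comes from taking exploratory actions that generate the additional observations.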

Novelty of the research methodology

Current RL techniques are highly successful, but typically fail to model uncertainty. In contrast, Bayesian models offer a theoretically grounded framework for reasoning about model uncertainty, but their extremely high computational costs make them impractical in all but the simplest environments. Recently, multiple techniques have been proposed to circumvent this challenge and approximate Bayesian inference, such as dropout in neural networks (Gal and Ghahramani, 2016) and deep ensembles (Lakshminarayanan et al., 2017).
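For concreteness, here is a minimal sketch in the spirit of the Monte Carlo dropout technique of Gal and Ghahramani (2016): dropout is kept active at prediction time, and the spread across repeated stochastic forward passes is read as an epistemic uncertainty estimate. The PyTorch dependency, the tiny network and all sizes are illustrative assumptions, not the project's actual models.

    import torch
    import torch.nn as nn

    # Illustrative network with a dropout layer (sizes chosen arbitrarily).
    model = nn.Sequential(
        nn.Linear(4, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 1)
    )

    def mc_dropout_predict(model, x, n_samples=50):
        # Keep dropout stochastic at test time and aggregate many forward passes;
        # the per-input standard deviation serves as an epistemic uncertainty proxy.
        model.train()  # leaves dropout layers active
        with torch.no_grad():
            preds = torch.stack([model(x) for _ in range(n_samples)])
        return preds.mean(dim=0), preds.std(dim=0)

    x = torch.randn(8, 4)  # a batch of 8 hypothetical state observations
    mean, std = mc_dropout_predict(model, x)
    print(mean.shape, std.shape)  # (8, 1) each: prediction and uncertainty per input

A deep ensemble (Lakshminarayanan et al., 2017) follows the same recipe, but obtains the spread from several independently initialised and trained networks rather than from dropout masks.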


Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning (ICML), PMLR, 2016.

Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems 30 (2017).

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/T517811/1 | | | 30/09/2020 | 29/09/2025 |
2747642 | Studentship | EP/T517811/1 | 30/09/2022 | 29/09/2026 | Silvia Sapora
EP/W524311/1 | | | 30/09/2022 | 29/09/2028 |
2747642 | Studentship | EP/W524311/1 | 30/09/2022 | 29/09/2026 | Silvia Sapora