Devising robust Multi-Armed Bandit algorithms in the presence of non-stationarities and long-range dependencies

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics

Abstract

The Multi-Armed Bandit (MAB) problem is one of the central instances of sequential decision-making under uncertainty and plays a key role in online learning and optimization. MABs arise in a variety of modern real-world applications, such as online advertising, Internet routing, and sequential portfolio selection, to name only a few. In this problem, a forecaster aims to maximize the expected sum of the rewards it actively collects from unknown processes. MABs are typically studied under the assumption that the rewards are i.i.d. However, this assumption does not necessarily hold in many practical situations. The objective of this project is to analyze the possibilities and limitations of more challenging, yet more realistic, (restless) MAB settings, in which the reward distributions may exhibit long-range dependencies and potential non-stationarities. As part of the project, novel MAB strategies with good performance guarantees will be sought, and applications to real-world problems will be explored.
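To make the non-stationary setting concrete, the sketch below simulates a piecewise-stationary Bernoulli bandit whose arm means switch abruptly at one round. It then runs sliding-window UCB (SW-UCB), a well-known technique from the non-stationary bandit literature that estimates each arm's mean from only the most recent `window` observations so that stale rewards are forgotten. This is an illustrative assumption, not the strategy developed in this project, and it addresses abrupt changes rather than long-range dependencies. The function names (`make_piecewise_bernoulli`, `sliding_window_ucb`) and the parameter values (`window`, `c`, the means and change point) are hypothetical.

```python
import math
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# Hypothetical helpers: a sketch of sliding-window UCB, not the project's method.


def make_piecewise_bernoulli(
    means_before: Sequence[float],
    means_after: Sequence[float],
    change_point: int,
    seed: int = 0,
) -> Callable[[int, int], float]:
    """Return pull(arm, t): a Bernoulli reward whose means switch at change_point."""
    rng = random.Random(seed)

    def pull(arm: int, t: int) -> float:
        means = means_before if t < change_point else means_after
        return 1.0 if rng.random() < means[arm] else 0.0

    return pull


def sliding_window_ucb(
    n_arms: int,
    horizon: int,
    pull: Callable[[int, int], float],
    window: int = 200,
    c: float = 2.0,
) -> List[int]:
    """Play `horizon` rounds; UCB indices use only the last `window` observations."""
    history: Deque[Tuple[int, float]] = deque()  # (arm, reward), oldest first
    counts = [0] * n_arms  # pulls of each arm inside the window
    sums = [0.0] * n_arms  # summed rewards of each arm inside the window
    choices: List[int] = []
    for t in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]  # an arm with no data in the window is played first
        else:
            log_term = math.log(min(t, window))
            # Windowed empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * log_term / counts[a]),
            )
        reward = pull(arm, t)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        if len(history) > window:  # forget the oldest observation
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward
        choices.append(arm)
    return choices


if __name__ == "__main__":
    # Arm 0 is best before round 1000, arm 1 afterwards.
    env = make_piecewise_bernoulli([0.9, 0.1], [0.1, 0.9], change_point=1000, seed=1)
    picks = sliding_window_ucb(2, 2000, env, window=200)
    print("arm-1 pulls in the last 500 rounds:", picks[1500:].count(1))
```

Shrinking `window` lets the policy react faster after a change but makes its mean estimates noisier. This trade-off is part of what makes non-stationary rewards harder to handle than i.i.d. ones.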

Publications


Studentship Projects

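| Project Reference | Relationship | Related To   | Start      | End        | Student Name  |
|-------------------|--------------|--------------|------------|------------|---------------|
| EP/T518037/1      |              |              | 01/10/2020 | 30/09/2025 |               |
| 2437073           | Studentship  | EP/T518037/1 | 01/10/2020 | 30/09/2024 | Ali Arabzadeh |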