An Adaptive Robust Dynamic Programming Approach for Decision Making under Model Uncertainty

Lead Research Organisation: University of Bath

Department Name: Mathematical Sciences

Abstract

In many real-world problems an agent must make decisions in an environment that is only partially known. By interacting with the world, the decision-maker obtains more information about the system, which allows for better-informed choices in the future. A common characteristic of these problems is therefore that the decision-maker can choose between decisions that lead to a fairly risk-free, high immediate reward, and riskier decisions which may be worse in the short term but may provide the agent with previously unseen information about their environment. In the field of Reinforcement Learning (RL) this dilemma is commonly referred to as the "exploration-exploitation trade-off", and it is an area of active research.

A fundamental challenge in understanding the exploration-exploitation trade-off is that one needs to measure the information "learned" by the agent, and to understand how this information develops over time. Classically, this can be done in a Bayesian framework. However, the Bayesian framework requires an initial set of beliefs (a prior), and in practice these may be imprecise. An alternative approach is to make decisions based on outcomes under worst-case scenarios; however, this approach cannot account for learning.
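To make the Bayesian option concrete, here is a minimal sketch under assumptions that are ours and not the project's: per-period demand is Poisson with an unknown rate, and beliefs about that rate are a Gamma distribution (the conjugate prior), so each observation updates the belief in closed form. The shrinking posterior variance is one simple way to track how information accumulates, and running two different priors on the same data shows how the result still depends on the initial beliefs after only a few periods. The class `GammaBelief` is a hypothetical helper. The sketch also assumes demand is observed in full; in the Newsvendor setting only sales, i.e. min(demand, stock), are observed, which is how the stocking decision influences what is learned.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GammaBelief:
    """Hypothetical Gamma(shape, rate) belief about an unknown Poisson demand rate."""
    shape: float  # alpha > 0
    rate: float   # beta > 0

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    @property
    def variance(self) -> float:
        return self.shape / self.rate ** 2

    def update(self, demands: Sequence[int]) -> "GammaBelief":
        # Conjugate update for FULLY observed Poisson counts (assumption):
        # shape += total demand, rate += number of periods observed.
        return GammaBelief(self.shape + sum(demands), self.rate + len(demands))


if __name__ == "__main__":
    observed = [3, 5, 4, 6, 2]
    # Two different (possibly imprecise) priors: both posterior means move
    # towards the sample mean, but they still differ after five periods.
    for prior in (GammaBelief(2.0, 1.0), GammaBelief(20.0, 2.0)):
        post = prior.update(observed)
        print(f"prior mean {prior.mean:.2f} -> posterior mean {post.mean:.2f}, "
              f"variance {prior.variance:.2f} -> {post.variance:.2f}")
```

A worst-case approach would instead replace the prior with a set of plausible rates and guard against the least favourable one, which avoids choosing a prior but, on its own, does not describe how that set should shrink as data arrive.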

In this project we aim to combine the best of both worlds by considering an adaptive (i.e. able to incorporate learning), robust (i.e. accounting for uncertainty in the setup) framework for stochastic control problems featuring model uncertainty. Our starting point is the framework of Bielecki et al. (2017), who considered an adaptive, robust approach to a stochastic control problem related to an investment problem. We will attempt to apply their approach to the Newsvendor problem, a simple stochastic control problem that involves learning. In this problem, an agent (the newsvendor) must choose the number of newspapers to stock for the next period before observing the number of newspapers sold, and is encouraged to learn the distribution of the demand for newspapers whilst minimising the cost due to unused stock or unmet demand. As the current choice of stock affects how many sales can be observed, and hence what the agent learns, it also affects future outcomes; solving such problems therefore requires understanding how the agent's beliefs will change in the future. We hope to construct approximation arguments based on the theory of Optimal Transport in order to reduce the complexity of the problem. Other possible aims of the project include generalising results, currently known only in very special settings (e.g. Y.-T. Chuang, 2019), that precisely quantify the surplus stock held purely for the sake of learning.
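The single-period cost trade-off in the Newsvendor problem can be sketched as follows, assuming (for illustration only) Poisson demand, a cost `c_over` per unsold newspaper and a cost `c_under` per unit of unmet demand. With a known demand rate, the optimal stock is the classical critical-fractile quantity: the smallest q with P(D ≤ q) ≥ c_under / (c_under + c_over). If the rate is only known to lie in a finite set of candidates, a worst-case (robust) choice minimises the maximum expected cost over that set. All function names are hypothetical, and this static worst-case rule is not the method of Bielecki et al. (2017): it contains no learning, whereas the adaptive robust framework described above also lets the uncertainty set shrink as the agent observes sales.

```python
import math
from typing import Iterable, List


def poisson_pmf(lam: float, kmax: int) -> List[float]:
    """P(D = k) for k = 0..kmax, computed recursively to avoid large factorials."""
    p = math.exp(-lam)
    pmf = [p]
    for k in range(1, kmax + 1):
        p *= lam / k
        pmf.append(p)
    return pmf


def expected_cost(q: int, lam: float, c_over: float, c_under: float) -> float:
    """E[c_over*(q - D)^+ + c_under*(D - q)^+] for D ~ Poisson(lam), truncated far in the tail."""
    kmax = max(q, int(lam + 20 * math.sqrt(lam) + 20))
    return sum(
        p * (c_over * max(q - k, 0) + c_under * max(k - q, 0))
        for k, p in enumerate(poisson_pmf(lam, kmax))
    )


def critical_fractile_order(lam: float, c_over: float, c_under: float) -> int:
    """Smallest q with P(D <= q) >= c_under / (c_under + c_over); optimal when lam is known."""
    ratio = c_under / (c_under + c_over)
    kmax = int(lam + 20 * math.sqrt(lam) + 20)  # guard against round-off when ratio is near 1
    q, p = 0, math.exp(-lam)
    cdf = p
    while cdf < ratio and q < kmax:
        q += 1
        p *= lam / q
        cdf += p
    return q


def robust_order(lams: Iterable[float], c_over: float, c_under: float, q_max: int) -> int:
    """Order quantity in 0..q_max minimising the worst-case expected cost over candidate rates."""
    lams = list(lams)
    return min(
        range(q_max + 1),
        key=lambda q: max(expected_cost(q, lam, c_over, c_under) for lam in lams),
    )


if __name__ == "__main__":
    c_over, c_under = 1.0, 3.0  # illustrative: unsold paper costs 1, lost sale costs 3
    print("known rate 10:", critical_fractile_order(10.0, c_over, c_under))
    print("rate in {8, 12}, worst case:", robust_order([8.0, 12.0], c_over, c_under, q_max=40))
```

The critical-fractile rule follows from the fact that raising the stock from q to q + 1 changes the expected cost by c_over·P(D ≤ q) − c_under·P(D > q), which becomes non-negative exactly once P(D ≤ q) reaches the critical ratio. The worst-case order always lies between the critical-fractile orders of the smallest and largest candidate rates.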

Our interest in the Newsvendor model stems primarily from its mathematical tractability and from the strong dependence of the information acquired on the decisions made by the agent. We expect the principles to be applicable to many other RL examples, so the project may contribute more broadly to future developments in Reinforcement Learning.

Planned Impact

Combining specialised modelling techniques with complex data analysis in order to deliver predictions with quantified uncertainties lies at the heart of many of the major challenges facing UK industry and society over the coming decades. Indeed, the recent Government Office for Science report "Computational Modelling, Technological Futures, 2018" names putting the UK at the forefront of the data revolution as one of its Grand Challenges.

The beneficiaries of our research portfolio will include a wide range of UK industrial sectors such as the pharmaceutical industry, risk consultancy, telecommunications and advanced materials, as well as government bodies, including the NHS, the Met Office and the Environment Agency.

Examples of current impactful projects pursued by students in collaboration with stakeholders include:

- Using machine learning techniques to develop automated assessment of psoriatic arthritis from hand X-rays, freeing up consultants' time (with the NHS).

- Uncertainty quantification for the Neutron Transport Equation to improve nuclear reactor safety (co-funded by Wood).

- Optimising the resilience and self-configuration of communication networks with the help of random graph colouring problems (co-funded by BT).

- Risk quantification of failure cascades on oil platforms using Bayesian networks, to improve safety assessment for certification (co-funded by DNV GL).

- Krylov regularisation in a Bayesian framework for low-resolution Nuclear Magnetic Resonance to assess properties of porous media for real-time exploration (co-funded by Schlumberger).

- Machine learning methods to untangle oceanographic sound data for a variety of goals, including the protection of wildlife in shipping lanes (with the Department of Physics).

Future committed partners for SAMBa 2.0 (Statistical Applied Mathematics at Bath) are: BT, Syngenta, Schlumberger, DNV GL, Wood, ONS, AstraZeneca, Roche, Diamond Light Source, GKN, NHS, NPL, Environment Agency, Novartis, Cytel, Mango, Moogsoft, Willis Towers Watson.

SAMBa's core mission is to train the next generation of academic and industrial researchers with the breadth and depth of skills necessary to address these challenges. SAMBa's most sustained impact will come through the contributions these researchers make over the longer term of their careers. To equip students with the skills needed to maximise this impact, SAMBa has placed a bespoke training experience, developed in collaboration with industry, at the heart of its activities. Integrative Think Tanks (ITTs) are week-long workshops in which industrial partners present high-level research challenges to students and academics. All participants work collaboratively to formulate mathematical models and questions that address the challenges. These outputs are meaningful both to the non-academic partner and as a mechanism for identifying mathematical topics suitable for PhD research. Through the co-ownership of collaboratively developed projects, SAMBa has the capacity to lead industry in capitalising on recent advances in mathematics. ITTs occur twice a year and excel at problem distillation and formulation, providing an exemplary environment for developing impactful projects.

SAMBa's impact on the student experience will be profound, with training in a broad range of mathematical areas, in team working, in academic-industrial collaborations, and in communicating their research to specialist and generalist audiences. Experience with current SAMBa students has shown that these skills are highly prized: "The SAMBa approach was a great template for setting up a productive, creative and collaborative atmosphere. The commitment of the students in getting involved with unfamiliar areas of research and applying their experience towards producing solutions was very impressive." - Dr Mike Marsh, Space weather researcher, Met Office.

Student:

Marcel STOZIR

Period of Study:

Oct 20 - Sep 24

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2440945

Research Topic:

Unclassified

Organisations

University of Bath (Lead Research Organisation)

People
Daniel Kious (Primary Supervisor)
Hendrik Weber (Primary Supervisor)
Marcel STOZIR (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S022945/1			01/10/2019	31/03/2028
2440945	Studentship	EP/S022945/1	01/10/2020	30/09/2024	Marcel STOZIR