Stochastic optimal control constrained by costly observations

Lead Research Organisation: University of Oxford

Abstract

Stochastic control is an important tool that incorporates the effect of randomness in decision-making. It is useful as a mathematical model in applications and is often employed in mathematical finance and resource management. The main aim of our project is the mathematical analysis of variants of stochastic control that involve costly observations, through a combination of analytical, numerical and computational approaches. Our model assumes that access to the underlying state requires a strictly positive cost. This leads to an optimisation problem that involves a trade-off between cost and information. Although this falls under the partial information setting, it differs from the filtering problem, where a function of the underlying state is accessible to the user at all times. The trade-off between cost and information is common in maintenance, environmental restoration and hospital treatment problems, where measurements or records are expensive and the state is not continuously observed. It can also be seen as a variant of exploration versus exploitation, a theme prevalent in bandit theory and reinforcement learning.

We find the literature exploring the above aspects limited, and the mathematical framework has yet to be established in full generality. In our work we first consider a discrete-time Markov chain model as a starting point and provide a simple analysis of a toy problem for intuition. The Markov chain construction provides us with a possible discretisation scheme for numerical approximation in the continuous case. We then move on to the general continuous-time setting, deriving a variational PDE for the value function via the dynamic programming principle. This approach is analogous to the classical full information case, and we hope to draw parallels between the two cases. In particular, we expect the variational PDE to converge towards the classical Hamilton-Jacobi-Bellman (HJB) equation as the observation cost tends to zero, and we would like to establish the rate of this convergence.
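To make the discrete-time starting point concrete, the following is a minimal sketch of value iteration for a toy Markov chain with costly observations. It is not the project's actual model: it assumes that at each observation the controller sees the state, picks an action and a gap of k steps, holds the action fixed over that gap, and then pays the observation cost to look again. The function name `solve_costly_observation_mdp`, the two-state maintenance example and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def solve_costly_observation_mdp(P, r, obs_cost, gamma=0.9, max_gap=10,
                                 tol=1e-10, max_iter=10_000):
    """Value iteration for a toy MDP in which each observation costs obs_cost.

    Assumed model (illustrative only): after observing state x, the controller
    chooses an action a and a gap k in {1, ..., max_gap}, holds a fixed for k
    steps, then pays obs_cost to observe the state again.

    P: array (n_actions, n_states, n_states) of transition matrices.
    r: array (n_actions, n_states) of per-step running costs.
    Returns the value function and the minimising action and gap per state.
    """
    n_actions, n_states, _ = P.shape
    # run_cost[a, k]: expected discounted running cost over k + 1 steps.
    # disc_trans[a, k]: gamma^(k+1) * P_a^(k+1).
    run_cost = np.zeros((n_actions, max_gap, n_states))
    disc_trans = np.zeros((n_actions, max_gap, n_states, n_states))
    for a in range(n_actions):
        acc = np.zeros(n_states)
        M = np.eye(n_states)  # holds gamma^j * P_a^j
        for k in range(max_gap):
            acc = acc + M @ r[a]
            M = gamma * (M @ P[a])
            run_cost[a, k] = acc
            disc_trans[a, k] = M
    # Discounted observation cost: row sums of disc_trans equal gamma^(k+1).
    obs_term = obs_cost * disc_trans.sum(axis=-1)

    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[a, k, x]: cost of holding action a for k + 1 steps from state x,
        # then observing and continuing optimally.
        Q = run_cost + obs_term + disc_trans @ V
        V_new = Q.reshape(-1, n_states).min(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    best = Q.reshape(-1, n_states).argmin(axis=0)
    actions, gaps = np.unravel_index(best, (n_actions, max_gap))
    return V, actions, gaps + 1


# Hypothetical maintenance example: state 0 = working, state 1 = degraded;
# action 0 = do nothing, action 1 = repair.
P = np.array([[[0.9, 0.1], [0.0, 1.0]],
              [[1.0, 0.0], [0.9, 0.1]]])
r = np.array([[0.0, 2.0],
              [1.0, 3.0]])

for c in (0.0, 0.1, 1.0):
    V, actions, gaps = solve_costly_observation_mdp(P, r, obs_cost=c)
    print(f"obs_cost={c}: V={V}, actions={actions}, gaps={gaps}")
```

In this sketch, setting `obs_cost=0` recovers the classical full-information value function, since observing at every step is then free. This mirrors, in a discrete setting, the limit towards the classical equation described above.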

Due to the positive observation cost, extra integral terms are present in the variational PDE, which adds a layer of complexity when solving the PDE numerically. The search for a computationally efficient numerical scheme will be one of the focuses of the project in the future. In particular, the use of neural networks could provide a potential pathway to overcoming the curse of dimensionality. Further avenues of investigation could involve analysis of additional constraints in the model, for example the incorporation of time-delayed executions.

This project falls within the EPSRC Mathematical Analysis, Numerical Analysis, Statistics and Applied Probability, and Operational Research research areas.

Planned Impact

Probabilistic modelling permeates the financial services, healthcare, technology and other service industries crucial to the UK's continuing social and economic prosperity, which are major users of stochastic algorithms for data analysis, simulation, systems design and optimisation. There is a major and growing skills shortage of experts in this area, and the success of the UK in addressing this shortage in cross-disciplinary research and industry expertise in computing, analytics and finance will directly impact the international competitiveness of UK companies and the quality of services delivered by government institutions.
By training highly skilled experts equipped to build, analyse and deploy probabilistic models, the CDT in Mathematics of Random Systems will contribute to:
- sharpening the UK's research lead in this area, and
- meeting the needs of industry across the technology, finance, government and healthcare sectors.

MATHEMATICS, THEORETICAL PHYSICS and MATHEMATICAL BIOLOGY

The explosion of novel research areas in stochastic analysis requires the training of young researchers capable of facing the new scientific challenges and maintaining the UK's lead in this area. The partners are at the forefront of many recent developments and ideally positioned to successfully train the next generation of UK scientists for tackling these exciting challenges.
The theory of regularity structures, pioneered by Hairer (Imperial), has generated a ground-breaking approach to singular stochastic partial differential equations (SPDEs) and opened the way to solve longstanding problems in physics of random interface growth and quantum field theory, spearheaded by Hairer's group at Imperial. The theory of rough paths, initiated by TJ Lyons (Oxford), is undergoing a renewal spurred by applications in Data Science and systems control, led by the Oxford group in conjunction with Cass (Imperial). Pathwise methods and infinite dimensional methods in stochastic analysis with applications to robust modelling in finance and control have been developed by both groups.
Applications of probabilistic modelling in population genetics, mathematical ecology and precision healthcare are active areas in which our groups have recognised expertise.

FINANCIAL SERVICES and GOVERNMENT

The large-scale computerisation of financial markets and retail finance and the advent of massive financial data sets are radically changing the landscape of financial services, requiring new profiles of experts with strong analytical and computing skills as well as familiarity with Big Data analysis and data-driven modelling, not matched by current MSc and PhD programs. Financial regulators (Bank of England, FCA, ECB) are investing in analytics and modelling to face this challenge. We will develop a novel training and research agenda adapted to these needs by leveraging the considerable expertise of our teams in quantitative modelling in finance and our extensive experience in partnerships with the financial institutions and regulators.

DATA SCIENCE

Probabilistic algorithms, such as stochastic gradient descent and Monte Carlo tree search, underlie the impressive achievements of deep learning methods. Stochastic control provides the theoretical framework for understanding and designing reinforcement learning algorithms. A deeper understanding of these algorithms can pave the way to designing improved algorithms with higher predictability and 'explainable' results, which are crucial for applications.
We will train experts who can blend a deeper understanding of algorithms with knowledge of the application at hand to go beyond pure data analysis and develop data-driven models and decision aid tools.
There is a high demand for such expertise in technology, healthcare and finance sectors and great enthusiasm from our industry partners. Knowledge transfer will be enhanced through internships, co-funded studentships and paths to entrepreneurs

Student:

Jonathan Tam

Period of Study:

Oct 19 - Sep 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2269738

Research Topic:

Unclassified

Organisations

University of Oxford (Lead Research Organisation)

People	ORCID iD
Christoph Reisinger (Primary Supervisor)	http://orcid.org/0000-0003-4027-5298
Harald Oberhauser (Primary Supervisor)
Jonathan Tam (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S023925/1			01/04/2019	30/09/2027
2269738	Studentship	EP/S023925/1	01/10/2019	30/09/2023	Jonathan Tam