Reinforcement Learning for optimal treatment strategies in healthcare applications

Lead Research Organisation: University of Warwick

Department Name: Statistics

Abstract

An optimal stopping problem aims at finding an optimal policy that describes the right time at which to take a particular action in a stochastic process, to maximize an expected reward.
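
As a minimal illustration of the definition above, the sketch below solves a finite-horizon stopping problem by backward induction, comparing the reward for stopping now with the expected value of continuing. It assumes a small Markov chain whose transition matrix is fully known, which is a simplification: the project itself concerns RL methods that learn such policies, and the function name `optimal_stopping`, its parameters, and the toy numbers are all hypothetical.

```python
import numpy as np

def optimal_stopping(P, reward, horizon):
    """Backward induction for a finite-horizon optimal stopping problem.

    P       : (n, n) transition matrix of a hypothetical Markov chain
    reward  : (n,) reward received when stopping in each state
    horizon : number of decision times; stopping is forced at the last one
    Returns the value function at time 0 and a boolean policy stop[t, s].
    """
    P = np.asarray(P, dtype=float)
    reward = np.asarray(reward, dtype=float)
    n = len(reward)
    value = reward.copy()                 # at the final time we must stop
    stop = np.ones((horizon, n), dtype=bool)
    for t in range(horizon - 2, -1, -1):
        cont = P @ value                  # expected value of continuing
        stop[t] = reward >= cont          # stop when it is at least as good
        value = np.maximum(reward, cont)  # Bellman recursion for stopping
    return value, stop

# Toy example (made-up numbers): the chain moves one state to the right
# at each step and state 2 is absorbing.
P = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 1]]
value, stop = optimal_stopping(P, reward=[0, 1, 3], horizon=3)
```

In this toy chain it is better to wait in states 0 and 1 and to stop only once state 2 is reached, which is what `stop[0]` encodes.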

Here the focus is on applications in the healthcare sector, and on predictive models used in sequential decision-making settings.

The aim is to investigate the use of reinforcement learning (RL) algorithms to learn optimal stopping policies for the sequential decision-making setting typical of a medical intervention study. The problem of interest concerns a prognostic model that attempts to reduce the risk of an adverse outcome for a group of patients given a set of covariates. Similar problems arise in determining when to stop treatment for patients receiving fractionated radiotherapy, as well as in response-guided pharmacological treatment problems.

The stopping problem arises because interventions (actions) taken by the medical practitioner (agent) to reduce the patient's risk create a causal link that affects the predictions (environment). In real-world settings, interventions driven by the model's score can also change the distribution of the data and outcomes, so that observed performance decays, particularly when the intervention is successful. Addressing this requires both learning an optimal policy for the stopping problem and dealing with the causal process governing the 'intervened' covariate and the outcome.

The challenges and novelties of the problem concern strategies for incorporating stochastic intervention functions by means of Gaussian Processes, as well as constrained policy optimization to account for real-world approximations of constraints (e.g. resource allocation in a hospital facility), referred to as the costs of opening, running, and closing the activities.

Further developments and novelties can be sought by extending the investigation to a multi-agent framework that models collaborative or competitive behaviour, for example in the management of resources across hospitals.

Ajdari, A., Niyazi, M., Nicolay, N.H., Thieke, C., Jeraj, R. and Bortfeld, T., 2019. Towards optimal stopping in radiation therapy. Radiotherapy and Oncology, 134, pp.96-100.

Kotas, J., 2019. Optimal stopping for response-guided dosing. Networks & Heterogeneous Media, 14(1), p.43.

Lenert, M.C., Matheny, M.E. and Walsh, C.G., 2019. Prognostic models will be victims of their own success, unless.... Journal of the American Medical Informatics Association, 26(12), pp.1645-1650.

Deliu, N., Williams, J.J. and Chakraborty, B., 2022. Reinforcement Learning in Modern Biostatistics: Constructing Optimal Adaptive Interventions. arXiv preprint arXiv:2203.02605.

Wu, S.A., Wang, R.E., Evans, J.A., Tenenbaum, J.B., Parkes, D.C. and Kleiman-Weiner, M., 2021. Too Many Cooks: Bayesian Inference for Coordinating Multi-Agent Collaboration. Topics in Cognitive Science, 13(2), pp.414-432.

Titsias, M.K., Schwarz, J., Matthews, A.G.D.G., Pascanu, R. and Teh, Y.W., 2019. Functional regularisation for continual learning with Gaussian processes. arXiv preprint arXiv:1901.11356.

Student:

Claudia Viaro

Period of Study:

Oct 20 - Jun 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2440893

Research Topic:

Unclassified

Organisations

University of Warwick (Lead Research Organisation)

People	ORCID iD
Theodoros Damoulas (Primary Supervisor)
Claudia Viaro (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/V520226/1			30/09/2020	31/10/2025
2440893	Studentship	EP/V520226/1	04/10/2020	18/06/2023	Claudia Viaro
