Reinforcement Learning for optimal treatment strategies in healthcare applications

Lead Research Organisation: University of Warwick
Department Name: Statistics


An optimal stopping problem seeks a policy that specifies when to take a particular action in a stochastic process so as to maximize an expected reward.
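As a toy illustration of this definition (not a clinical model), consider a hypothetical game in which a fair die may be rolled up to a fixed number of times, and the player may stop after any roll and keep the face shown. The sketch below computes the optimal expected reward by backward induction. The game and the function name `dice_stopping_values` are assumptions introduced here purely for illustration.

```python
from fractions import Fraction


def dice_stopping_values(horizon, faces=6):
    """Optimal expected reward with 1, 2, ..., `horizon` rolls available.

    Hypothetical toy problem: after each roll the player either stops and
    keeps the face shown, or discards it and rolls again (if rolls remain).
    """
    values = []
    continuation = Fraction(0)  # value of continuing when no rolls are left
    for _ in range(horizon):
        # For each face, the optimal choice is the better of stopping (keep
        # the face) and continuing (receive the continuation value).
        total = sum(max(Fraction(face), continuation)
                    for face in range(1, faces + 1))
        continuation = total / faces
        values.append(continuation)
    return values


values = dice_stopping_values(3)
for rolls, value in enumerate(values, start=1):
    print(f"{rolls} roll(s) available: optimal expected reward = {value}")
```

The optimal policy follows from the continuation values: with three rolls available, it is optimal to stop after the first roll only on a 5 or 6 (continuing is worth 17/4), and after the second roll on a 4 or more (continuing is worth 7/2).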

The focus here is on healthcare applications and on predictive models used in sequential decision-making settings.

The aim is to investigate the use of reinforcement learning (RL) algorithms to learn optimal stopping policies for the sequential decision-making settings typical of medical intervention studies. The problem of interest concerns a prognostic model that attempts to reduce the risk of an adverse outcome in a group of patients, given a set of covariates. Similar problems arise in deciding when to stop treatment for patients receiving fractionated radiotherapy, and in response-guided dosing of pharmacological treatments.

The stopping problem arises from the interventions (actions) that the medical practitioner (agent) takes to reduce the patient's risk, which create a causal link affecting the predictions (environment). In real-world settings, interventions driven by the model's score can also change the distribution of the data and outcomes, so that the model's observed performance decays, particularly when the intervention is successful. Addressing this requires both learning an optimal policy for the stopping problem and modelling the causal process governing the 'intervened' covariate and the outcome.
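The performance decay described above can be made concrete with a small deterministic sketch. It assumes a hypothetical population whose true risks equal the model's scores, and an intervention that multiplies the risk of every flagged patient by a fixed factor. The threshold, the efficacy value, and the helper names `apply_intervention` and `expected_auc` are all illustrative assumptions, not part of the project.

```python
def apply_intervention(scores, threshold, efficacy):
    """Post-intervention event probabilities: patients with score >= threshold
    are treated, which scales their risk by (1 - efficacy). Assumed model."""
    return [s * (1.0 - efficacy) if s >= threshold else s for s in scores]


def expected_auc(scores, event_probs):
    """Expected AUC of `scores` when events occur with `event_probs`.

    Each (case, control) pair is weighted by the probability that the first
    patient has the event and the second does not; tied scores count half.
    """
    num = 0.0
    den = 0.0
    for s_case, p_case in zip(scores, event_probs):
        for s_ctrl, p_ctrl in zip(scores, event_probs):
            weight = p_case * (1.0 - p_ctrl)
            den += weight
            if s_case > s_ctrl:
                num += weight
            elif s_case == s_ctrl:
                num += 0.5 * weight
    return num / den


# Hypothetical population: scores 0.1, ..., 0.9, perfectly calibrated at baseline.
scores = [i / 10 for i in range(1, 10)]
threshold, efficacy = 0.5, 0.6  # illustrative values only

post_probs = apply_intervention(scores, threshold, efficacy)
flagged = [i for i, s in enumerate(scores) if s >= threshold]

predicted_rate = sum(scores[i] for i in flagged) / len(flagged)
observed_rate = sum(post_probs[i] for i in flagged) / len(flagged)
auc_before = expected_auc(scores, scores)
auc_after = expected_auc(scores, post_probs)

print(f"flagged patients: predicted risk {predicted_rate:.2f}, "
      f"observed event rate {observed_rate:.2f}")
print(f"expected AUC before intervention {auc_before:.3f}, after {auc_after:.3f}")
```

In this sketch the model's scores are unchanged, yet the flagged group now shows far fewer events than predicted and discrimination drops, simply because the intervention worked.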

The challenges and novelties of the problem concern strategies for incorporating stochastic intervention functions by means of Gaussian processes, as well as constrained policy optimization to account for approximations of real-world constraints (e.g. resource allocation in a hospital facility), expressed as the costs of opening, running, and closing activities.
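To illustrate what a resource constraint does to a policy, the sketch below searches by brute force over a tiny class of threshold policies ("treat if score >= threshold") and keeps the one with the largest expected benefit whose cost stays within a budget. This is a deliberately simple stand-in, not a constrained policy optimization algorithm, and the population, efficacy, unit cost, and budget are hypothetical values chosen for the example.

```python
def evaluate_threshold(scores, threshold, efficacy, cost_per_treatment):
    """Return (expected events averted, total cost) for the policy
    'treat every patient whose score is >= threshold'."""
    treated = [s for s in scores if s >= threshold]
    benefit = sum(efficacy * s for s in treated)
    cost = cost_per_treatment * len(treated)
    return benefit, cost


def best_feasible_threshold(scores, efficacy, cost_per_treatment, budget):
    """Enumerate candidate thresholds and return (threshold, benefit, cost)
    for the feasible policy with the largest benefit, or None if no
    candidate fits the budget. The 'treat nobody' policy is not enumerated."""
    best = None
    for threshold in sorted(set(scores)):
        benefit, cost = evaluate_threshold(
            scores, threshold, efficacy, cost_per_treatment)
        if cost <= budget and (best is None or benefit > best[1]):
            best = (threshold, benefit, cost)
    return best


# Hypothetical inputs: nine patients, each treatment costs one unit,
# and the facility can afford three treatments.
scores = [i / 10 for i in range(1, 10)]
best = best_feasible_threshold(
    scores, efficacy=0.6, cost_per_treatment=1.0, budget=3.0)
print(best)
```

Without the budget, the best policy in this class would treat every patient. The constraint pushes the threshold up so that only the three highest-risk patients are treated.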

Further developments and novelties can be sought by extending the investigation to a multi-agent framework that models collaborative or competitive behaviour, for example in the management of resources across hospitals.

Ajdari, A., Niyazi, M., Nicolay, N.H., Thieke, C., Jeraj, R. and Bortfeld, T., 2019. Towards optimal stopping in radiation therapy. Radiotherapy and Oncology, 134, pp.96-100.

Kotas, J., 2019. Optimal stopping for response-guided dosing. Networks & Heterogeneous Media, 14(1), p.43.

Lenert, M.C., Matheny, M.E. and Walsh, C.G., 2019. Prognostic models will be victims of their own success, unless.... Journal of the American Medical Informatics Association, 26(12), pp.1645-1650.

Deliu, N., Williams, J.J. and Chakraborty, B., 2022. Reinforcement Learning in Modern Biostatistics: Constructing Optimal Adaptive Interventions. arXiv preprint arXiv:2203.02605.

Wu, S.A., Wang, R.E., Evans, J.A., Tenenbaum, J.B., Parkes, D.C. and Kleiman-Weiner, M., 2021. Too Many Cooks: Bayesian Inference for Coordinating Multi-Agent Collaboration. Topics in Cognitive Science, 13(2), pp.414-432.

Titsias, M.K., Schwarz, J., Matthews, A.G.D.G., Pascanu, R. and Teh, Y.W., 2019. Functional regularisation for continual learning with Gaussian processes. arXiv preprint arXiv:1901.11356.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/V520226/1                                   30/09/2020  31/10/2025
2440893            Studentship   EP/V520226/1  04/10/2020  06/10/2025  Claudia Viaro