Reinforcement Learning in Dynamic Treatment Regimes: Dealing with scarce data, safe exploration, and explainability.

Lead Research Organisation: University of Warwick
Department Name: Computer Science


In light of the rapid increase in accessible clinical data and the growing interest in personalized medicine among clinical science communities, there is an unprecedented opportunity to improve the quality of Dynamic Treatment Regimes (DTRs) using available clinical data. This is especially true for chronic diseases, where treatment must adapt to the evolving illness and the unique response of each patient, and where data-driven methods are an extremely helpful tool for enhancing dynamic treatment. DTRs generalize personalized treatment to a time-varying setting, in which the treatment at each stage is tailored to the historical and up-to-date clinical information of each patient. The goal of DTRs is to improve patient outcomes through adaptive treatment, offering invaluable assistance to clinical decision support systems, which lie at the heart of the chronic care model.
DTRs can be formulated as a sequential decision-making problem with a time-varying, or dynamic, state, in which the decision rule at each stage depends on the patient's treatment history and latest clinical information. This formulation provides a powerful tool for handling the chronic and individual condition of each patient. Because DTRs can be cast as sequential decision-making problems, Reinforcement Learning (RL) is one of the most appropriate families of methods for solving them. Recently, RL techniques such as Q-learning have been studied in the DTR literature [1] and have yielded promising results. However, strict requirements such as safety and interpretability remain major challenges when applying RL to DTRs, and to healthcare more generally.
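To make the sequential decision-making framing concrete, the following is a minimal, self-contained sketch of tabular Q-learning on an invented two-state "treatment" problem. The states, actions, transition probabilities, and rewards below are all hypothetical illustrations for this sketch, not part of the project; the point is only the update rule Q(s,a) ← Q(s,a) + α(r + γ max Q(s',·) − Q(s,a)).

```python
import random

def step(state, action, rng):
    # Hypothetical transition model, invented purely for illustration:
    # state 0 = moderate symptoms, 1 = severe symptoms, 2 = recovered (terminal)
    # action 0 = standard dose, 1 = intensive dose
    if state == 1:                                   # severe: intensive dose helps more
        p_improve = 0.95 if action == 1 else 0.3
    else:                                            # moderate: standard dose suffices
        p_improve = 0.7 if action == 0 else 0.4
    if rng.random() < p_improve:
        return 2, 1.0                                # recovered: terminal, good outcome
    return 1, -0.1                                   # condition persists/worsens: severe

def q_learning(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        state = rng.choice([0, 1])                   # new "patient", random severity
        while state != 2:
            # epsilon-greedy: mostly exploit the current value estimates
            if rng.random() < eps:
                action = rng.choice([0, 1])
            else:
                action = max((0, 1), key=lambda a: q[(state, a)])
            nxt, reward = step(state, action, rng)
            # standard Q-learning target: terminal states bootstrap nothing
            target = reward if nxt == 2 else reward + gamma * max(q[(nxt, a)] for a in (0, 1))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# Under this toy model, the learned values favour the intensive dose
# (action 1) for severe patients (state 1).
```

A real DTR application would replace this toy simulator with observational patient trajectories and typically a function approximator, which is precisely where the safety and interpretability challenges discussed above arise.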
In this project we aim to address the following research questions:
1. Can learning from the treatment outcomes of groups of patients be used to facilitate the training process for the future treatment of a patient with similar health conditions?
2. Can the prior knowledge and insight of experts be integrated into the learning process of recommendation systems?
3. Can exploration be guided so as to guarantee both the safety of the treatment and the improvement of treatment strategies?
4. If the method includes function approximation, how can newly discovered treatments be interpreted and verified?
To answer these research questions, we rely on a novel combination of transfer learning, safe exploration, and interactive reinforcement learning.
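One common way to operationalise safe exploration (research question 3) is to restrict exploratory actions to a state-dependent set approved by prior expert knowledge. The sketch below is a hypothetical illustration of this idea; the state labels, action names, and safe sets are all invented for the example and are not the project's actual method.

```python
import random

# Hypothetical clinician-approved safe sets per patient state: in the
# critical state, only the standard treatment may be explored.
SAFE_ACTIONS = {
    "stable":   ["standard", "reduced", "experimental"],
    "critical": ["standard"],
}

def safe_epsilon_greedy(q, state, eps, rng):
    """Epsilon-greedy selection restricted to the safe set for this state."""
    allowed = SAFE_ACTIONS[state]
    if rng.random() < eps:
        return rng.choice(allowed)                   # explore, but only safely
    # exploit: best estimated action among the allowed ones
    return max(allowed, key=lambda a: q.get((state, a), 0.0))

rng = random.Random(1)
q = {("critical", "standard"): 0.2, ("critical", "experimental"): 0.9}
# Even though "experimental" has a higher Q-value, it is never selected
# in the critical state because it lies outside the safe set.
picks = {safe_epsilon_greedy(q, "critical", 0.5, rng) for _ in range(100)}
```

This hard-constraint view is only one of several possibilities; interactive RL, where an expert vetoes or rewards proposed actions during learning, offers a softer alternative that the project's combination of techniques could draw on.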
Alignment with EPSRC research themes: This project aligns closely with the Artificial Intelligence and Robotics theme. It is also related to the Healthcare Technologies theme, and it falls within the scope of the Information and Communication Technologies theme.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/W523793/1                                   30/09/2021  29/09/2025
2606309            Studentship   EP/W523793/1  03/10/2021  29/09/2025  Nam Tran