Reinforcement Learning in Dynamic Treatment Regimes: Dealing with scarce data, safe exploration, and explainability.

Lead Research Organisation: University of Warwick
Department Name: Computer Science


In light of the rapid increase in accessible clinical data and the growing interest in personalized medicine among clinical science communities, there is an unprecedented opportunity to improve the quality of Dynamic Treatment Regimes (DTRs) using available clinical data. This is especially true for chronic diseases, where treatment must adapt to the evolving illness and the unique response of each patient, and where data-driven methods are an extremely helpful tool for enhancing dynamic treatment. DTRs generalize personalized treatment to a time-varying setting, in which the treatment at each stage is tailored to the historical and up-to-date clinical information of each patient. The goal of DTRs is to improve patient outcomes through adaptive treatment, offering invaluable assistance to clinical decision support systems, which lie at the heart of the chronic care model.
DTRs can be formulated as a sequential decision-making problem with a time-varying, or dynamic, state, in which the decision rule at each stage depends on the patient's treatment history and latest clinical information. This formulation provides a powerful tool for handling the chronic and individual condition of each patient. Because DTRs can be cast as sequential decision-making problems, Reinforcement Learning (RL) is one of the most appropriate families of methods for solving them. Recently, RL techniques such as Q-learning have been studied in the DTR literature [1] and have yielded promising results. However, strict requirements such as safety and interpretability remain major challenges when applying RL to DTRs, and to healthcare more generally.
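To make the sequential decision-making framing concrete, the following is a minimal, self-contained sketch of tabular Q-learning on an invented two-state "treatment" problem. The states, actions, transition probabilities, and rewards below are all hypothetical illustrations for this sketch, not part of the project; the point is only the update rule Q(s,a) ← Q(s,a) + α(r + γ max Q(s',·) − Q(s,a)).

```python
import random

def step(state, action, rng):
    # Hypothetical transition model, invented purely for illustration:
    # state 0 = moderate symptoms, 1 = severe symptoms, 2 = recovered (terminal)
    # action 0 = standard dose, 1 = intensive dose
    if state == 1:                                   # severe: intensive dose helps more
        p_improve = 0.95 if action == 1 else 0.3
    else:                                            # moderate: standard dose suffices
        p_improve = 0.7 if action == 0 else 0.4
    if rng.random() < p_improve:
        return 2, 1.0                                # recovered: terminal, good outcome
    return 1, -0.1                                   # condition persists/worsens: severe

def q_learning(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        state = rng.choice([0, 1])                   # new "patient", random severity
        while state != 2:
            # epsilon-greedy: mostly exploit the current value estimates
            if rng.random() < eps:
                action = rng.choice([0, 1])
            else:
                action = max((0, 1), key=lambda a: q[(state, a)])
            nxt, reward = step(state, action, rng)
            # standard Q-learning target: terminal states bootstrap nothing
            target = reward if nxt == 2 else reward + gamma * max(q[(nxt, a)] for a in (0, 1))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# Under this toy model, the learned values favour the intensive dose
# (action 1) for severe patients (state 1).
```

A real DTR application would replace this toy simulator with observational patient trajectories and typically a function approximator, which is precisely where the safety and interpretability challenges discussed above arise.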
In this project we aim to address the following research questions:
1. Can learning from the treatment outcomes of groups of patients be used to facilitate the training process for the future treatment of a patient with similar health conditions?
2. Can the prior knowledge and insight of experts be integrated into the learning process of recommendation systems?
3. Can exploration be guided so as to guarantee both the safety of the treatment and the improvement of treatment strategies?
4. If the method includes function approximation, how can newly discovered treatments be interpreted and verified?
To answer these research questions, we rely on a novel combination of transfer learning, safe exploration, and interactive reinforcement learning.
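One common way to operationalise safe exploration (research question 3) is to restrict exploratory actions to a state-dependent set approved by prior expert knowledge. The sketch below is a hypothetical illustration of this idea; the state labels, action names, and safe sets are all invented for the example and are not the project's actual method.

```python
import random

# Hypothetical clinician-approved safe sets per patient state: in the
# critical state, only the standard treatment may be explored.
SAFE_ACTIONS = {
    "stable":   ["standard", "reduced", "experimental"],
    "critical": ["standard"],
}

def safe_epsilon_greedy(q, state, eps, rng):
    """Epsilon-greedy selection restricted to the safe set for this state."""
    allowed = SAFE_ACTIONS[state]
    if rng.random() < eps:
        return rng.choice(allowed)                   # explore, but only safely
    # exploit: best estimated action among the allowed ones
    return max(allowed, key=lambda a: q.get((state, a), 0.0))

rng = random.Random(1)
q = {("critical", "standard"): 0.2, ("critical", "experimental"): 0.9}
# Even though "experimental" has a higher Q-value, it is never selected
# in the critical state because it lies outside the safe set.
picks = {safe_epsilon_greedy(q, "critical", 0.5, rng) for _ in range(100)}
```

This hard-constraint view is only one of several possibilities; interactive RL, where an expert vetoes or rewards proposed actions during learning, offers a softer alternative that the project's combination of techniques could draw on.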
Alignment with EPSRC research themes: This project aligns closely with the Artificial Intelligence and Robotics theme. It is also related to the Healthcare Technologies theme, and it falls within the scope of the Information and Communication Technologies theme.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/W523793/1                                   30/09/2021  29/09/2025
2606309            Studentship   EP/W523793/1  03/10/2021  29/09/2025  Nam Tran