Towards interactive explanatory reinforcement learning for aligned and trustworthy agents

Lead Research Organisation: University of Bristol
Department Name: Engineering Mathematics and Technology

Abstract

Deep reinforcement learning (RL) is a compelling solution to complex control problems, but its lack of transparency hampers trust, understanding and safety validation. If RL is to power future autonomous systems, the development of interpretability tools to "open the black box" must become a rigorous science. My research adapts explainable AI (XAI) methods to analyse the behaviour of deep RL agents. In my first year, I used decision trees to "clone" multiagent traffic controllers, revealing the latent factors influencing their outputs. I have since developed a novel tree model that jointly represents the policy, value function and temporal dynamics of a lunar lander, facilitating interactive visualisation and query answering. While the closed-loop nature of control makes explanation more complex than in supervised learning, the end result may yet be more intuitive, because it can leverage humans' capacity to adopt Dennett's intentional stance towards agents.
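The decision-tree "cloning" mentioned above is, in general terms, a form of policy distillation: the trained agent is queried on many states, and a tree is fitted to imitate its chosen actions so that the tree's splits can be read as rules. The sketch below illustrates that idea under stated assumptions only. The two-feature traffic state (`queue_ns`, `queue_ew`), the hand-written `black_box_policy` (a hypothetical stand-in for a deep RL controller) and the small CART-style learner are all invented for illustration; they are not the author's models, data or code.

```python
import random
from collections import Counter


def black_box_policy(state):
    """Hypothetical stand-in for a trained RL controller: gives green to the
    north-south approach (action 1) when its queue is strictly longer."""
    queue_ns, queue_ew = state
    return 1 if queue_ns > queue_ew else 0


def gini(labels):
    """Gini impurity of a list of discrete action labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that most reduces weighted Gini
    impurity, or None if no split improves on the unsplit node."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def fit_tree(X, y, depth=0, max_depth=3):
    """Grow a depth-limited classification tree; leaves hold the majority action."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [(xi, yi) for xi, yi in zip(X, y) if xi[f] <= t]
    right = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > t]
    return (f, t,
            fit_tree([a for a, _ in left], [b for _, b in left], depth + 1, max_depth),
            fit_tree([a for a, _ in right], [b for _, b in right], depth + 1, max_depth))


def predict(tree, x):
    """Follow the splits from the root to a leaf and return its action."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def rules(tree, names, indent=""):
    """Render the tree as nested if/else rules for human inspection."""
    if not isinstance(tree, tuple):
        return [f"{indent}-> action {tree}"]
    f, t, left, right = tree
    return ([f"{indent}if {names[f]} <= {t}:"] + rules(left, names, indent + "  ")
            + [f"{indent}else:"] + rules(right, names, indent + "  "))


# Clone the policy: query it on sampled states, then fit a tree to its actions.
random.seed(0)
states = [(random.randint(0, 20), random.randint(0, 20)) for _ in range(300)]
actions = [black_box_policy(s) for s in states]
tree = fit_tree(states, actions, max_depth=4)

fidelity = sum(predict(tree, s) == a for s, a in zip(states, actions)) / len(states)
print(f"training fidelity: {fidelity:.2f}")
print("\n".join(rules(tree, ["queue_ns", "queue_ew"])))
```

Fidelity here measures how often the tree agrees with the policy it imitates, not task performance. The depth limit trades that fidelity against readability: an axis-aligned tree can only approximate this policy's diagonal decision boundary with a staircase of splits, so deeper trees agree more closely but are harder to interpret.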

XAI researchers should always have a target user for their work. In the coming months, I intend to focus on one particular user: the designer of the RL agent, who is responsible for defining model parameters and, crucially, for specifying the reward function that drives learning. I plan to use XAI to facilitate interactive RL, in which a deeper causal understanding of reward functions and learning dynamics enables more principled iteration of training, tuning and reward modification, replacing today's trial-and-error approach. This application of XAI addresses the philosophical problem of alignment, which stresses the paramount importance of correct goal specification in the context of increasingly powerful generic optimisers.

This research falls within the EPSRC Artificial Intelligence Technologies research area, and also has connections to Human-computer Interaction.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/P510427/1      |              |              | 01/10/2016 | 31/12/2021 |
2314554           | Studentship  | EP/P510427/1 | 01/10/2019 | 30/09/2023 | Thomas Bewley