CausalXRL: Causal eXplanations in Reinforcement Learning

Lead Research Organisation: University of Sheffield
Department Name: Computer Science

Abstract

Deep reinforcement learning (RL) systems are approaching or surpassing human-level performance in specific domains, from games to decision support to continuous control, albeit in non-critical environments and usually learning via random exploration. Despite these prodigious achievements, many applications cannot be considered today because we need to understand and explain how these AI systems make their decisions before letting them interact with, and possibly impact, human beings and society.

There are two main obstacles to AI agents explaining their decisions: they have to provide explanations at a level human beings can understand, and they have to deal with causal relations rather than statistical correlations. Hence, we believe the key to explainable AI, in particular for decision support, is to build or learn causal models of the system being intervened upon. Instead of standard machine learning and reinforcement learning networks, we will therefore leverage the new science of causal inference to equip deep RL systems with the ability to learn, plan with, and justifiably explore cause-effect relationships in their environment. RL systems based on this novel CausalXRL architecture will provide cause-effect and counterfactual justifications for their suggested actions, allowing them to fulfil the right to an explanation in human-centric environments.

We will implement the CausalXRL architecture as a bio-plausible (neuromorphic) algorithm to enable its deployment in resource-limited, e.g., mobile, environments. We will demonstrate the broad applicability and impact of CausalXRL on several use cases, from neuro-rehabilitation and intensive care to farming and education.
 
Description CausalXRL aims to interpret how specific machine learning algorithms make decisions, allowing humans to understand the reasons behind a selected outcome. The decision-making algorithm first learns a model of the environment, which enables it to simulate possible decision scenarios in advance and to explain why a specific one is chosen. The project has generated the following outcomes:

- Development of a Methodology for Learning a Transition Model of the Environment: The team created a method to learn how the environment changes from one state to another in response to actions. This predictive model is structured as a causal directed acyclic graph (DAG), which, in simpler terms, maps out the sequence of events or actions leading from one state to another in a clear, logical order. Such a structure is vital for providing causal explanations in reinforcement learning, a type of artificial intelligence where machines learn to make decisions by trying different strategies to achieve a goal. (A minimal illustrative sketch appears after this list.)

- Dimensionality Reduction: In partnership with the University of Vienna, the project made progress on reducing the complexity of data from the system under study, with direct application to neural recordings. High-dimensional dynamics involving vast numbers of data points and interactions were transformed into a simpler, lower-dimensional representation without losing essential information. This process ensures that the underlying cause-and-effect relationships remain intact and understandable, which is crucial for making the machine's decision-making process transparent and justifiable. (See the second sketch after this list.)

- Method to Measure Exploration in Reinforcement Learning: In collaboration with Inria, Lille, we developed a new approach, based on optimal transport in policy space, to quantify how much exploration an artificial intelligence system needs when learning through reinforcement learning compared to more direct learning methods. This work measures how far the system moves away from its current strategy as it improves its decision-making capabilities. (See the third sketch after this list.)

- Modelling Continuous State-Action-Time Systems: We worked on adapting reinforcement learning to systems where changes occur continuously over time in a continuous state-action space, using neural networks capable of solving differential equations (neural ODEs). This work is pivotal for applications requiring real-time decision-making. The project also embarked on developing biologically plausible learning algorithms, aiming for more natural and efficient machine learning methods. (See the fourth sketch after this list.)
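
To make the first outcome concrete, below is a minimal sketch of a transition model organised as a causal DAG: one regressor per next-state variable, each restricted to that variable's parents in an assumed graph. The variable names, the fixed graph, and the linear regressors are illustrative assumptions, not the project's actual method.

```python
# Minimal sketch of a DAG-structured transition model (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed causal graph: parents of each next-state variable among (s0, s1, a).
parents = {
    "s0_next": ["s0", "a"],        # s0' is caused by s0 and the action
    "s1_next": ["s0", "s1", "a"],  # s1' is caused by both state variables and the action
}
cols = {"s0": 0, "s1": 1, "a": 2}

def fit_dag_transition_model(data, next_states):
    """Fit one regressor per next-state variable, restricted to its DAG parents."""
    return {var: LinearRegression().fit(data[:, [cols[p] for p in pa]], next_states[var])
            for var, pa in parents.items()}

# Toy transitions (s0, s1, a) -> (s0', s1') consistent with the assumed graph
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))
next_states = {
    "s0_next": 0.9 * data[:, 0] + 0.5 * data[:, 2],
    "s1_next": 0.3 * data[:, 0] + 0.8 * data[:, 1] - 0.2 * data[:, 2],
}
models = fit_dag_transition_model(data, next_states)
# Explanations can then cite each effect's parents, e.g.
# "s1 changed because of s0, s1 and the chosen action".
```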
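The second outcome can be illustrated with a sketch of dynamics- and behaviour-preserving dimensionality reduction, loosely in the spirit of BunDLe-Net (listed under research tools below): an encoder is trained so that a latent one-step map predicts the next latent state while behaviour remains decodable. The architecture and losses here are simplified assumptions, not the published algorithm.

```python
# Sketch of behaviour- and dynamics-preserving coarse-graining (assumptions only).
import torch
import torch.nn as nn

class CoarseGrainer(nn.Module):
    """Encoder trained so the latent space keeps dynamics and behaviour."""
    def __init__(self, n_neurons=100, n_latent=3, n_behaviours=4):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_neurons, 32), nn.Tanh(),
                                    nn.Linear(32, n_latent))
        self.dynamics = nn.Linear(n_latent, n_latent)       # latent one-step map
        self.decode_behaviour = nn.Linear(n_latent, n_behaviours)

    def loss(self, x_t, x_next, behaviour_t):
        z_t, z_next = self.encode(x_t), self.encode(x_next)
        dyn = ((self.dynamics(z_t) - z_next) ** 2).mean()   # preserve the dynamics
        beh = nn.functional.cross_entropy(                  # preserve the behaviour
            self.decode_behaviour(z_t), behaviour_t)
        return dyn + beh

# Toy usage on random "neural recordings"
model = CoarseGrainer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_t = torch.randn(256, 100)          # activity at time t
x_next = torch.randn(256, 100)       # activity at time t+1
b = torch.randint(0, 4, (256,))      # behaviour labels at time t
opt.zero_grad()
model.loss(x_t, x_next, b).backward()
opt.step()
```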
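For the third outcome, the sketch below illustrates the optimal-transport idea in a deliberately simplified form: exploration is scored as the accumulated Wasserstein-1 distance between action distributions of successive policy snapshots at a few probe states. The cited paper's measure is more general; everything here is an illustrative assumption.

```python
# Simplified illustration of exploration as movement in policy space.
import numpy as np
from scipy.stats import wasserstein_distance

def exploration_index(policy_snapshots, probe_states):
    """Sum of Wasserstein-1 distances between action distributions of
    successive policy snapshots, averaged over a few probe states."""
    total = 0.0
    for pi_old, pi_new in zip(policy_snapshots, policy_snapshots[1:]):
        total += float(np.mean([wasserstein_distance(pi_old(s), pi_new(s))
                                for s in probe_states]))
    return total

# Toy usage: each "policy" returns action samples for a given state
rng = np.random.default_rng(1)
snapshots = [lambda s, k=k: rng.normal(loc=0.1 * k * s, scale=1.0, size=200)
             for k in range(5)]
print(exploration_index(snapshots, probe_states=[0.5, 1.0, 2.0]))
```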
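Finally, for continuous state-action-time modelling, here is a minimal sketch in which a neural network parameterises the vector field ds/dt = f(s, a) and trajectories are rolled out with a simple Euler integrator, standing in for the differential-equation solvers such work uses. Dimensions and names are illustrative assumptions.

```python
# Sketch of a continuous-time transition model with a learned vector field.
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Neural network parameterising the continuous-time dynamics ds/dt = f(s, a)."""
    def __init__(self, state_dim=4, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 64),
                                 nn.Tanh(), nn.Linear(64, state_dim))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))  # ds/dt

def rollout(field, s0, actions, dt=0.05):
    """Euler-integrate the learned dynamics under a sequence of actions."""
    s, traj = s0, [s0]
    for a in actions:
        s = s + dt * field(s, a)   # one Euler step of the ODE
        traj.append(s)
    return torch.stack(traj)

field = VectorField()
s0 = torch.zeros(1, 4)
actions = [torch.randn(1, 2) for _ in range(10)]
traj = rollout(field, s0, actions)   # predicted trajectory, shape (11, 1, 4)
```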
Exploitation Route Making machine learning algorithms explainable is a challenging endeavour and requires extensive research. The findings and advancements from this project set a foundational framework and pave the way for applications in causal, explainable reinforcement learning. The potential avenues for taking the findings forward include:

Future Research and Grants: The strategies, techniques, and algorithms we have developed serve as a robust foundation for future scientific inquiry and innovation in this area, including continued development through further funding.

Resources for the Global Research Community: By making our research findings and software publicly available, we provide tools that other researchers can use in applications, particularly in neuroscience, and build upon to tackle this challenging problem.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description EPSRC Doctoral Training Partnership (DTP) - Early Career Researcher Scholarship Award
Amount £160,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2022 
End 07/2025
 
Title BunDLe-Net 
Description The Behavioural and Dynamic Learning Network (BunDLe-Net) is an algorithm for learning meaningful coarse-grained representations from time-series data. It maps high-dimensional data to a low-dimensional space while preserving both dynamical and behavioural information. It has been applied to, but is not limited to, neuronal manifold learning.
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact We expect this to become a key tool for dimensionality reduction of neural data.
URL https://github.com/akshey-kumar/BunDLe-Net
 
Description Dr Debabrota Basu 
Organisation Inria research centre Lille - Nord Europe
Country France 
Sector Public 
PI Contribution At Sheffield, Gilra, Vasilaki, and Manneschi collaboratively conducted staff training sessions to ensure a high level of competency across the team. Gilra supervised the entire project, delineated the research, developed a robust methodology, and played a pivotal role in composing the manuscript.
Collaborator Contribution Basu participated in research discussions, refined the methodology, and contributed to editing the paper.
Impact Nkhumise RM, Basu D, Prescott TJ, Gilra A. Measuring Exploration in Reinforcement Learning via Optimal Transport in Policy Space. arXiv; 2024. doi:10.48550/arXiv.2402.09113
Start Year 2021
 
Description Professor Moritz Grosse-Wentrup 
Organisation University of Vienna
Country Austria 
Sector Academic/University 
PI Contribution Gilra offered expertise in neural network design and neural data analysis. He contributed to research discussions, refined the methodology, and co-authored a research publication.
Collaborator Contribution Grosse-Wentrup contributed his expertise in causal modelling, facilitated the training of staff, and provided essential facilities, including lab space. Furthermore, he led the development of the methodology and guided the authoring of the manuscript.
Impact Kumar A, Gilra A, Gonzalez-Soto M, Meunier A, Grosse-Wentrup M. BunDLe-Net: Neuronal Manifold Learning Meets Behaviour. bioRxiv; 2023. p. 2023.08.08.551978. doi:10.1101/2023.08.08.551978
Start Year 2022
 
Description Annual talk on our project CausalXRL at the CHIST-ERA projects seminar, a yearly meeting of running projects funded by CHIST-ERA
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Our grant CausalXRL is funded under the CHIST-ERA call for eXplainable AI (XAI) 2020. CHIST-ERA organises a yearly meeting of all ongoing funded projects. Acting coordinator Gilra represented our CausalXRL project and gave a 10-minute talk at each of these annual meetings. Those in 2021 and 2022 were held online, while the one in 2023 was held in Bratislava, Slovakia. The 2024 meeting will be held in Finland, and Gilra will present there as well.
Year(s) Of Engagement Activity 2021,2022,2023,2024
URL https://www.chistera.eu/projects-seminar-2023-programme
 
Description Poster presentation at the Cognitive Computational Neuroscience international conference in Oxford, UK
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The co-authors of https://www.biorxiv.org/content/10.1101/2023.08.08.551978v3, including Gilra, presented a poster on their work at the Cognitive Computational Neuroscience (CCN) international conference in 2023, held in Oxford, UK. Apart from giving the work visibility among researchers in the field, the poster prompted active discussions with several researchers, both at the poster session and elsewhere at the conference.
Year(s) Of Engagement Activity 2023
URL https://2023.ccneuro.website/view_paper6f34.html?PaperNum=1089