Exploring Causality in Reinforcement Learning for Robust Decision Making
Lead Research Organisation:
King's College London
Department Name: Informatics
Abstract
Reinforcement learning (RL) has seen significant development in recent years and has demonstrated impressive capabilities in decision-making tasks, such as games (AlphaStar, OpenAI Five), chatbots (ChatGPT), and recommendation systems (Microsoft). RL techniques can also be applied to many fields, such as transportation, network communications, autonomous driving, sequential treatment in healthcare, robotics, and control. Unlike traditional supervised learning, RL focuses on making a sequence of decisions to achieve a long-term goal, which makes it particularly well suited to complex problems. However, while RL has the potential to be highly effective, challenges remain before it is practical for real-world applications, where changing factors such as traffic regulations, weather, and clouds cannot be fully accounted for when training the agent. To enable RL algorithms to be deployed in a range of real applications, we need to evaluate and improve the robustness of RL in the face of complex real-world changes and task shifts.
In this project, we aim to develop robust and generalisable reinforcement learning techniques from a causal modelling perspective. The first thrust focuses on utilising causal model learning to create compact and robust representations of tasks. Such a representation can greatly benefit the overall performance of the RL agent by reducing the complexity of the problem and making the agent's decision-making more efficient. As a result, the agent can learn faster and generalise better to unseen tasks, which is especially important in real-world scenarios where data is scarce and the complexity of tasks can vary greatly.
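To make the first thrust concrete, the sketch below shows in miniature what a compact task representation could look like once a causal graph is available: state variables with no causal path into the reward (here, cloud shape and building details) are dropped. The graph, the variable names, and the `ancestors` helper are all invented for illustration; they are not the project's actual model or code.

```python
from collections import deque

# Hypothetical learned causal DAG: each edge points from cause to effect.
# Distractor variables (cloud shape, building details) influence the raw
# observation but have no causal path into the reward.
edges = [
    ("action", "lane_position"),
    ("action", "speed"),
    ("lane_position", "reward"),
    ("speed", "reward"),
    ("obstacle_distance", "reward"),
    ("cloud_shape", "observation"),
    ("building_detail", "observation"),
    ("lane_position", "observation"),
    ("speed", "observation"),
    ("obstacle_distance", "observation"),
]

def ancestors(target, edges):
    """Collect every causal ancestor of `target` by reverse BFS."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for cause in parents.get(node, ()):
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen

state_vars = {"lane_position", "speed", "obstacle_distance",
              "cloud_shape", "building_detail"}
compact_state = state_vars & ancestors("reward", edges)
print(sorted(compact_state))
# ['lane_position', 'obstacle_distance', 'speed'] -- distractors removed
```

In practice the graph itself must be learned from data rather than given, but the payoff is the same: the policy only ever conditions on the reward's causal ancestors, so changes to the distractor variables cannot affect its decisions.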
The second research thrust focuses on developing efficient and generalisable algorithms for task transfer. These can enable the RL agent to adapt to new tasks more quickly and effectively, and to generalise the learned knowledge to different but related tasks. This is crucial in real-world scenarios where the agent needs to operate in different environments or where task requirements change over time.
One example of an application that would benefit from these contributions is autonomous driving in an industrial setting. While RL agents are usually trained in simulators, they may not perform well in real-world road scenarios and can easily be distracted by task-irrelevant information. For example, the visual images that autonomous cars observe contain predominantly task-irrelevant information, such as cloud shapes and architectural details, which should not influence driving decisions.
In this project, we aim to enable the agent to learn a compact and robust task representation, so that it retains only state information relevant to the task, adapts safely to changing driving scenarios, and generalises its knowledge to related tasks, such as adapting to the different driving rules in the United States (driving on the right-hand side of the road).
A causal understanding can help identify the minimal sufficient representations essential for policy learning and transfer, and can support safe, controllable exploration by leveraging causal structures and counterfactual reasoning (a toy illustration follows below). It can also mitigate issues that afflict most existing RL approaches, such as being data-hungry and lacking interpretability and generalisability. The outcomes of this project can greatly improve the scalability and adaptability of RL agents, making them more suitable for real-world applications.
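As a toy illustration of the counterfactual reasoning mentioned above, the following sketch walks through the standard abduction-action-prediction recipe on an invented one-equation structural causal model. The equation, its constants, and the variable names are assumptions made purely for this example, not the project's model.

```python
# Counterfactual query on a toy structural causal model (SCM).
# Step 1 (abduction): infer the exogenous noise consistent with what was
# actually observed.  Step 2 (action): intervene on the braking decision.
# Step 3 (prediction): replay the structural equation under the intervention.

def margin(speed, braking, u_road):
    """Toy structural equation for the stopping margin (metres)."""
    return 30.0 - speed * (1.0 - 0.5 * braking) + u_road

# Factual observation: the agent did not brake and ended with a 3 m margin.
speed_obs, braking_obs, margin_obs = 25.0, 0.0, 3.0

# Abduction: solve the structural equation for the noise term (road surface
# and other unmodelled conditions on this particular episode).
u_road = margin_obs - (30.0 - speed_obs * (1.0 - 0.5 * braking_obs))

# Action + prediction: what would the margin have been had the agent braked?
margin_cf = margin(speed_obs, braking=1.0, u_road=u_road)
print(margin_cf)  # 15.5 -- braking would have left a much larger margin
```

Such counterfactual queries let an agent evaluate a risky action offline, under the very conditions it actually experienced, before ever executing it, which is one route to the safe and controllable exploration described above.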
People
| Name | Yali Du (Principal Investigator) |
| ORCID iD | http://orcid.org/0000-0001-5683-2621 |
| Description | Through this award, our research team developed new methods to help multiple "agents" (such as software programs or robots) work together more effectively and transparently on shared tasks. We focused on making it easier to pinpoint each agent's individual contribution to the overall result, even when rewards or outcomes appear only at the end of a task (often called "delayed rewards"). These methods use principles from causal inference, which let us identify the cause-and-effect relationships behind each agent's actions and thereby assign credit (or blame) in a fair and interpretable way. Our work not only improves performance in these cooperative settings but also makes decision-making processes more understandable, an important step toward building trustworthy and transparent AI systems. A toy sketch of how a delayed reward can be redistributed across timesteps appears after this table. |
| Exploitation Route | Researchers and engineers can adapt our credit-assignment techniques to any system in which multiple AI agents collaborate, from coordinating robots in a factory to optimizing traffic lights in a city. Policymakers, industry leaders, and regulators seeking greater transparency in AI can use our methods to explain why and how certain actions were taken, helping build trust in automated decision-making systems. |
| Sectors | Digital/Communication/Information Technologies (including Software); Energy; Manufacturing, including Industrial Biotechnology; Transport |
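The sketch below is the toy illustration referenced in the Description above: a delayed episodic return is split into per-step proxy rewards in proportion to per-step contribution scores. In the published work such scores come from a learned causal model; here they are hard-coded, and `redistribute` is a hypothetical helper written only for this example.

```python
import numpy as np

def redistribute(episodic_return, contributions):
    """Split a terminal return into per-step proxy rewards that sum to it."""
    contributions = np.asarray(contributions, dtype=float)
    weights = contributions / contributions.sum()
    return episodic_return * weights

# Five-step episode whose only feedback is a terminal return of 10.
# Contribution scores are stand-ins for a learned causal model's output.
contributions = [0.1, 0.4, 0.1, 0.3, 0.1]
proxy_rewards = redistribute(10.0, contributions)
print(proxy_rewards)        # [1. 4. 1. 3. 1.]
print(proxy_rewards.sum())  # 10.0 -- the total return is preserved
```

Because the proxy rewards sum to the original return, a standard RL algorithm can train on the dense per-step signal instead of the single delayed one, while each step's (or each agent's) share of the outcome remains directly attributable and interpretable.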
| Description | International collaboration with Biwei Huang from the University of California San Diego |
| Organisation | University of California, San Diego (UCSD) |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | My research team and I primarily contributed our expertise in Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL) to this collaboration. We focused on designing and implementing novel RL and MARL frameworks, developing algorithms that could effectively integrate causal insights, and conducting extensive experiments to validate the methods. We developed algorithms for interpretable credit assignment addressing the sparse-reward challenge in RL, and for agent-level credit assignment in multi-agent RL. We also took the lead on setting up decision-making environments, optimising the computational workflows for simulations, and writing the key sections of the resulting papers on RL models, experimental setup, and performance evaluations. In doing so, we ensured that the synergy between RL, MARL, and causal machine learning concepts was both practically and theoretically sound, strengthening the rigour and impact of the research findings. |
| Collaborator Contribution | Our partner, Dr. Biwei Huang from UCSD, offered specialized expertise in causal machine learning and was instrumental in formulating the underlying causal frameworks and theoretical justifications for the proposed methods. Dr. Huang's team provided critical guidance on causal structure learning, identifying underlying dependencies and confounders in complex multi-agent settings. They also contributed to developing innovative causal inference techniques that complemented our RL and MARL algorithms. By integrating causal insights into the learning process, Dr. Huang's team enhanced the interpretability and robustness of the results. Furthermore, they assisted in drafting the theoretical sections of our publications and supported the broader research vision by highlighting relevant causal theory, ensuring that our joint work maintained a strong scientific foundation. |
| Impact | Zhang, Yudi, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang, and Mykola Pechenizkiy. "Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach." Advances in Neural Information Processing Systems 36 (2023): 20208-20229. Wang, Ziyan, Yali Du, Yudi Zhang, Meng Fang, and Biwei Huang. "MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment." In NeurIPS 2024 Causal Representation Learning Workshop. Zhang, Yudi, Yali Du, Biwei Huang, Meng Fang, and Mykola Pechenizkiy. "A Causality-Inspired Spatial-Temporal Return Decomposition Approach for Multi-Agent Reinforcement Learning." In NeurIPS 2024 Causal Representation Learning Workshop. |
| Start Year | 2024 |
