Turing AI Fellowship: Advancing Multi-Agent Deep Reinforcement Learning for Sequential Decision Making in Real-World Applications
Lead Research Organisation:
University of Warwick
Department Name: WMG
Abstract
Although we are still far from 'artificial general intelligence' - the broad and deep capability of a machine to comprehend its surroundings - the last few years have seen substantial progress towards a more specialised AI: the ability to effectively address well-defined, specific goals in a given environment, the kind of task-oriented intelligence that is part of many human jobs. Much of this progress has been enabled by deep reinforcement learning (DRL), one of the most promising and fastest-growing areas within machine learning.
In DRL, an autonomous decision maker - the "agent" - learns how to make optimal decisions that will eventually lead to reaching a final goal. DRL holds the promise of enabling autonomous systems to learn large repertoires of collaborative and adaptive behavioural skills without human intervention, with application in a range of settings from simple games to industrial process automation to modelling human learning and cognition.
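The agent-learns-to-reach-a-goal loop described above can be sketched with tabular Q-learning, the simplest member of the family that DRL scales up with neural networks. The 5-state chain world and all names below are illustrative toys, not part of the fellowship's methodology.

```python
import random

# Minimal tabular Q-learning sketch: the agent starts at state 0 and
# earns reward 1 only for reaching the goal state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left / move right

def step(state, action):
    """One environment transition: reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, (1.0 if next_state == GOAL else 0.0), next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action index]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: explore on ties too, so early episodes stay short
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, ACTIONS[a])
            # temporal-difference update towards reward + discounted future value
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
greedy = [row.index(max(row)) for row in q]  # learned policy: 1 = "move right"
```

After training, the greedy policy moves right from every non-terminal state, i.e. it has learned the decision sequence that reaches the goal.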
Many real-world applications are characterised by the interplay of multiple decision-makers that operate in the same shared-resources environment and need to accomplish goals cooperatively. For instance, some of the most advanced industrial multi-agent systems in the world today are assembly lines and warehouse management systems. Whether the agents are robots, autonomous vehicles or clinical decision-makers, there is a strong desire for and increasing commercial interest in these systems: they are attractive because they can operate on their own in the world, alongside humans, under realistic constraints (e.g. guided by only partial information and with limited communication bandwidth). This research programme will extend the DRL methodology to systems comprising many interacting agents that must cooperatively achieve a common goal: multi-agent DRL, or MADRL.
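A minimal illustration of the decentralised cooperation problem, assuming a toy two-player coordination game rather than any system from this programme: two independent Q-learners receive a single shared team reward and must settle on matching actions without ever observing each other's choices.

```python
import random

# Hedged sketch: two independent Q-learners share one team reward in a
# 2x2 coordination game, mimicking the partial-information,
# limited-communication setting above. The payoffs are illustrative.
PAYOFF = [[10.0, 0.0],
          [0.0, 10.0]]  # reward > 0 only when both agents pick the SAME action

def train(episodes=2000, alpha=0.1, eps=0.2, seed=1):
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]  # each agent keeps only its own Q-values
    for _ in range(episodes):
        a1 = rng.randrange(2) if rng.random() < eps else q1.index(max(q1))
        a2 = rng.randrange(2) if rng.random() < eps else q2.index(max(q2))
        r = PAYOFF[a1][a2]               # one shared reward for the team
        q1[a1] += alpha * (r - q1[a1])   # fully decentralised updates:
        q2[a2] += alpha * (r - q2[a2])   # no agent sees the other's action
    return q1, q2

q1, q2 = train()
```

After training, both agents' greedy actions coincide: cooperation emerges from independent learning alone, although (as the findings below note) such naive decentralisation can also fail in harder games.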
Organisations
- University of Warwick (Lead Research Organisation)
- University of Castile-La Mancha (Collaboration)
- Louisiana State University (Collaboration)
- Columbia University (Project Partner)
- UNIVERSITY HOSPITALS BIRMINGHAM NHS FOUNDATION TRUST (Project Partner)
- Imperial College London (Project Partner)
- Hong Kong University of Science and Tech (Project Partner)
- Stanford University (Project Partner)
- University of Cambridge (Project Partner)
- Manchester University NHS Fdn Trust (Project Partner)
- Insignia Medical Systems (Project Partner)
- NVIDIA Limited (UK) (Project Partner)
- Shadow Robot (United Kingdom) (Project Partner)
- King Abdullah University of Sci and Tech (Project Partner)
- Eurocontrol (Project Partner)
- GEFCO UK Ltd (Project Partner)
- Indian Institute of Technology Kharagpur (Project Partner)
- University Hospitals Coventry and Warwickshire NHS Trust (Project Partner)
- Soliton IT Limited (Project Partner)
- The Engineering Laboratory of the United (Project Partner)
- TU Wien (Project Partner)
- Kinova Europe GmbH (Project Partner)
- Hong Kong Polytechnic University (Project Partner)
- Inovo Robotics (Project Partner)
- King's College London (Project Partner)
Publications
- Beeson A (2023) Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning, in Machine Learning
- Gao M (2023) Video Object Segmentation using Point-based Memory Network, in Pattern Recognition
- Hepburn C (2023) Model-based trajectory stitching for improved behavioural cloning and its applications, in Machine Learning
- Hepburn C (2024) State-Constrained Offline Reinforcement Learning
| Description | We refined offline reinforcement learning methods for both single and multi-agent problems, which is crucial when collecting new data is expensive, risky, or unethical. We also investigated how agents can learn to collaborate autonomously in multi-agent settings, without predefined protocols. This showed that while decentralized cooperation can boost performance, it can also lead to failures when naïve synergy assumptions are made. Our findings open new research directions for building scalable and more robust offline multi-agent systems, offering valuable insights for future work in safer and more efficient RL applications. |
| Exploitation Route | We see our offline RL advances and decentralized multi-agent strategies being applied and improved by both academic and industry teams. In academia, researchers can extend our methods to more complex tasks, refine theoretical insights, and explore more robust evaluation metrics. In non-academic contexts, such as robotics, finance, or healthcare, our approaches can help tackle real-world problems where collecting fresh data is too expensive, hazardous, or ethically sensitive. By sharing open-source code and collaborating with industry partners, we aim to ensure these techniques are accessible, validated, and continuously refined in practical deployments. We also plan to engage with open-source communities to promote transparent benchmarking and drive further innovation. Ultimately, these combined efforts will foster broader adoption of safer, more efficient RL systems across multiple sectors. |
| Sectors | Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Transport |
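The policy-constraint idea in offline RL mentioned above can be sketched in a few lines. This is a toy scalar example in the spirit of behaviour-regularised offline RL (e.g. a TD3+BC-style penalty), not the project's actual method; every quantity below is invented for illustration.

```python
# Offline RL sketch: the policy is pushed towards high estimated value
# Q(s, a), but penalised (weight lam) for drifting away from the actions
# logged in a fixed dataset - no new data is ever collected.
DATA = [(0.5, 0.4), (1.0, 0.9), (1.5, 1.4)]  # logged (state, action) pairs

def q_value_grad(s, a):
    # Pretend learned critic with peak at a = s + 1, far from the data;
    # offline critics are often over-optimistic exactly like this.
    return -2.0 * (a - (s + 1.0))  # d Q / d a

def train_policy(lam, lr=0.05, steps=2000):
    w = 0.0  # linear policy: action = w * state
    for _ in range(steps):
        grad = 0.0
        for s, a_logged in DATA:
            a = w * s
            dq = q_value_grad(s, a)              # chase the critic's value
            bc = -2.0 * lam * (a - a_logged)     # behavioural-cloning pull
            grad += (dq + bc) * s                # chain rule through a = w*s
        w += lr * grad / len(DATA)               # gradient ascent on the mix
    return w

w_free = train_policy(lam=0.0)    # unconstrained: chases the critic's peak
w_safe = train_policy(lam=10.0)   # constrained: stays near logged actions
```

The unconstrained policy drifts to where the (possibly wrong) critic claims value is highest, while the constrained one stays close to behaviour the dataset actually supports; balancing that trade-off is the theme of the publications listed above.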
| Description | Collaboration with Universidad de Castilla-La Mancha |
| Organisation | University of Castile-La Mancha |
| Country | Spain |
| Sector | Academic/University |
| PI Contribution | We worked on a joint research project on multi-agent reinforcement learning for dynamic pricing in the high-speed passenger railway industry, and drove the methodological developments. |
| Collaborator Contribution | They provided datasets and a realistic simulator of high-speed passenger railway networks |
| Impact | One submitted journal paper |
| Start Year | 2024 |
| Description | Shuangqing Wei |
| Organisation | Louisiana State University |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Yue Jin and I contributed ideas related to a new decentralised multi-agent reinforcement learning algorithm |
| Collaborator Contribution | Yue Jin and I contributed ideas related to a new decentralised multi-agent reinforcement learning algorithm |
| Impact | An article titled "Learning to Cooperate under Private Rewards" has been prepared and submitted to the ICML 2024 conference |
| Start Year | 2023 |
| Title | Unity 3D environments |
| Description | We have developed two 3D environments (logistic wearers and production line) in Unity to support research in reinforcement learning. |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Open Source License? | Yes |
| Impact | The software is still in a private GitHub repo and will be released publicly shortly |
