Turing AI Fellowship: Advancing Multi-Agent Deep Reinforcement Learning for Sequential Decision Making in Real-World Applications

Lead Research Organisation: University of Warwick

Department Name: WMG

Abstract

Although we are still far from reaching 'artificial general intelligence' - the broad and deep capability for a machine to comprehend our surroundings - progress has been made in the last few years towards a more specialised AI: the ability to effectively address well-defined, specific goals in a given environment, which is the kind of task-oriented intelligence that is part of many human jobs. Much of this progress has been enabled by deep reinforcement learning (DRL), one of the most promising and fast-growing areas within machine learning.
In DRL, an autonomous decision maker - the "agent" - learns how to make optimal decisions that will eventually lead to reaching a final goal. DRL holds the promise of enabling autonomous systems to learn large repertoires of collaborative and adaptive behavioural skills without human intervention, with application in a range of settings from simple games to industrial process automation to modelling human learning and cognition.

Many real-world applications are characterised by the interplay of multiple decision-makers that operate in the same shared-resources environment and need to accomplish goals cooperatively. For instance, some of the most advanced industrial multi-agent systems in the world today are assembly lines and warehouse management systems. Whether the agents are robots, autonomous vehicles or clinical decision-makers, there is a strong desire for, and increasing commercial interest in, these systems: they are attractive because they can operate on their own in the world, alongside humans, under realistic constraints (e.g. guided by only partial information and with limited communication bandwidth). This research programme will extend the DRL methodology to systems comprising many interacting agents that must cooperatively achieve a common goal: multi-agent DRL, or MADRL.

Funded Value:

£1,518,508

Funded Period:

Jan 21 - Dec 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Fellowship

Project Reference:

EP/V024868/1

Principal Investigator:

Giovanni Montana

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (100%)

Organisations

People	ORCID iD
Giovanni Montana (Principal Investigator / Fellow)	http://orcid.org/0000-0003-3942-3900

Publications

Beeson A. (2023) Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning in Machine Learning

Beeson A. (2022) Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning

Charlesworth H. (2021) Solving Challenging Dexterous Manipulation Tasks With Trajectory Optimisation and Reinforcement Learning

Gao M. (2023) Video Object Segmentation using Point-based Memory Network in Pattern Recognition

Hepburn C. (2023) Model-based trajectory stitching for improved behavioural cloning and its applications in Machine Learning

Hepburn C. (2024) State-Constrained Offline Reinforcement Learning

Hepburn C. (2022) Model-based Trajectory Stitching for Improved Offline Reinforcement Learning

Ireland D. (2022) LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation

Ireland D. (2024) REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes

Jin Y. (2024) Decentralised multi-agent reinforcement learning via anticipation sharing

Collaboration

Description	Shuangqing Wei
Organisation	Louisiana State University
Country	United States
Sector	Academic/University
PI Contribution	Yue Jin and I contributed ideas related to a new decentralised multi-agent reinforcement learning algorithm
Collaborator Contribution	Yue Jin and I contributed ideas related to a new decentralised multi-agent reinforcement learning algorithm
Impact	An article titled "Learning to Cooperate under Private Rewards" has been prepared and submitted to the ICML 2024 conference
Start Year	2023

Software and Technical Products

Title	Unity 3D environments
Description	We have developed two 3D environments (logistic wearers and production line) in Unity to support research in reinforcement learning.
Type Of Technology	Software
Year Produced	2024
Open Source License?	Yes
Impact	The software is still in a private GitHub repo and will be released publicly shortly
