Scalable Multi-Agent Reinforcement Learning for Motion Control

Lead Research Organisation: University of Oxford

Abstract

Motion control modelling is essential for developing predictive simulation models in healthcare applications, to improve patient outcomes in areas such as exoskeleton design, surgery recovery, and functional electrical stimulation. Reinforcement Learning (RL), a sub-field of machine learning, has shown potential in addressing the challenges of developing controllers that handle both basic movements like walking and higher-level controls such as motor skill coordination in complex and dynamic environments. However, single-agent RL controllers have limitations, including oversimplifying interactions and inadequately capturing the complexity of real-world environments. This has led to a growing interest in Multi-Agent Reinforcement Learning (MARL).

Recent studies on MARL have demonstrated its effectiveness in high-dimensional state-action domains such as gaming, autonomous driving, and robotics. In MARL, multiple agents collaborate or compete on a shared task while each learns its own policy. This approach is especially suited to extended sequential decision-making processes with delayed feedback, which are a critical feature of real-world scenarios. Moreover, MARL enables agents to share knowledge and to imitate or learn directly from one another, potentially accelerating the learning process and uncovering more efficient methods for achieving goals.

Despite MARL's advantages, scalability remains a significant challenge. As the number of agents increases, the size of the joint action space grows exponentially, leading to a combinatorial explosion. This rapid growth in complexity makes scaling MARL algorithms difficult, as they require more computational resources and longer training times. Additionally, multiple agents learning simultaneously introduce non-stationarity into the learning environment, complicating convergence to a stable solution. These challenges are exacerbated by credit assignment (determining each agent's contribution to overall performance) and by partial observability of the environment, both of which hinder effective learning as the number of agents increases.
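
The exponential growth described above can be illustrated with a minimal sketch. It assumes every agent chooses from the same number of discrete actions, which is a simplification for illustration and not a description of the project's environments. The function names are hypothetical.

```python
from itertools import product


def joint_action_count(n_agents: int, actions_per_agent: int) -> int:
    """Size of the joint action space when each of `n_agents` agents
    independently picks one of `actions_per_agent` discrete actions."""
    return actions_per_agent ** n_agents


def enumerate_joint_actions(n_agents: int, actions_per_agent: int):
    """Explicitly list every joint action as a tuple of per-agent action indices.
    Only feasible for small inputs, which is exactly the scalability problem."""
    return list(product(range(actions_per_agent), repeat=n_agents))


if __name__ == "__main__":
    # Assumed example: 5 discrete actions per agent.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} agents -> {joint_action_count(n, 5)} joint actions")
```

Each additional agent multiplies the number of joint actions by the per-agent action count, so any method that enumerates or evaluates joint actions directly becomes impractical after a handful of agents.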

Our research aims to address two questions. How can MARL algorithms be adapted to manage the increasing complexity of joint action spaces as the number of agents grows? What strategies can be used to tackle the issues of credit assignment and partial observability in multi-agent environments? To answer these questions, we aim to develop a scalable MARL framework that handles high-dimensional state-action spaces in complex environments. The project will explore novel MARL algorithms or modifications to existing ones, and will employ simulation environments and software tools for modelling and analysing multi-agent control systems.

Our research could contribute to a broader understanding of human motion control and its applications in healthcare. This project aligns with the EPSRC Artificial Intelligence and Robotics, Healthcare Technologies, and Information and Communication Technologies research areas.

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly-qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.

Student:

Nasma Dasser

Period of Study:

Oct 2021 - Dec 2025

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2593947

Research Topic:

Unclassified

Organisations

University of Oxford (Lead Research Organisation)

People	ORCID iD
Shimon Whiteson (Primary Supervisor)
Nasma Dasser (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S02428X/1			01/04/2019	30/09/2027
2593947	Studentship	EP/S02428X/1	01/10/2021	31/12/2025	Nasma Dasser