Multi-Agent Reinforcement Learning for Assistive Robots

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

This project applies reinforcement learning to robotic control in order to assist people with disabilities in everyday tasks such as washing, dressing, eating and drinking. It uses the Assistive Gym environment, which simulates these tasks in settings designed to be as physically realistic as possible (Erickson et al., 2019). The problem is challenging because the action space is large and each task requires long sequences of actions. The project also emphasises human-robot interaction: the robot must learn policies that satisfy human preferences and anticipate the cooperative behaviour of the person it assists.



Currently, the project is replicating and extending baseline algorithms from existing research. In particular, it will explore applying decision transformers to this environment, an architecture that has not yet been implemented for it. This direction is motivated by the success of long short-term memory (LSTM) architectures, which are better at learning sequential dependencies in the environment (Glaese et al., 2022). Transformers are a promising alternative to LSTMs because they can learn long-term dependencies more efficiently, which should allow the robot to choose better actions early in a sequence and so improve the entire trajectory.
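
As a rough illustration of the input a decision transformer conditions on, the sketch below computes returns-to-go for one trajectory and interleaves them with states and actions. The function names, the token tuples and the toy values are hypothetical and are not taken from the project's code; it is a minimal sketch assuming per-step rewards have already been logged from the simulator.

```python
def returns_to_go(rewards, gamma=1.0):
    """Sum of future rewards from each timestep onward (gamma=1.0 means undiscounted)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return already computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def interleave_tokens(rtg, states, actions):
    """Build a (return-to-go, state, action) sequence, one triple per timestep."""
    sequence = []
    for r, s, a in zip(rtg, states, actions):
        sequence.extend([("rtg", r), ("state", s), ("action", a)])
    return sequence


# Hypothetical three-step trajectory; states and actions are placeholders.
rewards = [1.0, 0.0, 2.0]
rtg = returns_to_go(rewards)
tokens = interleave_tokens(rtg, ["s0", "s1", "s2"], ["a0", "a1", "a2"])
```

Conditioning on a high return-to-go at the first timestep is what lets such a model be prompted for actions that are good for the whole trajectory rather than only for the next step.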



Future work will challenge restrictive assumptions prevalent in past research. In particular, the project will address two of them: that humans cooperate optimally with the robot, and that human preferences are given ex ante and do not change over time. To remove these assumptions, the project will implement reinforcement learning from human feedback, which accommodates a more complex and diverse set of preferences that need not be predefined, while techniques such as inverse reinforcement learning from human data can be used to simulate sub-optimal human cooperation. These enhancements aim to produce algorithms that learn policies better suited to real-world applications.
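
One common way to learn from human feedback without predefined preferences is to fit a reward model to pairwise comparisons of trajectories. The sketch below shows a Bradley-Terry style preference loss; this formulation is an assumption made for illustration, not a method confirmed by the project, and the function name and arguments are hypothetical.

```python
import math


def preference_loss(return_a, return_b, human_prefers_a):
    """Negative log-likelihood of a human's choice between two trajectories.

    return_a and return_b are the summed predicted rewards of each trajectory
    under the current reward model. The probability that A is preferred is
    modelled as sigmoid(return_a - return_b).
    """
    margin = return_a - return_b if human_prefers_a else return_b - return_a
    # -log(sigmoid(margin)) written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The loss is small when the reward model agrees with the human label
# and large when it disagrees.
agree = preference_loss(5.0, 0.0, human_prefers_a=True)
disagree = preference_loss(0.0, 5.0, human_prefers_a=True)
```

Because the loss depends only on which trajectory the person chose, new comparisons can be collected over time, which is one way changing preferences could be accommodated.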

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/Y528705/1                                   30/09/2023  29/09/2028
2901369            Studentship   EP/Y528705/1  01/11/2023  29/06/2027  Leonard Hinckeldey