Multi-Agent Reinforcement Learning for Assistive Robots
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Informatics
Abstract
This project uses reinforcement learning for robotic control to aid people with disabilities in everyday assistive tasks, including washing, dressing, eating and drinking. The project uses the Assistive Gym environment, which simulates these tasks under physically realistic conditions (Erickson et al., 2019). The setting is particularly challenging because it involves a large action space and long sequences of actions. Additionally, the project emphasises human-robot interaction: the robot needs to learn policies that satisfy the preferences of humans and anticipate their cooperative behaviour.
Currently, the project is replicating and expanding on baseline algorithms used in existing research. In particular, the project will explore the application of decision transformers to this environment, an architecture that has not yet been applied in this setting. This direction is motivated by the success of recurrent memory architectures such as LSTMs, which are better at learning sequential dependencies in the environment (Glaese et al., 2022). Transformers are a promising alternative to LSTMs, being more efficient at learning long-term dependencies, which allows the robot to choose better actions early in the sequence and thereby improve the entire trajectory.
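As a rough illustration of the idea behind decision transformers, the sketch below shows the standard data preparation step they rely on: computing returns-to-go from a reward sequence and interleaving (return-to-go, state, action) tokens, which the transformer then consumes autoregressively. This is a minimal sketch of the general technique, not the project's implementation; the function names are illustrative.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Suffix sums of rewards: R_t = sum over t' >= t of gamma^(t'-t) * r_t'.
    Decision transformers condition on the *remaining* return at each step,
    rather than on past returns."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def interleave_tokens(rtg, states, actions):
    """Build the (R_t, s_t, a_t) token sequence a decision transformer consumes.
    Tags mark the token type; a real model would embed each modality separately."""
    seq = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", r), ("state", s), ("action", a)])
    return seq
```

At inference time, conditioning on a high initial return-to-go steers the model toward trajectories that achieve that return, which is how the "choose better actions early" behaviour arises.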
Future work will focus on challenging restrictive assumptions prevalent in past research. In particular, the project will address two assumptions: that humans will optimally cooperate with the robot and that human preferences are given ex-ante and do not dynamically change. To remove these assumptions, the project will implement reinforcement learning from human feedback, accommodating a more complex and diverse set of preferences that need not be predefined. Techniques like inverse reinforcement learning from human data can simulate sub-optimal human cooperation. These enhancements aim to develop algorithms that learn more adept policies for real-world applications.
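The learning-from-human-feedback step described above typically rests on fitting a reward model to pairwise preference comparisons. The sketch below shows one standard formulation, a Bradley-Terry model with a linear reward fitted by gradient descent; it is an assumed, minimal example, not the project's method, and the feature representation is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_preference_reward(feats_a, feats_b, prefs, lr=0.1, steps=500):
    """Fit a linear reward r(x) = w . x from pairwise preferences using the
    Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).
    feats_a, feats_b: (n, d) trajectory feature arrays for each pair.
    prefs: (n,) array of 1.0 if a was preferred, 0.0 if b was."""
    w = np.zeros(feats_a.shape[1])
    diff = feats_a - feats_b                      # (n, d) feature differences
    for _ in range(steps):
        p = sigmoid(diff @ w)                     # predicted preference probs
        grad = diff.T @ (p - prefs) / len(prefs)  # logistic-loss gradient
        w -= lr * grad
    return w
```

Because the reward is learned from comparisons rather than specified ex-ante, it can be refit as new feedback arrives, which is what allows preferences to change dynamically.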
Organisations
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
| --- | --- | --- | --- | --- | --- |
| EP/W524384/1 | | | 30/09/2022 | 29/09/2028 | |
| 2901369 | Studentship | EP/W524384/1 | 01/11/2023 | 29/06/2027 | Leonard Hinckeldey |