Multi-Agent Reinforcement Learning for Assistive Robots
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Informatics
Abstract
This project uses reinforcement learning for robotic control to aid people with disabilities in everyday assistive tasks, including washing, dressing, eating and drinking. The project uses the Assistive Gym environment, which simulates these tasks under physically realistic conditions (Erickson et al., 2019). The setting is particularly challenging because it involves a large action space and long sequences of actions. Additionally, the project emphasises human-robot interaction: the robot needs to learn policies that satisfy the preferences of humans and anticipate their cooperative behaviour.
Currently, the project is replicating and expanding on baseline algorithms used in existing research. In particular, the project will explore the application of decision transformers to this environment, an architecture that has not yet been applied in this setting. This direction is motivated by the success of recurrent memory architectures such as LSTMs, which are better at learning sequential dependencies in the environment (Glaese et al., 2022). Transformers are a promising alternative to LSTMs, being more efficient at learning long-term dependencies, which allows the robot to choose better actions early in the sequence and thereby improve the entire trajectory.
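As a rough illustration of the idea behind decision transformers, the sketch below shows the standard data preparation step they rely on: computing returns-to-go from a reward sequence and interleaving (return-to-go, state, action) tokens, which the transformer then consumes autoregressively. This is a minimal sketch of the general technique, not the project's implementation; the function names are illustrative.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Suffix sums of rewards: R_t = sum over t' >= t of gamma^(t'-t) * r_t'.
    Decision transformers condition on the *remaining* return at each step,
    rather than on past returns."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def interleave_tokens(rtg, states, actions):
    """Build the (R_t, s_t, a_t) token sequence a decision transformer consumes.
    Tags mark the token type; a real model would embed each modality separately."""
    seq = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", r), ("state", s), ("action", a)])
    return seq
```

At inference time, conditioning on a high initial return-to-go steers the model toward trajectories that achieve that return, which is how the "choose better actions early" behaviour arises.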
Future work will focus on challenging restrictive assumptions prevalent in past research. In particular, the project will address two assumptions: that humans will optimally cooperate with the robot and that human preferences are given ex-ante and do not dynamically change. To remove these assumptions, the project will implement reinforcement learning from human feedback, accommodating a more complex and diverse set of preferences that need not be predefined. Techniques like inverse reinforcement learning from human data can simulate sub-optimal human cooperation. These enhancements aim to develop algorithms that learn more adept policies for real-world applications.
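The learning-from-human-feedback step described above typically rests on fitting a reward model to pairwise preference comparisons. The sketch below shows one standard formulation, a Bradley-Terry model with a linear reward fitted by gradient descent; it is an assumed, minimal example, not the project's method, and the feature representation is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_preference_reward(feats_a, feats_b, prefs, lr=0.1, steps=500):
    """Fit a linear reward r(x) = w . x from pairwise preferences using the
    Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).
    feats_a, feats_b: (n, d) trajectory feature arrays for each pair.
    prefs: (n,) array of 1.0 if a was preferred, 0.0 if b was."""
    w = np.zeros(feats_a.shape[1])
    diff = feats_a - feats_b                      # (n, d) feature differences
    for _ in range(steps):
        p = sigmoid(diff @ w)                     # predicted preference probs
        grad = diff.T @ (p - prefs) / len(prefs)  # logistic-loss gradient
        w -= lr * grad
    return w
```

Because the reward is learned from comparisons rather than specified ex-ante, it can be refit as new feedback arrives, which is what allows preferences to change dynamically.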
Organisations
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
| --- | --- | --- | --- | --- | --- |
| EP/W524384/1 | | | 30/09/2022 | 29/09/2028 | |
| 2901369 | Studentship | EP/W524384/1 | 01/11/2023 | 29/06/2027 | Leonard Hinckeldey |