Applications of Reinforcement Learning in Long Term Planning for Robotics
Lead Research Organisation:
University of Oxford
Abstract
Research Context
Whilst Reinforcement Learning (RL) has been shown to be effective when training occurs in
environments identical or very similar to those in which agents are evaluated, the transfer of
agent performance to unseen environments tends to be poor. Methods such as Unsupervised
Environment Design (UED) have attempted to improve the generalisation ability of RL agents but
have so far shown only limited success. Similarly, whilst RL has been very successful at
short timescale robot control tasks, longer term control tasks in environments with large state
and action spaces have presented a much greater challenge, especially when trying to apply
model-free approaches. Training agents with model-free policy gradient methods in these
contexts tends to require large amounts of computation and fails to take advantage of the prior
environment knowledge that model-based approaches such as Monte Carlo Tree Search
(MCTS) are able to exploit. However, online planning methods such as MCTS tend to require far
greater computation at runtime than pre-trained policies.
Aims and Objectives
The aim of this research is to examine whether RL methods can be used to train policies that
generalise to a wider range of robotics environments and problem-solving tasks, whilst also
using long term planning methods to improve decision-making over longer horizons by selecting
the most suitable pre-trained policy for a given environment state.
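Purely as an illustration (the function and type names below are assumptions, not the project's actual interfaces), the intended decision loop could be sketched as follows: a library of pre-trained policies provides low-level control, and a long term planner scores the candidate policies for the current state and delegates control to the best one.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: a policy maps a state observation to a low-level action,
# and score_policy stands in for any long-horizon planner that rates a policy
# from the current state (e.g. a search over sequences of policies).
State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]


def plan_and_act(state: State,
                 policy_library: Dict[str, Policy],
                 score_policy: Callable[[str, State], float]) -> Action:
    """Select the highest-scoring pre-trained policy for this state, then
    delegate low-level control to it."""
    best_name = max(policy_library, key=lambda name: score_policy(name, state))
    return policy_library[best_name](state)
```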
Novelty of the research methodology
Prior work has explored policy selection rather than action selection in long term planning, for
example through the use of 'options' in MCTS. However, these options tend to be task-specific
and sometimes hand-designed, and so fail to adapt to more diverse sets of environments. This
research instead aims to train more general policies that are more robust to different
environments.
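For concreteness, the following is a minimal sketch of UCT-style MCTS in which each tree edge corresponds to executing an option, i.e. a pre-trained policy rolled out for a fixed number of primitive steps in a simulator, rather than a single primitive action. The simulator interface, names, and hyperparameters are assumptions chosen for illustration, not the implementation this project will use.

```python
import math
from typing import Callable, Dict, List, Tuple

# Assumed simulator interface (deterministic for simplicity):
#   sim_step(state, action) -> (next_state, reward, done)
# An "option" is a pre-trained policy (state -> action) executed for a fixed
# number of primitive steps, so the search branches over policies rather than
# primitive actions.

GAMMA = 0.99        # discount factor (assumed)
OPTION_LEN = 10     # primitive steps executed per option (assumed)


def run_option(sim_step, policy, state):
    """Roll one option forward in the simulator; return (next_state, discounted return, done)."""
    ret, disc, done = 0.0, 1.0, False
    for _ in range(OPTION_LEN):
        state, reward, done = sim_step(state, policy(state))
        ret += disc * reward
        disc *= GAMMA
        if done:
            break
    return state, ret, done


class Node:
    def __init__(self, state, done=False):
        self.state, self.done = state, done
        self.children: Dict[int, Tuple["Node", float]] = {}  # option -> (child, option return)
        self.visits: Dict[int, int] = {}
        self.value: Dict[int, float] = {}


def uct_option(node: Node, n_options: int, c: float = 1.4) -> int:
    """UCT rule over options; untried options score infinity so they are expanded first."""
    total = sum(node.visits.values()) + 1

    def score(o: int) -> float:
        n = node.visits.get(o, 0)
        if n == 0:
            return float("inf")
        return node.value[o] / n + c * math.sqrt(math.log(total) / n)

    return max(range(n_options), key=score)


def mcts_over_options(root_state, sim_step, options: List[Callable],
                      n_sims: int = 200, max_depth: int = 5) -> int:
    """Return the index of the best option at the root after n_sims simulations."""
    root = Node(root_state)
    for _ in range(n_sims):
        node, path = root, []
        for _ in range(max_depth):
            if node.done:
                break
            o = uct_option(node, len(options))
            is_new = o not in node.children
            if is_new:
                nxt, ret, done = run_option(sim_step, options[o], node.state)
                node.children[o] = (Node(nxt, done), ret)
            child, ret = node.children[o]
            path.append((node, o, ret))
            node = child
            if is_new:
                break  # expand one new node per simulation, as in classic MCTS
        # Backpropagation: discounted return-to-go along the visited path.
        g = 0.0
        for parent, o, ret in reversed(path):
            g = ret + (GAMMA ** OPTION_LEN) * g
            parent.visits[o] = parent.visits.get(o, 0) + 1
            parent.value[o] = parent.value.get(o, 0.0) + g
    return max(root.visits, key=lambda o: root.value[o] / root.visits[o])
```

At runtime the agent would call a planner of this kind at each decision point and then execute the chosen pre-trained policy, trading additional planning computation for better long term behaviour than either pure model-free control or primitive-action search alone.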
Alignment to EPSRC's strategies and research areas
This research aligns with EPSRC's Artificial Intelligence and Robotics research theme: it aims to
use machine learning techniques to improve task planning in robotics environments.
Organisations
University of Oxford (Lead Research Organisation)

People
| Name | Role | ORCID iD |
|---|---|---|
| Harry Mead | Student | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S024050/1 | | | 30/09/2019 | 30/03/2028 | |
| 2868363 | Studentship | EP/S024050/1 | 30/09/2023 | 29/09/2027 | Harry Mead |