Abstraction and Generalisation in Reinforcement Learning
Lead Research Organisation:
University College London
Department Name: Computer Science
Abstract
I am studying how tasks can be represented using abstract state spaces, and how these state spaces can be inferred from experience. If you were to describe to someone how to order a coffee at a cafe, you might say something like "walk through the door, go to the counter, look at the menu and decide what you want, tell the barista your order and then wait for your coffee to be served". In a reinforcement learning setting, that set of instructions defines a set of transitions between states - that is, from the state of being outside the cafe, to the state of being inside the cafe, to the state of being at the counter, and so on. This state space, however, is not at the same granularity as the actions you take in the world - when you walk through the door, you are not considering every single action you take at the level of muscle twitches, or even at the level of swinging your legs. Reinforcement learning agents are constrained to the most granular actions defined in their environment, which by analogy would be muscle twitches in this instance. State abstraction is the process by which an agent would discover the high-level states described above, by composing its granular actions hierarchically into higher-order skills.
I am currently using reinforcement learning methods centred around the successor representation - a way of representing values as the conjunction of state occupancies and reward. This allows us to probe the structure of the state space without the confound of reward that we would have by looking at the value structure. I am currently building abstract state spaces using symmetric compression and through temporal abstraction.
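As a minimal sketch of this idea, the successor representation for a fixed policy can be computed in closed form on a small toy Markov chain; the chain, transition matrix, and reward vector below are illustrative assumptions, not part of the project itself.

```python
import numpy as np

# Toy 4-state chain MDP under a fixed policy: P[s, s'] is the
# probability of moving from state s to state s' in one step.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],  # absorbing goal state
])
gamma = 0.9  # discount factor

# Successor representation: M = (I - gamma * P)^-1, where M[s, s'] is
# the expected discounted number of visits to s' when starting from s.
M = np.linalg.inv(np.eye(4) - gamma * P)

# Value factors into occupancies (M) and reward (r): V = M @ r.
# The same M can therefore be probed or reused under any reward vector,
# separating the structure of the state space from the reward.
r = np.array([0.0, 0.0, 0.0, 1.0])
V = M @ r
```

Because `M` depends only on the transition structure, swapping in a different `r` re-evaluates the policy without recomputing occupancies, which is what makes the successor representation useful for probing state-space structure independently of reward.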
Organisations
People | ORCID iD
---|---
Peter Bentley (Primary Supervisor) |
Matthew Sargent (Student) |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name
---|---|---|---|---|---
EP/R513143/1 | | | 01/10/2018 | 30/09/2023 |
2281998 | Studentship | EP/R513143/1 | 01/10/2019 | 10/07/2024 | Matthew Sargent