Open-Ended Discovery of Skill Hierarchies in Artificial Intelligence

Lead Research Organisation: University of Bath
Department Name: Computer Science

Abstract

People solve complex tasks every day by decomposing them into smaller sub-tasks. For instance, the task of making a cup of tea can be decomposed into the sub-tasks of boiling the kettle, adding sugar, adding a tea-bag, grasping the cup, and so on. These sub-problems can themselves be decomposed into even smaller sub-problems, all the way down to the individual muscle movements involved - forming a hierarchy of skills useful for solving the problem.
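The tea-making decomposition above can be pictured as a tree of skills, where each node either decomposes into sub-skills or bottoms out at a primitive action. A minimal sketch (the skill names and structure here are illustrative examples, not part of the project):

```python
# Illustrative only: a skill hierarchy represented as a nested mapping from
# each high-level skill to its ordered sub-skills. Names not present in the
# mapping are treated as primitive actions.

def expand(skill, hierarchy):
    """Recursively expand a skill into its sequence of primitive actions."""
    if skill not in hierarchy:  # primitive action: no further decomposition
        return [skill]
    actions = []
    for sub in hierarchy[skill]:
        actions.extend(expand(sub, hierarchy))
    return actions

make_tea = {
    "make_tea": ["boil_kettle", "prepare_cup", "pour_water"],
    "prepare_cup": ["grasp_cup", "add_tea_bag", "add_sugar"],
}

print(expand("make_tea", make_tea))
```

A planner reasoning at the top level deals with only three sub-tasks, while the fully expanded sequence is longer; at the scale of muscle movements, that gap grows enormously.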

Of course, learning how to make a cup of tea at the scale of muscle movements would be an unreasonably large computational undertaking - much of human problem-solving power comes from the ability to discover, and plan using, hierarchically organised higher-level behaviours. Planning and learning a sequence of a few high-level behaviours is clearly less computationally expensive than planning and learning a sequence of perhaps millions of primitive actions.

Two key open research questions are how such useful skills should be characterised, and how an artificially intelligent agent should go about autonomously discovering them. It is these questions that we hope, at least in part, to address during this research project.

We frame this research within the well-developed framework of Reinforcement Learning (RL), which concerns itself broadly with how artificially intelligent agents should learn optimal behavioural policies through interaction with their environments. Many RL methods, even those considered state-of-the-art, operate using primitive actions - they are still making cups of tea at the scale of muscle movements, as it were. The branch of RL which considers higher-level behaviours that extend over varying numbers of timesteps is known as Hierarchical Reinforcement Learning (HRL), in reference to how skills can be organised hierarchically.
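One standard way HRL formalises such temporally extended behaviours is the options framework of Sutton, Precup and Singh, in which a skill comprises an initiation set, an intra-option policy, and a termination condition. The sketch below is purely illustrative (a toy corridor environment and hypothetical names, not the project's own formalism):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """A temporally extended action: where it may start (initiation set),
    how it acts (intra-option policy), and when it ends (termination)."""
    can_start: Callable[[int], bool]     # initiation set I
    policy: Callable[[int], str]         # intra-option policy pi
    should_stop: Callable[[int], bool]   # termination condition beta

def run_option(option, state, step):
    """Execute an option until its termination condition fires.
    `step` is a hypothetical environment transition function."""
    assert option.can_start(state), "option not available in this state"
    while not option.should_stop(state):
        state = step(state, option.policy(state))
    return state

# Toy corridor of states 0..10; the skill "walk right until state 5"
# replaces five separate primitive decisions with one high-level choice.
walk_to_5 = Option(
    can_start=lambda s: s < 5,
    policy=lambda s: "right",
    should_stop=lambda s: s >= 5,
)
step = lambda s, a: s + 1 if a == "right" else s - 1
print(run_option(walk_to_5, 0, step))  # 5
```

An agent planning over options like `walk_to_5` chooses among a handful of skills per decision, rather than one primitive action per timestep.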

Explicitly, the main objective of this research project is to develop an HRL algorithm, or set of HRL algorithms, which endow artificially intelligent agents with the ability to discover a hierarchy of useful high-level behaviours through interaction with their environment.

There are several desirable properties that the algorithm(s) developed over the course of this project should possess. Firstly, the algorithms should be developmental, with higher-level, more complex skills being constructed hierarchically from lower-level ones as time goes on. Secondly, the algorithms should be domain-independent to ensure their applicability to many types of problem. Thirdly, it would be a desirable outcome if the algorithms developed performed well in tasks which are currently considered difficult (e.g. "hard exploration" problems such as the game of Montezuma's Revenge).

These desirable properties stem partly from various shortcomings in current HRL methods. For instance, many existing HRL methods are applicable only to discrete domains or those with small state-spaces, and do not scale well to larger domains or those with continuous state-spaces. This limits their applicability to many of the interesting, complex problems that are ultimately of interest to RL.

The benefits of developing such algorithms would be wide-ranging - allowing reinforcement learning to be applied to larger, more complex problems to which current methods simply do not scale well.


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/R513155/1 | | | 01/10/2018 | 30/09/2023 |
2278914 | Studentship | EP/R513155/1 | 01/10/2019 | 31/03/2023 | Joshua EVANS