Efficient state representation via task transition Eigenstructure

Lead Research Organisation: University College London
Department Name: Computer Science

Abstract

Aim: How artificial agents represent their state space affects their ability to learn and plan optimal behaviour. Stachenfeld et al. (2017; see also Baram et al., 2018) note that using the eigenvectors of the state transition matrix as a basis set for state estimation allows state occupancies to be predicted into the future via a simple linear weighted sum. That is, a single-layer network whose nodes have activity profiles across states given by the eigenvectors can represent current state occupancy via weighted connections, and these weights need only be multiplied by constants to represent state occupancy N steps into the future (the constants being the corresponding eigenvalues raised to the power N). Using these representations thus promises fast learning of predictive behaviour and optimal planning, and has interesting parallels with the firing of grid cells in entorhinal cortex. However, all existing results are for symmetric transition matrices corresponding to a simple diffusive policy. We will investigate learning and planning with directed transition matrices and the resulting complex eigenstructures, including the many questions that arise: how these can be used in practice for policy estimation when planning to reach a goal state; how they can be learned in an unsupervised fashion from experience of state transitions; how effective they are as a pre-processing stage for deep RL networks; and their correspondence to grid cell firing.
The aim of this project is to develop methods for identifying efficient state-space representations that make planning and prediction easy.
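The N-step prediction described above can be illustrated with a minimal sketch. It assumes a toy task that is not part of the project description: an unbiased random walk on a ring of six states, which gives a symmetric transition matrix. The function names (ring_random_walk, predict_via_eigenbasis) are hypothetical, chosen only for this illustration.

```python
import numpy as np

def ring_random_walk(n_states):
    """Symmetric transition matrix: step left or right with probability 0.5.
    T[s, s2] is the probability of moving from state s to state s2."""
    T = np.zeros((n_states, n_states))
    for s in range(n_states):
        T[s, (s - 1) % n_states] = 0.5
        T[s, (s + 1) % n_states] = 0.5
    return T

def predict_via_eigenbasis(T, p0, n_steps):
    """Predict state occupancy n_steps ahead by re-weighting the eigenbasis.
    Assumes T is symmetric, so eigenvalues are real and eigenvectors orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(T)
    weights = eigvecs.T @ p0                         # current occupancy in the eigenbasis
    future_weights = (eigvals ** n_steps) * weights  # scale each weight by eigenvalue^N
    return eigvecs @ future_weights                  # weighted sum of basis vectors

T = ring_random_walk(6)
p0 = np.zeros(6)
p0[0] = 1.0                                          # agent starts in state 0

p_eig = predict_via_eigenbasis(T, p0, n_steps=4)
p_direct = p0 @ np.linalg.matrix_power(T, 4)         # reference: explicit matrix power
print(np.allclose(p_eig, p_direct))
```

The only operation that depends on N is the element-wise scaling of the weights, which is what makes a single-layer linear readout sufficient in the symmetric case.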
The project (what the student will do): One promising method is to represent the probability distribution over possible states as a weighted sum of basis vectors, where the basis vectors are formed from the eigenvectors of the task transition matrix (which describes how one state leads to another within the task to be solved). Using this basis set allows state occupancies to be predicted into the future via the linear operation of re-weighting the sum of basis vectors, which could be achieved by a single-layer neural network. However, while this is straightforward for tasks with symmetric transition matrices (i.e. the probability that state A leads to state B is the same as the probability that B leads to A), the situation becomes more complicated for tasks with asymmetric transition matrices, which in general have complex-valued eigenvalues and eigenvectors.
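As a hedged illustration of the asymmetric case, the sketch below assumes a biased walk on a ring of six states (clockwise with probability 0.9, an arbitrary value chosen for this example). The transition matrix is then no longer symmetric and its eigenvalues are complex, yet the same re-weighting still predicts future occupancy if complex arithmetic is used. The name biased_ring_walk is hypothetical, and this is not presented as the method the project will adopt.

```python
import numpy as np

def biased_ring_walk(n_states, p_clockwise=0.9):
    """Asymmetric (directed) transition matrix on a ring.
    T[s, s2] is the probability of moving from state s to state s2."""
    T = np.zeros((n_states, n_states))
    for s in range(n_states):
        T[s, (s + 1) % n_states] = p_clockwise
        T[s, (s - 1) % n_states] = 1.0 - p_clockwise
    return T

T = biased_ring_walk(6)

# A general (non-symmetric) eigendecomposition: T = R diag(eigvals) R^{-1}.
eigvals, right_vecs = np.linalg.eig(T)
left_vecs = np.linalg.inv(right_vecs)   # rows are left eigenvectors: the basis vectors

has_complex = bool(np.any(np.abs(eigvals.imag) > 1e-9))
print("complex eigenvalues present:", has_complex)

p0 = np.zeros(6)
p0[0] = 1.0                             # agent starts in state 0
n_steps = 4

weights = p0 @ right_vecs               # complex weights on the basis vectors
p_future = (weights * eigvals ** n_steps) @ left_vecs
p_direct = p0 @ np.linalg.matrix_power(T, n_steps)

# The imaginary parts cancel, leaving a real probability distribution.
print(np.allclose(p_future.imag, 0.0, atol=1e-9))
print(np.allclose(p_future.real, p_direct))
```

Here each weight is multiplied by a complex constant, so a prediction step both scales and rotates the weight in the complex plane, rather than only scaling it as in the symmetric case.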
The student will investigate learning and planning in tasks with asymmetric transition matrices and complex eigenstructures, aiming to find methods for choosing efficient state-space representations for planning. Investigations will involve mathematical analysis (including how to form basis sets for tasks containing closed groups of transitions), numerical simulations of unsupervised learning methods, and studies of how pre-processing the input representation affects the ability of neural networks to learn task solutions via reinforcement learning. The correspondence between the artificial neural networks and real neural firing in the brain will also be considered, for insights into the function of these neurons in guiding behaviour.
Outcomes: This research will be of use in two ways: 1) in machine learning, where finding efficient representations facilitates learning to predict how a given situation (i.e. the probability distribution over possible states) will evolve over time (i.e. the future probability distribution over possible states), and thus facilitates planning and decision making; 2) in neuroscience, where we know how some state spaces (e.g. spatial location in a box) are represented by neurons in the brain, and where current theories predict that these representations are used for planning, but formal models of how this would work are limited to simple cases.
... in which firing rate and firing phase have been shown to encode independent variables.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S021566/1      |              |              | 01/04/2019 | 30/09/2027 |
2251578           | Studentship  | EP/S021566/1 | 23/09/2019 | 22/09/2023 | Changmin Yu