Data-efficient Reinforcement Learning

Lead Research Organisation: University of Cambridge
Department Name: Engineering

Abstract

Reinforcement Learning (RL) algorithms are an alternative to traditional model-based control that learn from data the optimal actions to take. Unlike the latter, RL methods do not need a built-in model of the dynamical system they control, enabling them to make decisions successfully when the true model is complicated or not perfectly known at design time. Unfortunately, their application to many settings, such as autonomous robotics and smart buildings, is hampered by their need for large amounts of data. This project focuses on improving the data-efficiency of RL systems, using Bayesian inference and reasoning techniques similar to those used in chess-playing AI. We will study systems that take into account the long-term value of a decision, both in terms of the benefits it achieves and the information it provides for future decisions. Solving these challenges will enable the application of RL in domains such as personalised education, digital health, robotics, and the smart grid.

Publications

Burt D. R. (2020) Understanding Variational Inference in Function-Space in arXiv e-prints

Fortuin V. (2022) Bayesian Neural Network Priors Revisited in ICLR 2022 - 10th International Conference on Learning Representations

Fortuin V. (2021) Bayesian Neural Network Priors Revisited in arXiv e-prints

Garriga-Alonso A. (2021) Exact Langevin Dynamics with Stochastic Gradients in arXiv e-prints

Garriga-Alonso A. (2021) Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks in 37th Conference on Uncertainty in Artificial Intelligence, UAI 2021

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/N509620/1 | | | 30/09/2016 | 29/09/2022 |
1950008 | Studentship | EP/N509620/1 | 30/09/2017 | 29/09/2020 | Adria Garriga Alonso
 
Description INTRODUCTION

Knowing when your model is wrong is very useful in machine learning applications that have immediate consequences for people. Examples abound: detecting tumours in CT scans, controlling a power-plant turbine or a self-driving car, deciding whether to grant a loan application. Typically this is done by estimating the amount of uncertainty in a particular prediction, a practice known as "uncertainty quantification".

RESEARCH QUESTIONS

One promising and popular approach to obtaining correct uncertainty quantification from models that predict well is Bayesian deep learning. It is promising because it starts with a model that performs well (a deep neural network) and then considers many possible settings of its weights, each weighted by how plausible it is (the Bayesian part). For the Bayesian school of statistical thought this is a very satisfying resolution, but a number of open questions remain:
- How do we choose the "prior distribution" for the neural network, that is, the distribution expressing what we believe about the weights before taking the data into account?
- How do we compute the resulting predictions? In theory this is easy, but in practice we must resort to approximating the results, and it is unclear which approximation is best (the sketch just below illustrates the standard Monte Carlo approximation).
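To make the second question concrete, here is a minimal sketch of the usual Monte Carlo approximation to a Bayesian neural network's predictions, written in PyTorch. The tiny two-layer network, its sizes, and the use of prior samples are all illustrative assumptions, not details from the publications above; in practice one wants samples from the posterior over weights, and obtaining those at neural-network scale is exactly what is hard.

```python
import torch

torch.manual_seed(0)

def net(x, w):
    """A tiny two-layer network; w is a dict of weight tensors."""
    h = torch.tanh(x @ w["w1"])
    return h @ w["w2"]

def sample_prior(n_in=2, n_hidden=16, n_out=3):
    """One draw of all the weights from a standard Gaussian prior."""
    return {"w1": torch.randn(n_in, n_hidden),
            "w2": torch.randn(n_hidden, n_out)}

def predictive(x, weight_samples):
    """Monte Carlo estimate of p(y | x): average the class probabilities
    the network assigns under many plausible weight settings."""
    probs = torch.stack([torch.softmax(net(x, w), dim=-1)
                         for w in weight_samples])
    return probs.mean(dim=0)

x = torch.randn(5, 2)                           # five 2-dimensional test inputs
samples = [sample_prior() for _ in range(100)]  # 100 plausible weight settings
print(predictive(x, samples))                   # each row sums to 1: averaged class probabilities
```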

FINDINGS

The findings here provide partial answers to both of these questions in the context of convolutional neural networks (CNNs), which make predictions from images and are one of the most successful kinds of neural network.
- How to choose a prior? We might place a standard Gaussian distribution on each weight. We prove that, if the network is too wide, this leads to its many layers effectively collapsing into a single one. Perhaps we should use another kind of prior: we provide empirical evidence that other simple priors (the Student-t and the correlated Gaussian) work better than the standard Gaussian in practice (see the prior-sampling sketch after this list).
- How to calculate the resulting predictions? We provide a scheme based on simulating a high-dimensional physical system (Langevin dynamics) while processing only small batches of data at a time, which works well in practice (a basic version of the underlying update is sketched further below).
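As a rough illustration of the prior families compared, the sketch below draws one convolutional filter bank from each of the three priors mentioned. The filter shape, the Student-t degrees of freedom, and the 0.5-per-step correlation decay are illustrative assumptions, not the settings evaluated in "Bayesian Neural Network Priors Revisited".

```python
import torch

torch.manual_seed(0)
shape = (16, 3, 3, 3)  # conv filters: (out_channels, in_channels, height, width)

# 1. Standard Gaussian prior: every weight independent, N(0, 1).
w_gauss = torch.randn(shape)

# 2. Student-t prior: like the Gaussian but with heavier tails.
#    df=3 is an illustrative choice, not a value from the paper.
w_student = torch.distributions.StudentT(3.0).sample(shape)

# 3. Correlated Gaussian prior: the 9 taps of each 3x3 spatial filter are
#    drawn jointly, with correlation decaying by an (assumed) factor of
#    0.5 per step of flattened filter position.
k = 3 * 3
cov = torch.tensor([[0.5 ** abs(i - j) for j in range(k)] for i in range(k)])
mvn = torch.distributions.MultivariateNormal(torch.zeros(k), covariance_matrix=cov)
w_corr = mvn.sample(shape[:2]).reshape(shape)

print(w_gauss.std(), w_student.std(), w_corr.std())
```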
Exploitation Route The statistical techniques could be used to learn models that make predictions with a known degree of uncertainty, in medical or industrial settings. The resulting statistical inference techniques based on Langevin dynamics can also be applied to other kinds of models.
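For orientation, here is a minimal sketch of the plain stochastic-gradient Langevin dynamics (SGLD) update that such inference schemes build on. "Exact Langevin Dynamics with Stochastic Gradients" goes beyond this basic update; the function, its arguments, and the toy usage are hypothetical illustrations, not the paper's algorithm.

```python
import torch

def sgld_step(params, grad_log_post, lr=1e-5):
    """One step of plain stochastic-gradient Langevin dynamics (SGLD).

    params:        list of weight tensors (updated in place)
    grad_log_post: minibatch estimates of the gradient of the log posterior
                   with respect to each tensor in `params`
    The injected Gaussian noise has variance 2 * lr, so for small lr the
    iterates approximately sample the posterior instead of settling into a
    single mode, as plain gradient ascent would.
    """
    with torch.no_grad():
        for p, g in zip(params, grad_log_post):
            noise = torch.randn_like(p) * (2 * lr) ** 0.5
            p.add_(lr * g + noise)

# Toy usage: sample from N(0, 1), whose gradient of the log density is -x.
x = [torch.zeros(3)]
for _ in range(1000):
    sgld_step(x, [-x[0]], lr=1e-2)
print(x[0])  # a rough draw from N(0, 1)
```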
Sectors Aerospace, Defence and Marine; Electronics; Energy

URL https://agarri.ga/#publications_selected
 
Description Bayesian neural network priors - Bristol 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Research ideas, writing code, conducting and interpreting experiments, and paper writing.
Collaborator Contribution Dr. Laurence Aitchison contributed research ideas, writing for the paper, and interpretation of experimental results.
Impact The paper "Bayesian Neural Network Priors Revisited"
Start Year 2020
 
Description Bayesian neural network priors - ETHZ 
Organisation ETH Zurich
Department Department of Computer Science
Country Switzerland 
Sector Academic/University 
PI Contribution I contributed research ideas, wrote a good part of the research code, conducted some of the experiments, interpreted results, and wrote part of the final paper.
Collaborator Contribution My collaborators Vincent Fortuin and Gunnar Rätsch did much the same: they contributed research ideas, conducted and interpreted experiments, and wrote part of the paper. They also contributed compute time on a computing cluster.
Impact The papers "Exact Langevin dynamics with stochastic gradients" and "Bayesian Neural Network Priors Revisited"
Start Year 2020
 
Description Bayesian neural network priors - Imperial College 
Organisation Imperial College London
Department Department of Computing
Country United Kingdom 
Sector Academic/University 
PI Contribution I provided research ideas, experimental code and writing. My research group, the Machine Learning Group at Cambridge, provided computing resources.
Collaborator Contribution Dr. Mark van der Wilk contributed research ideas, interpretation of experiments, and paper writing.
Impact The papers "Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks" and "Bayesian Neural Network Priors Revisited"
Start Year 2019