Data-efficient Reinforcement Learning

Lead Research Organisation: University of Cambridge

Department Name: Engineering

Abstract

Reinforcement Learning (RL) algorithms are an alternative to traditional model-based control: they learn from data which actions are optimal. Unlike model-based controllers, RL methods do not need a built-in model of the dynamical system, which lets them make good decisions even when the true model is complicated or not perfectly known at design time. Unfortunately, their application to many settings, such as autonomous robotics and smart buildings, is hampered by their need for large amounts of data. This project focuses on improving the data efficiency of RL systems, using Bayesian inference and reasoning techniques similar to those used in chess-playing AI. We will study systems that take into account the long-term value of a given decision, both in terms of the benefits it achieves and the information it provides for future decisions. Solving these challenges will enable the application of RL in domains such as personalised education, digital health, robotics, and the smart grid.
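As a rough illustration of a decision-maker that values both the reward of an action and the information it yields, the following sketch implements Thompson sampling for a Bernoulli multi-armed bandit, a standard Bayesian technique for balancing exploitation and exploration. It is not the project's method. The bandit setting, the arm probabilities and the function name `thompson_sampling` are hypothetical choices for illustration, assuming NumPy.

```python
import numpy as np

def thompson_sampling(true_probs, n_steps, seed=0):
    """Hypothetical example: Bayesian bandit with a Beta posterior per arm."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior, i.e. uniform over each arm's success probability
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    total_reward = 0
    for _ in range(n_steps):
        # Draw one plausible success probability per arm from its posterior;
        # arms that are still uncertain sometimes win and get explored
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        # Simulated environment: reward 1 with the arm's true probability
        reward = int(rng.random() < true_probs[arm])
        # Conjugate Bayesian update of the chosen arm's posterior
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward

pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_steps=2000)
print(pulls, reward)
```

Because each decision is made on a sample from the posterior, arms whose value is still uncertain are occasionally tried, which is one simple way of accounting for the information a decision provides for later ones.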

Student:

Adria Garriga Alonso

Period of Study:

Oct 2017 - Sep 2020

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1950008

Research Topic:

Unclassified

Organisations

People
Carl Rasmussen (Primary Supervisor)
Adria Garriga Alonso (Student)

Publications

Garriga-Alonso A (2019) Deep convolutional networks as shallow Gaussian processes

Burt DR (2020) Understanding Variational Inference in Function-Space in arXiv e-prints

Fortuin V (2021) BNNpriors: A library for Bayesian neural network inference with different prior distributions in Software Impacts

Fortuin V (2021) Bayesian Neural Network Priors Revisited in arXiv e-prints

Garriga-Alonso A (2021) Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks in arXiv e-prints

Garriga-Alonso A (2021) Exact Langevin Dynamics with Stochastic Gradients in arXiv e-prints

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509620/1			01/10/2016	30/09/2022
1950008	Studentship	EP/N509620/1	01/10/2017	30/09/2020	Adria Garriga Alonso

Key Findings
Collaboration


Description	INTRODUCTION
Knowing when your model is wrong is very useful in machine learning applications that have immediate consequences for people. Examples abound: detecting tumours in CT scans, controlling a power plant turbine or a self-driving car, and deciding whether to approve a loan application. Typically, this is done by estimating the amount of uncertainty in each prediction, which is known as "uncertainty quantification".

RESEARCH QUESTIONS
One promising and popular way to obtain correct uncertainty estimates from models that perform well is Bayesian deep learning. It is promising because it starts with a model that performs well (a deep neural network) and then, instead of committing to a single setting of its weights, considers many possible settings, accounting for the possibility that any of them may be mistaken (the Bayesian part). For the Bayesian school of statistical thought this is a very satisfying resolution, but a number of open questions remain:
- How do we choose the "prior distribution" for the neural network, that is, what we know before taking the data into account?
- How do we calculate the resulting predictions? In theory this is easy, but in practice we must resort to approximating the results, and it is unclear which approximation is best.

FINDINGS
The findings here provide partial answers to both questions in the context of convolutional neural networks (CNNs), which make predictions from input images and are one of the most successful kinds of neural network.
- How to choose a prior? We might put a standard Gaussian distribution on each weight. We prove that, in the limit of infinite width, this leads to the network's many layers effectively collapsing into a single one, which suggests that another kind of prior may be preferable. We provide empirical evidence that other simple priors (the Student-t and the correlated Gaussian) work better than the standard Gaussian in practice.
- How to calculate the resulting predictions? We provide a scheme based on simulating a high-dimensional physical system (Langevin dynamics) while processing only small batches of data at a time, which works well in practice.
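The three kinds of weight prior discussed above can be sketched as samplers for the weights of one convolutional layer with shape (out_channels, in_channels, k, k). This is a minimal sketch assuming NumPy; the Student-t degrees of freedom, the squared-exponential correlation between filter positions and its lengthscale are hypothetical choices, not the settings used in the publications.

```python
import numpy as np

def gaussian_prior(rng, shape, scale=1.0):
    # Independent N(0, scale^2) on every weight
    return scale * rng.standard_normal(shape)

def student_t_prior(rng, shape, df=3.0, scale=1.0):
    # Independent heavy-tailed Student-t weights (df=3 is a hypothetical choice)
    return scale * rng.standard_t(df, size=shape)

def correlated_gaussian_prior(rng, out_ch, in_ch, k, lengthscale=1.0, scale=1.0):
    # Gaussian weights that are correlated across the k x k spatial positions of
    # each filter, using an assumed squared-exponential covariance in pixel distance
    ys, xs = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1)              # (k*k, 2)
    sq_dist = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)  # (k*k, k*k)
    cov = scale**2 * np.exp(-0.5 * sq_dist / lengthscale**2)
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(k * k))         # small jitter for stability
    z = rng.standard_normal((out_ch, in_ch, k * k))
    w = z @ chol.T                                                # each filter ~ N(0, cov)
    return w.reshape(out_ch, in_ch, k, k)

rng = np.random.default_rng(0)
w_gauss = gaussian_prior(rng, (64, 3, 3, 3))
w_t = student_t_prior(rng, (64, 3, 3, 3))
w_corr = correlated_gaussian_prior(rng, 64, 3, 3)
```

Heavier tails (Student-t) and spatial correlation within each filter (correlated Gaussian) are the two properties that distinguish these priors from the independent standard Gaussian. With `df=3` the Student-t weights have variance 3, so the scales are not matched across priors in this sketch.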
Exploitation Route	The statistical techniques could be used to learn models that make predictions with a known degree of uncertainty in medical or industrial settings. The statistical inference techniques based on Langevin dynamics can also be applied to other kinds of models.
Sectors	Aerospace, Defence and Marine, Electronics, Energy
URL	https://agarri.ga/#publications_selected
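The Langevin-dynamics scheme mentioned in the findings belongs to the family of stochastic-gradient MCMC methods. The sketch below is plain stochastic gradient Langevin dynamics (SGLD, in the style of Welling and Teh, 2011) on a toy problem: inferring the mean `mu` of Gaussian data under a Gaussian prior, using minibatches. It is not the exact sampler developed in the project; the model, step size, batch size and the function name `sgld_gaussian_mean` are hypothetical choices for illustration, assuming NumPy.

```python
import numpy as np

def sgld_gaussian_mean(data, n_iters=5000, batch_size=32, step=1e-4,
                       prior_var=10.0, noise_var=1.0, seed=0):
    """Hypothetical example: SGLD for mu ~ N(0, prior_var), x_i ~ N(mu, noise_var)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    mu = 0.0
    samples = []
    for _ in range(n_iters):
        # Only a small batch of data is touched at each step
        batch = rng.choice(data, size=batch_size, replace=False)
        # Unbiased minibatch estimate of the gradient of the log-posterior:
        # prior term + (n / batch_size) * sum of per-example likelihood gradients
        grad_log_prior = -mu / prior_var
        grad_log_lik = (n / batch_size) * np.sum(batch - mu) / noise_var
        grad = grad_log_prior + grad_log_lik
        # Langevin step: half a step along the gradient plus Gaussian noise of variance `step`
        mu = mu + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
        samples.append(mu)
    return np.array(samples)

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=1000)
samples = sgld_gaussian_mean(data)
posterior = samples[1000:]  # discard burn-in
print(posterior.mean(), posterior.std())
```

After burn-in, the values of `mu` approximate draws from the posterior over the mean. With a finite step size and noisy minibatch gradients, plain SGLD like this is only approximately correct, which is the kind of bias an exact sampler aims to remove.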


Description	Bayesian neural network priors - Bristol
Organisation	University of Bristol
Country	United Kingdom
Sector	Academic/University
PI Contribution	Research ideas, writing code, conducting and interpreting experiments, and paper writing.
Collaborator Contribution	Dr. Laurence Aitchison contributed research ideas, wrote parts of the paper, and interpreted experimental results.
Impact	The paper "Bayesian Neural Network Priors Revisited"
Start Year	2020


Description	Bayesian neural network priors - ETHZ
Organisation	ETH Zurich
Department	Department of Computer Science
Country	Switzerland
Sector	Academic/University
PI Contribution	I contributed research ideas, wrote a good part of the research code, conducted some experiments, interpreted results, and wrote part of the final paper.
Collaborator Contribution	My collaborators Vincent Fortuin and Gunnar Rätsch contributed in much the same way: they contributed research ideas, conducted and interpreted experiments, and wrote part of the paper. They also provided hours on a computing cluster.
Impact	The papers "Exact Langevin Dynamics with Stochastic Gradients" and "Bayesian Neural Network Priors Revisited"
Start Year	2020


Description	Bayesian neural network priors - Imperial College
Organisation	Imperial College London
Department	Department of Computing
Country	United Kingdom
Sector	Academic/University
PI Contribution	I provided research ideas, experimental code and writing. My research group, the Machine Learning Group at Cambridge, provided computing resources.
Collaborator Contribution	Dr. Mark van der Wilk contributed research ideas, interpreted experiments, and helped write the paper.
Impact	The papers "Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks" and "Bayesian Neural Network Priors Revisited"
Start Year	2019
