# Data-efficient Reinforcement Learning

Lead Research Organisation: University of Cambridge

Department Name: Engineering

### Abstract

Reinforcement Learning (RL) algorithms are an alternative to traditional model-based control: they learn the optimal actions to take from data. Unlike model-based control, RL methods do not need a built-in model of the dynamical system, which lets them make good decisions when the true model is complicated or not perfectly known at design time. Unfortunately, their application to many settings, such as autonomous robotics and smart buildings, is hampered by their need for large amounts of data. This project focuses on improving the data-efficiency of RL systems, using Bayesian inference and reasoning techniques similar to those used in chess-playing AI. We will study systems that take into account the long-term value of a decision, both in terms of the benefits it achieves and the information it provides for future decisions. Solving these challenges will enable the application of RL in domains such as personalised education, digital health, robotics, and the smart grid.

### Publications

A. Garriga-Alonso (2019) *Deep convolutional networks as shallow Gaussian processes*

Burt David R. (2020) *Understanding Variational Inference in Function-Space* in arXiv e-prints

Fortuin V (2021) *BNNpriors: A library for Bayesian neural network inference with different prior distributions* in Software Impacts

Fortuin Vincent (2021) *Bayesian Neural Network Priors Revisited* in arXiv e-prints

Garriga-Alonso (2021) *Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks* in arXiv e-prints

Garriga-Alonso (2021) *Exact Langevin Dynamics with Stochastic Gradients* in arXiv e-prints

### Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509620/1 | | | 30/09/2016 | 29/09/2022 | |
1950008 | Studentship | EP/N509620/1 | 30/09/2017 | 29/09/2020 | Adria Garriga Alonso |

Description | INTRODUCTION: Knowing when your model is wrong is very useful in machine learning applications that have immediate consequences for people. Examples abound: detecting tumours in CT scans, controlling a power plant turbine or a self-driving car, deciding whether to grant a loan application... Typically, this is done by estimating the amount of uncertainty in a particular prediction, a task known as "uncertainty quantification".

RESEARCH QUESTIONS: One promising and popular way to obtain correct uncertainty estimates from models that perform well is Bayesian deep learning. It is promising because it starts with a model that performs well (a deep neural network) and then attempts to consider many possible settings of its weights, and whether they may be mistaken (the Bayesian part). For the Bayesian school of statistical thought, this is a very satisfying resolution, but a number of open questions remain:

- How do we choose the "prior distribution" for the neural network, that is, what we know before taking the data into account?
- How do we calculate the resulting predictions? In theory this is easy, but in practice we must resort to approximating the results, and it is unclear what the best approximation is.

FINDINGS: The findings here provide partial answers to both questions in the context of convolutional networks (CNNs), which make predictions given images as input and are one of the most successful kinds of neural network.

- How to choose a prior? We might put a standard Gaussian distribution on each weight. We prove that, if the network is too wide, this leads to the many layers effectively collapsing into a single one. Perhaps we should use another kind of prior. We provide empirical evidence that other simple priors (the Student-t and correlated Gaussian) work better than the standard Gaussian in practice.
- How to calculate the resulting predictions? We provide a scheme based on simulation of a high-dimensional physical system (Langevin dynamics) while only processing small batches of data at a time, which works well in practice. |
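
To make the idea of simulating Langevin dynamics on small batches of data more concrete, here is a minimal sketch of plain stochastic gradient Langevin dynamics (SGLD), a standard related technique. It is not the scheme developed in this project, and everything in it is an assumption made for illustration: the toy model (unit-variance Gaussian data with a Gaussian prior on the mean), the function name `sgld_gaussian_mean`, and the default step size and batch size.

```python
import math
import random


def sgld_gaussian_mean(data, n_steps=20000, batch_size=32, step_size=1e-5,
                       prior_std=10.0, seed=0):
    """Draw approximate posterior samples of the mean `theta` with plain SGLD.

    Illustrative toy model (an assumption, not from the project):
    x_i ~ N(theta, 1) with prior theta ~ N(0, prior_std**2).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Use a small random batch (drawn with replacement), not the full data.
        batch = [data[rng.randrange(n)] for _ in range(batch_size)]
        # Gradient of the log prior, plus the batch gradient of the log
        # likelihood rescaled by n / batch_size to estimate the full gradient.
        grad = -theta / prior_std ** 2
        grad += (n / batch_size) * sum(x - theta for x in batch)
        # Langevin step: half a step along the gradient, plus Gaussian noise
        # whose variance equals the step size.
        theta += 0.5 * step_size * grad
        theta += math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
    kept = sgld_gaussian_mean(data)[2000:]  # discard burn-in samples
    exact_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
    print("SGLD posterior mean estimate:", sum(kept) / len(kept))
    print("Exact posterior mean:        ", exact_mean)
```

For this toy model the posterior is available in closed form, so the sampler can be checked against it. Plain SGLD is only approximately correct for a fixed step size, because the batch gradient adds noise on top of the injected Gaussian noise; a small step size keeps that extra error small at the cost of slower mixing.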

Exploitation Route | The statistical techniques could be used to learn models that make predictions with a known degree of uncertainty, in medical or industrial settings. The inference techniques based on Langevin dynamics can also be used for other kinds of models. |

Sectors | Aerospace, Defence and Marine; Electronics; Energy |

URL | https://agarri.ga/#publications_selected |

Description | Bayesian neural network priors - Bristol |

Organisation | University of Bristol |

Country | United Kingdom |

Sector | Academic/University |

PI Contribution | Research ideas, writing code, conducting and interpreting experiments, and paper writing. |

Collaborator Contribution | Dr. Laurence Aitchison contributed research ideas, wrote parts of the paper, and helped interpret experimental results. |

Impact | The paper "Bayesian Neural Network Priors Revisited" |

Start Year | 2020 |

Description | Bayesian neural network priors - ETHZ |

Organisation | ETH Zurich |

Department | Department of Computer Science |

Country | Switzerland |

Sector | Academic/University |

PI Contribution | I contributed research ideas, wrote a good part of the research code, conducted some experiments, interpreted results, and wrote part of the final paper. |

Collaborator Contribution | My collaborators Vincent Fortuin and Gunnar Rätsch did much the same: they contributed research ideas, conducted and interpreted experiments, and wrote part of the paper. They also contributed hours on a computing cluster. |

Impact | The papers "Exact Langevin dynamics with stochastic gradients" and "Bayesian Neural Network Priors Revisited" |

Start Year | 2020 |

Description | Bayesian neural network priors - Imperial College |

Organisation | Imperial College London |

Department | Department of Computing |

Country | United Kingdom |

Sector | Academic/University |

PI Contribution | I provided research ideas, experimental code and writing. My research group, the Machine Learning Group at Cambridge, provided computing resources. |

Collaborator Contribution | Dr. Mark van der Wilk contributed research ideas, helped interpret experiments, and contributed to the paper writing. |

Impact | The papers "Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks" and "Bayesian Neural Network Priors Revisited" |

Start Year | 2019 |