Bayesian Deep Reinforcement Learning

Lead Research Organisation: University of Oxford
Department Name: Autonomous Intelligent Machines & Systems CDT


Brief description of the context of the research including potential impact
Deep reinforcement learning has become ubiquitous for learning control policies in challenging domains such as robotic control, game playing (e.g. Go), and autonomous driving. However, standard approaches are often sample-inefficient, unstable, and rely on ad-hoc tricks that lack theoretical justification. This project aims to derive principled new objectives and algorithms for deep reinforcement learning through a Bayesian lens, together with new interpretations of existing reinforcement learning algorithms.

Aims and Objectives
- Extending existing methods in Bayesian reinforcement learning, via meta-learning in the Bayes-Adaptive MDP (BAMDP) framework, to more complex environments.
- Studying the connection between deep reinforcement learning and probabilistic inference, and using it to derive a principled objective for reinforcement learning. We aim to provide theoretical justification for existing objectives, such as the mean-squared Bellman error, which may not appropriately reflect the geometry of the function space.
- Scaling related methodologies such as Bayesian optimisation to novel settings.
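To make the second aim concrete, the following is a minimal sketch of the mean-squared Bellman error (MSBE) objective mentioned above, on a small illustrative tabular MDP. The environment (random rewards, deterministic transitions) and all names here are hypothetical, chosen only to show the objective's form, not the project's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative tabular MDP: 4 states, 2 actions, discount 0.9,
# random rewards and deterministic transitions.
n_states, n_actions, gamma = 4, 2, 0.9
R = rng.uniform(size=(n_states, n_actions))
P = rng.integers(0, n_states, size=(n_states, n_actions))  # next-state index per (s, a)

def msbe(Q):
    """Mean-squared Bellman error: mean of (Q(s,a) - (r + gamma * max_a' Q(s',a')))^2."""
    target = R + gamma * Q[P].max(axis=-1)  # Bellman optimality backup for every (s, a)
    return np.mean((Q - target) ** 2)

# At the fixed point of the Bellman optimality operator the error is zero,
# so repeated backups (value iteration) drive the MSBE towards 0.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    Q = R + gamma * Q[P].max(axis=-1)

print(msbe(Q))
```

The project's observation is that minimising this squared error treats all deviations in Q identically, which may not reflect the geometry of the underlying function space.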

Novelty of the research methodology
- Probabilistic treatment of reinforcement learning
- Newly proposed frameworks for reinforcement learning, such as Bayes-Adaptive MDPs (BAMDPs) and RL as inference

Alignment to EPSRC's strategies and research areas (which EPSRC research areas the project relates to)
- Artificial intelligence technologies
- Robotics
- Statistics and applied probability

Any companies or collaborators involved



Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S024050/1      |              |              | 01/10/2019 | 31/03/2028 |
2243850           | Studentship  | EP/S024050/1 | 01/10/2019 | 30/09/2023 | Cong Lu