New approaches to training deep probabilistic models

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

A probabilistic model is a mathematical representation of a given data set in terms of a
probability distribution. The probability distribution reveals information about the
frequency and correlation of features in the data set. The nuanced probabilistic
representation can subsequently be used in numerous downstream tasks such as the
generation of realistic synthetic data points, the classification of data, or the improvement
of medical imaging methods.
Deep neural networks, on the other hand, have significantly advanced the field of
machine learning. They allow computers to read hand-written notes, make predictions in
complex scenarios, or translate between languages. However, such networks are typically
not well suited to learning probabilistic models, because the output of a neural network
cannot be interpreted as a probability. This has led to the development of unnormalised
models, in which a deep neural network specifies the distribution only up to an unknown
normalising constant. However, training unnormalised models is difficult, and most
methods are not well grounded in theory. Theoretical guarantees, however, are critical
when applying the obtained probabilistic representation in sensitive areas such as medicine.
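For concreteness, the following is the standard formulation of an unnormalised (energy-based) model as it appears in the wider literature; it illustrates the model class rather than describing the specific models developed in this project. A deep neural network E_\theta assigns a scalar energy to every data point x, and the probability density is defined only up to a normalising constant Z(\theta) that is intractable in high dimensions:

    p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)}, \qquad Z(\theta) = \int \exp(-E_\theta(x)) \, \mathrm{d}x .

Because Z(\theta) cannot be computed, maximum-likelihood training is not directly available, which is the source of the difficulty described above.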
My research focuses on new methodologies for the training and evaluation of deep
unnormalised probabilistic models. The aim of my research is two-fold. Firstly, I work
towards training methodologies that are derived from classical theoretical results in
statistics and probability theory. Such methods perform more reliably, and failure cases
that appear in conventionally used algorithms can be ruled out. Furthermore,
approximation errors remain tractable so that one can prove theoretical guarantees
that are missing in related work. Secondly, I aim to achieve performance competitive with
previous training methods for unnormalised models. In particular, I seek to make our
methodology scalable to high-dimensional data without compromising the quality of the
learned distribution. In our current numerical experiments, we observe that our method
captures non-linear dependencies in the data and generalises well when used to generate
synthetic data samples.
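As a hedged illustration of the kind of classically grounded training objective referred to above, the sketch below implements denoising score matching (Vincent, 2011) for a small energy network in PyTorch. Score matching is a long-standing objective from the statistics literature that avoids the intractable normalising constant; it is shown here only as a representative example and is not necessarily the method developed in this project. The architecture, the noise level sigma, and the toy data are illustrative assumptions.

import torch
import torch.nn as nn

# Illustrative energy network E_theta: R^d -> R (not the project's architecture)
class EnergyNet(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # one scalar energy per sample

def denoising_score_matching_loss(energy, x, sigma=0.1):
    # Match the model score -grad_x E(x) to the score of the Gaussian-smoothed
    # data distribution; the normalising constant Z(theta) never appears.
    noise = torch.randn_like(x)
    x_noisy = (x + sigma * noise).requires_grad_(True)
    score = -torch.autograd.grad(energy(x_noisy).sum(), x_noisy, create_graph=True)[0]
    target = -noise / sigma  # score of the Gaussian perturbation kernel
    return ((score - target) ** 2).sum(dim=-1).mean()

# Toy usage: fit the energy network to correlated two-dimensional Gaussian data.
torch.manual_seed(0)
data = torch.randn(512, 2) @ torch.tensor([[1.0, 0.8], [0.0, 0.6]])
model = EnergyNet(dim=2)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    loss = denoising_score_matching_loss(model, data)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()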
Since probabilistic models represent our knowledge about a system obtained from data,
my work is universally applicable in big-data settings. Probabilistic models are best known
for their capacity to generate realistic artificial images or to fill in incomplete images.
They are also impactful in engineering applications where the quantity of interest has to
be inferred from values of a known forward process (so-called Bayesian inverse
problems). A prominent example of this is the reconstruction of a medical image from
measurements of scattered radiation. Additionally, probabilistic models can be used to
conceal sensitive or private information in data sets because they contain relevant
statistical information rather than individual data points. Hence, the probabilistic model
can be used to optimise processes without revealing sensitive information such as
patient data in hospitals.
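To make the inverse-problem example above concrete (this is the standard Bayesian formulation, not a result of this project): if y denotes noisy measurements produced by a known forward operator A applied to an unknown quantity x, the learned probabilistic model p_\theta(x) acts as a prior, and Bayes' rule gives the posterior

    p(x \mid y) \propto p(y \mid x)\, p_\theta(x), \qquad \text{e.g. } p(y \mid x) = \mathcal{N}(y;\, A x,\, \sigma^2 I),

from which reconstructions of x, together with their uncertainty, can be obtained.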
My current project is joint work with Dr Andrew B. Duncan (Improbable Worlds Limited
and Imperial College London), Jen Ning Lim (University of Warwick), and Prof. Dr
Sebastian J. Vollmer (University of Kaiserslautern, DFKI).

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/V520238/1                                      01/10/2020   31/10/2025
2613115             Studentship    EP/V520238/1   02/10/2025   02/10/2025   Tobias Schroeder