Robustness of Invertible Neural Networks

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

Constructing histograms or probability distributions in low dimensions is straightforward, and as humans we are able to visualise them in our heads in one, two or maybe even three dimensions. The real world, however, is often not constrained to low dimensions: the probability of an event occurring can depend on a large number of factors, resulting in a high-dimensional probability distribution. Constructing a histogram of such a distribution in the standard way is often not possible, because the number of bins required grows exponentially with the number of dimensions.
Invertible Neural Networks (INNs) and Normalising Flows (NFs) offer a way to evaluate the likelihoods of complex high-dimensional distributions. This is achieved by learning a non-linear bijective mapping between a simple parametric latent distribution (which we know) and the true data distribution (which we do not know and which is hard to model). The likelihood of a data sample is then equal to the likelihood of the mapped data point under the simple latent distribution, scaled by a normalising constant (the absolute value of the determinant of the mapping's Jacobian). Thus, rather than measuring the likelihood under the true distribution directly, we map the sample to the latent distribution, where the likelihood is easy to evaluate, and then scale it appropriately. Furthermore, INNs allow us to sample from the latent space and apply the inverse mapping to generate data samples.
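The likelihood computation described above can be illustrated with a minimal sketch. It assumes a toy element-wise affine bijection x = scale * z + shift in place of a learned network and a standard normal latent distribution; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log-density of a standard normal latent, summed over dimensions."""
    return -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi), axis=-1)

def flow_log_likelihood(x, scale, shift):
    """log p_X(x) for the toy bijection x = scale * z + shift."""
    z = (x - shift) / scale                      # map the data point to latent space
    log_det = -np.sum(np.log(np.abs(scale)))     # log |det dz/dx| of the mapping
    return standard_normal_logpdf(z) + log_det   # latent likelihood, rescaled

def flow_sample(n, scale, shift, rng):
    """Draw latent samples and push them through the inverse direction z -> x."""
    z = rng.standard_normal((n, scale.shape[0]))
    return scale * z + shift

# Hypothetical parameters for a two-dimensional example.
scale = np.array([2.0, 0.5])
shift = np.array([1.0, 0.0])
x = np.array([[0.5, -1.0]])

log_px = flow_log_likelihood(x, scale, shift)
samples = flow_sample(4, scale, shift, np.random.default_rng(0))
```

For this toy mapping the result can be checked against the closed-form density of a Gaussian with mean `shift` and standard deviation `scale`; a learned INN would replace the affine map with a more expressive bijection.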
Whilst the theory underpinning these methods is attractive, several open issues remain regarding their usability and accuracy. Addressing these issues will form the core of my thesis and the majority of my research.
The first major problem is calculating the normalising constant needed to evaluate the exact likelihood of the data. Naïve calculation is prohibitively expensive, and as such methods to speed up the computation need to be leveraged. Standard methods enforce a variety of constraints on the network, potentially harming the expressivity of the mapping. Addressing this issue is the first major objective of my thesis. Significant progress has already been made: a method addressing the above issues has been implemented and is awaiting publication.
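The trade-off between cost and constraints can be sketched as follows. One commonly used constraint (an assumption here, not necessarily one considered in this project) is to restrict the mapping so that its Jacobian is triangular: the log-determinant then reduces to a sum over the diagonal, rather than requiring a general determinant whose cost grows cubically with the dimension. The helper names below are hypothetical.

```python
import numpy as np

def logdet_general(jacobian):
    """Log |det J| for an arbitrary square Jacobian (cubic cost in the dimension)."""
    _, logabsdet = np.linalg.slogdet(jacobian)
    return logabsdet

def logdet_triangular(jacobian):
    """Log |det J| when J is known to be triangular (linear cost in the dimension)."""
    return np.sum(np.log(np.abs(np.diag(jacobian))))

# Build a random lower-triangular Jacobian with a strictly positive diagonal.
rng = np.random.default_rng(0)
d = 5
J = np.tril(rng.standard_normal((d, d)))
J[np.diag_indices(d)] = np.exp(rng.standard_normal(d))

general = logdet_general(J)
cheap = logdet_triangular(J)
```

Both functions agree on a triangular Jacobian, but only the general one is valid for an unconstrained mapping, which is where the expense arises.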
The second issue is enforcing the invertibility of the mapping (needed to generate samples). This issue forms part of a much broader problem: inferring an input for a given output. For this application, various optimisation methods can be employed to obtain the input; however, they often do not work well and sometimes do not work at all. This currently takes up the majority of my time. Preliminary methods have been tested, but none works successfully across all possible mappings.
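As a minimal sketch of inferring an input by optimisation, the example below recovers x from y = f(x) by gradient descent on the squared reconstruction error. It assumes a hypothetical, well-behaved element-wise mapping f(x) = x + 0.5 tanh(x); the step size and iteration count are illustrative choices, and convergence on this toy problem says nothing about the harder mappings discussed above.

```python
import numpy as np

def forward(x):
    """Toy element-wise monotone mapping standing in for a network."""
    return x + 0.5 * np.tanh(x)

def forward_grad(x):
    """Derivative of the toy mapping with respect to its input."""
    return 1.0 + 0.5 * (1.0 - np.tanh(x) ** 2)

def invert_by_gradient_descent(y, steps=200, lr=0.4):
    """Minimise 0.5 * (forward(x) - y)^2 over x, starting from x = y."""
    x = y.copy()
    for _ in range(steps):
        residual = forward(x) - y
        x = x - lr * residual * forward_grad(x)   # gradient of the squared error
    return x

x_true = np.array([-2.0, 0.3, 1.5])
y = forward(x_true)
x_rec = invert_by_gradient_descent(y)
```

Here the objective is benign because the mapping is monotone with a bounded derivative; for general networks the same procedure can stall or diverge.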
Thirdly, I will be conducting a thorough investigation into the stability and robustness of INNs and NFs. Recent work suggests that these methods may not be as accurate as they seem. As such, part of my ongoing work is to investigate where and how these methods fail, potentially allowing us to compensate for their pitfalls.
The above three points currently form the majority of my research and project work. They are all clearly linked and well motivated by the literature. Addressing all of them will provide a significant contribution to the community.

Publications

Joy T (2019) Efficient Relaxations for Dense CRFs with Sparse Higher-Order Potentials in SIAM Journal on Imaging Sciences

Tonioni A (2019) Learning to Adapt for Stereo

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/N509711/1 | | | 01/10/2016 | 30/09/2021 |
1942898 | Studentship | EP/N509711/1 | 01/01/2018 | 30/06/2021 | Thomas Joy
 
Description: Several findings have emerged during the course of my PhD.

The first is the use of optimisation methods to accurately segment the boundary of an object in an image. This task is referred to as semantic segmentation, which is the process of classifying every pixel in an image. The approach results in near-perfect segmentations of objects in images, a crucial component for robots navigating a real-world environment.

The second is the introduction of an algorithm which teaches a stereo vision (depth perception from two images) neural network to quickly adapt and learn how to predict depth in a new environment. This is achieved by teaching the network how to adapt to new environments in simulation. Specifically, we learn an initialisation for the parameters which, at test time, can quickly adapt to new situations.
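The idea of learning an initialisation that adapts quickly can be sketched on a toy problem. The example below is not the method used in this project: it swaps in a simple Reptile-style first-order update, replaces the stereo network with a one-parameter model y = w * x, and treats each randomly drawn slope as a separate "environment". All names and hyperparameters are hypothetical.

```python
import numpy as np

def sample_task(rng, n_points=20):
    """A toy 'environment': noiseless data from y = slope * x, slope in [2, 4]."""
    slope = rng.uniform(2.0, 4.0)
    x = rng.standard_normal(n_points)
    return x, slope * x

def adapt(w, x, y, steps=3, lr=0.1):
    """A few gradient steps on the mean squared error of the model y = w * x."""
    for _ in range(steps):
        grad = 2.0 * np.mean(x * (w * x - y))
        w = w - lr * grad
    return w

def meta_train(rng, iterations=1000, meta_lr=0.1):
    """Reptile-style outer loop: nudge the initialisation towards adapted weights."""
    w_init = 0.0
    for _ in range(iterations):
        x, y = sample_task(rng)
        w_adapted = adapt(w_init, x, y)
        w_init = w_init + meta_lr * (w_adapted - w_init)
    return w_init

def mse(w, x, y):
    """Mean squared error of the model y = w * x on one task."""
    return np.mean((w * x - y) ** 2)

rng = np.random.default_rng(0)
w_meta = meta_train(rng)

# Adapt to an unseen task from the learned and from a zero initialisation.
x_new, y_new = sample_task(rng)
loss_meta = mse(adapt(w_meta, x_new, y_new), x_new, y_new)
loss_zero = mse(adapt(0.0, x_new, y_new), x_new, y_new)
```

With the same small number of adaptation steps, starting from the learned initialisation leaves a lower error on the new task than starting from zero, which is the behaviour the paragraph above describes.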
Exploitation Route: Both findings can be built on in several ways. Firstly, the semantic segmentation method could possibly be incorporated into a deep learning framework, potentially allowing more flexibility in learning the associated parameters. This could lead to significant improvements in accuracy; however, it is not initially clear how one would do this. Secondly, more sophisticated 'learning to learn' methods could be employed, potentially leading to faster adaptation and higher accuracy.
Sectors: Digital/Communication/Information Technologies (including Software)