Machine Learning through the Lenses of Optimal Transportation
Lead Research Organisation: Durham University
Department Name: Mathematical Sciences
Abstract
This research project proposes to investigate analytical and numerical aspects of the training of wide multi-layered neural networks using the modern theory of optimal transportation. Optimal mass transportation is a relatively young area of mathematics, but it has gained enormous momentum over the past twenty years or so, culminating in A. Figalli's 2018 Fields Medal. The theory has had a great impact on fields such as partial differential equations, probability theory and statistics, geometry and data science, with important applications in meteorology, economics, biology and the social sciences, to name a few.
Applied to machine learning, optimal mass transportation has had early successes in both supervised and unsupervised learning.
These results have quickly found industrial applications, for example in models based on variational auto-encoders and generative adversarial networks.
The key objective in training a neural network is to identify a function that will accurately evaluate unseen data based on a given training set.
Unfortunately, identifying this prediction function is itself a challenging optimisation problem, which numerical methods can solve only approximately and under many constraints. Traditionally, this has been done via techniques such as backpropagation and stochastic gradient descent. However, such methods do not scale well as the number of neurons increases, making them infeasible for many practical applications.
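In its standard form, this is an empirical risk minimisation problem; the notation below is schematic and illustrative, not specific to this project:

```latex
% Training as empirical risk minimisation (schematic notation).
% Given training pairs (x_m, y_m), a loss \ell, and the class \mathcal{F} of
% functions the network can represent, one seeks
\min_{f \in \mathcal{F}} \; \frac{1}{M} \sum_{m=1}^{M} \ell\big(f(x_m),\, y_m\big).
```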
Thanks to recent developments in optimal transport, it is now possible to find prediction functions for infinitely wide single-layer neural networks.
This is done by solving partial differential equations derived from so-called Wasserstein gradient flows, a key mechanism in optimal transport.
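Schematically (again with illustrative notation, not the project's), a single-hidden-layer network is written as an average of neurons against a probability measure over parameters, and in the infinite-width limit, gradient-descent training of the neurons evolves this measure by the Wasserstein gradient flow of the risk:

```latex
% Mean-field formulation of a single-hidden-layer network (schematic).
% The network is an average over a parameter measure \mu:
f(x;\mu) = \int \Phi(x;\theta)\,\mathrm{d}\mu(\theta),
\qquad
R(\mu) = \frac{1}{M}\sum_{m=1}^{M} \ell\big(f(x_m;\mu),\, y_m\big).
% In the infinite-width limit, training follows the Wasserstein gradient
% flow of R, i.e. the continuity-type PDE
\partial_t \mu_t = \nabla_\theta \cdot \Big(\mu_t\, \nabla_\theta \tfrac{\delta R}{\delta \mu}(\mu_t)\Big).
```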
Thus far, however, this mathematical framework is only applicable to single-layer neural networks, a key limitation acknowledged recently by Figalli et al.
To address this open problem, we seek to develop, as the backbone of this project, a mathematical framework that uses Wasserstein gradient flows to find prediction functions for neural networks with two or more layers.
The tools of optimal transportation thus allow us to approximate the very-many-neuron problem not by fewer neurons, but by a continuum (i.e. infinitely many) of them, reminiscent of the mean-field models of statistical physics. Our second objective is to quantify how good these approximations are: not only in the usual analytical sense of qualitative bounds, which are often of little practical use, but through sharp bounds that align with what is observed in applications. For this, we plan to use techniques from mean-field games (a young area initiated by another Fields medallist, P.-L. Lions, and his collaborators).
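Schematically, the quantification question reads as follows; the choice of the Wasserstein-2 distance, the rate \alpha and the constant C(t) below are placeholders, and producing sharp versions of such bounds is exactly this objective:

```latex
% Quantifying the continuum approximation (schematic; rate is a placeholder).
% N trained neurons \theta_1(t), \dots, \theta_N(t) define the empirical measure
\mu^N_t = \frac{1}{N}\sum_{i=1}^{N}\delta_{\theta_i(t)},
% and the approximation quality is measured by bounds of the form
\mathcal{W}_2\big(\mu^N_t,\, \mu_t\big) \;\le\; C(t)\, N^{-\alpha}, \qquad \alpha > 0.
```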
Numerical computations will form an important part of this project, in the first instance to test our mathematical framework as it develops (not whether it is correct, but how directly it applies to real-life problems). Existing computational techniques, however, can handle only discrete (if many) neurons.
Over the course of this project, as our third objective, we will need to develop numerical models, along with the requisite computational techniques, that couple discrete neurons with (an approximation of) a continuum of neurons. This will start with a single layer, progressing to multiple layers as we gain experience.
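To illustrate the discrete side of such a coupling, here is a minimal sketch (a hypothetical set-up with toy data, not the project's code) of the particle view: finitely many neurons of a single-hidden-layer tanh network, each following the velocity field of the Wasserstein gradient flow evaluated at their empirical measure, which coincides with rescaled gradient descent:

```python
# Minimal sketch (hypothetical set-up): finitely many neurons as particles
# discretising a Wasserstein gradient flow. Each neuron's parameters follow
# -grad_theta (dR/dmu) evaluated at the empirical measure, which is
# (rescaled) gradient descent on a single-hidden-layer tanh network.
import numpy as np

rng = np.random.default_rng(0)

N, d = 200, 2                       # number of neurons (particles), input dim
X = rng.normal(size=(128, d))       # toy training inputs
y = np.sin(X[:, 0])                 # toy training targets
theta = rng.normal(size=(N, d))     # one row per neuron

def predict(theta, X):
    """f(x; mu_N) = (1/N) sum_i tanh(<x, theta_i>): the network as an average."""
    return np.tanh(X @ theta.T).mean(axis=1)

lr = 1.0
for _ in range(500):
    act = np.tanh(X @ theta.T)      # (samples, N)
    resid = act.mean(axis=1) - y    # residual f(x)-y (gradient of 1/2 squared loss)
    sech2 = 1.0 - act ** 2          # tanh'(.)
    # Velocity field of the gradient flow at each particle:
    # grad_theta (dR/dmu)(theta_i) = (1/M) sum_m resid_m * sech2_mi * x_m
    vel = (resid[:, None] * sech2).T @ X / len(X)
    theta -= lr * vel

print("final mse:", np.mean((predict(theta, X) - y) ** 2))
```

The open numerical question for the project is then how to couple such particle updates with a genuine discretisation of the continuum measure, first for one layer and then for several.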
People
Name | Role |
---|---|
Alpar Meszaros | Primary Supervisor |
Guy Parker | Student |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/W524426/1 | | | 30/09/2022 | 29/09/2028 | |
2744976 | Studentship | EP/W524426/1 | 30/09/2022 | 30/03/2026 | Guy Parker |