Optimisation Methods for Optimal Transport

Lead Research Organisation: University of Oxford

Abstract

Optimal transport is an elegant area of mathematics concerned with transporting the mass of one probability distribution to another in the most 'efficient' manner, with respect to a particular choice of cost function. It provides a principled mechanism for a cost function on the underlying space to induce a measure of distance between probability distributions on this space. Such problems occur frequently in machine learning systems, either as a loss function or to find an optimal mapping between distributions. While the mathematical theory of optimal transport has been well developed over the past few decades, data-driven applications have become increasingly relevant due to recent computational advances that have allowed for tractable calculation of optimal mappings. Recent applications of optimal transport in machine learning systems have included methods for image generation, single-cell data alignment, and natural language processing, with many exciting applications yet to be explored.
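
For reference, the way a cost function induces a distance between distributions can be written in the standard Kantorovich form. The notation here (measures \mu and \nu on a space X, cost c, metric d, set of couplings \Pi) is the usual textbook convention and is assumed for illustration rather than taken from the project itself:

```latex
% Optimal transport cost between probability measures \mu and \nu on X,
% where \Pi(\mu,\nu) is the set of joint distributions with marginals \mu and \nu.
\mathrm{OT}_c(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times X} c(x,y)\,\mathrm{d}\pi(x,y)

% When c(x,y) = d(x,y)^p for a metric d and p \ge 1, this yields the
% p-Wasserstein distance.
W_p(\mu,\nu) \;=\; \Bigl( \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times X} d(x,y)^p\,\mathrm{d}\pi(x,y) \Bigr)^{1/p}
```

The standard quadratic cost corresponds to c(x,y) = \|x - y\|^2, that is, the case p = 2 with the Euclidean metric.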

The solution to the optimal transport problem is highly dependent on the choice of underlying cost function. The majority of applications of computational optimal transport only consider using the standard quadratic cost, which is often an arbitrary choice and may not be a good cost function for the problem at hand. In this project, we aim to leverage both classical optimal transport theory and recent computational advances to design methods that can learn improved cost functions from observed data in a principled manner, which can thus lead to improved performance in subsequent downstream tasks. We will use elements of statistical learning theory to provide convergence guarantees, ensuring that our methods are both reliable and efficient.
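
The dependence of the solution on the cost function can be seen in a toy example. The sketch below is illustrative only and is not the project's method: it brute-forces the optimal one-to-one matching between two small point sets with equal weights, where discrete optimal transport reduces to an assignment problem. The function names and the concave cost are hypothetical choices made for the demonstration.

```python
from itertools import permutations
from math import sqrt


def optimal_matching(x, y, cost):
    """Brute-force the assignment sigma minimising sum_i cost(x[i], y[sigma[i]]).

    Assumes len(x) == len(y) and uniform weights. Only feasible for a
    handful of points, since it enumerates every permutation.
    """
    def total(sigma):
        return sum(cost(a, y[j]) for a, j in zip(x, sigma))

    best = min(permutations(range(len(y))), key=total)
    return best, total(best)


def quadratic(a, b):
    # The standard quadratic cost.
    return (a - b) ** 2


def concave(a, b):
    # A concave cost, chosen purely for illustration.
    return sqrt(abs(a - b))


x = [0.0, 1.0]  # source points
y = [1.0, 2.0]  # target points

sigma_q, total_q = optimal_matching(x, y, quadratic)
sigma_c, total_c = optimal_matching(x, y, concave)

print("quadratic cost:", sigma_q, total_q)
print("concave cost:  ", sigma_c, total_c)
```

Under the quadratic cost the optimal matching moves both points a short distance (0 to 1 and 1 to 2), whereas under the concave cost it is cheaper to leave the point at 1 in place and move the point at 0 all the way to 2. The same data therefore yield different optimal mappings depending on the cost.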

An application of optimal transport that could benefit from such methods is the analysis of single-cell omics data, which consists of measurements taken of individual cells in a population at the cost of destroying each cell in the process. The data is therefore recorded only as unlabelled snapshots that approximate the entire population. Optimal transport methods can be used to align the observed distributions, allowing the trajectories of individual cells in the population to be inferred. Because the cell measurements are recorded according to a particular vector embedding, a good choice of cost function for this representation is not clear, so the ability to learn a cost function from existing data could enable improved performance. Learning cost functions adapted to the problem at hand could be beneficial in many other applications of optimal transport and for a variety of data structures, whenever the choice of cost function for the underlying space is unclear.

As computational optimal transport methods play an important role in many machine learning systems, this proposal falls within the EPSRC's 'AI, Digitalisation and Data: Driving Value and Security' research priority.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. It will be relevant research, addressing the key questions behind real-world problems. The research will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S023151/1 | | | 01/04/2019 | 30/09/2027 |
2740715 | Studentship | EP/S023151/1 | 01/10/2022 | 30/09/2026 | Samuel Howard