Stochastic Numerics for Sampling on Manifolds
Lead Research Organisation:
University of Nottingham
Department Name: Sch of Mathematical Sciences
Abstract
The digital era has led to the increasing availability of highly structured data such as social media graphs and networks, ratings and recommender-system data from online retail and streaming platforms, and high-resolution medical images. Such data are characterised by non-trivial constraints (within a network, not everyone but only friends and family form a group; the shape of an imaged brain is unchanged under rotations of the image), and the sheer scale and complexity of storing and analysing such data necessitate the use of probabilistic models to mimic the manner in which the data were generated.
Fundamental to the successful practical use of probabilistic models for highly structured data is sampling, that is, generating random data, from geometrically constrained spaces known as manifolds. Within this nascent area, the state of the art in efficient sampling backed by theoretical guarantees is restricted to cases where the manifold is smooth without a boundary, or the sampling distribution belongs to a class that is particularly amenable to theoretical analysis. This excludes many important problems routinely encountered in AI and statistical applications, including low-rank matrix completion (predicting user ratings for Netflix movies) and analysing the shapes of objects (computing a representative tumour shape from medical images).
To this end, the overarching goal of this timely project is to develop and analyse methods to sample from a general class of manifolds and distributions using ergodic stochastic differential equations. Positioned at the interface of stochastics, numerical analysis and geometry, the project will make a major contribution to the advancement of numerical methods for SDEs on manifolds and thus open up the possibility to efficiently analyse complex, geometric data.
Description | This one-year grant has resulted in several outputs, which are at different stages of review. In particular, we have prepared and submitted the work "Sampling and estimation on manifolds using the Langevin diffusion" by Karthik Bharath, Alexander Lewis, Akash Sharma, and Michael V Tretyakov. The following have been achieved in that paper. Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion with invariant measure on a compact Riemannian manifold. Two estimators of linear functionals with respect to the invariant measure, based on the discretized Markov process, are considered: a time-averaging estimator based on a single trajectory and an ensemble-averaging estimator based on multiple independent trajectories. Imposing no restrictions beyond a nominal level of smoothness on the invariant measure, first-order error bounds, in the discretization step size, on the bias and variances of both estimators are derived. The order of error matches the optimal rate in Euclidean and flat spaces, and leads to a first-order bound on the distance between the invariant measure µ and a stationary measure of the discretized Markov process. The generality of the proof techniques, which exploit links between two partial differential equations and the semigroup of operators corresponding to the Langevin diffusion, renders them amenable to the study of a more general class of sampling algorithms related to the Langevin diffusion. Conditions for extending the analysis to the case of non-compact manifolds are discussed. Numerical illustrations with distributions, log-concave and otherwise, on manifolds of positive and negative curvature elucidate the derived bounds and demonstrate the practical utility of the sampling algorithm. |
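To make the two estimators concrete, the following is a minimal illustrative sketch, not the paper's algorithm: a projected Euler discretization of the Langevin diffusion on the unit sphere S^2, targeting an invariant measure proportional to exp(-U(x)). The potential U, the linear functional phi, and all parameter names (step size h, trajectory lengths, burn-in) are assumptions made for illustration only.

```python
# Illustrative sketch (not the paper's method): projected Euler
# discretization of the Langevin diffusion on the unit sphere S^2,
# with invariant measure proportional to exp(-U(x)).
import numpy as np

rng = np.random.default_rng(0)

def U_grad(x):
    # Hypothetical potential U(x) = c . x (a von Mises-Fisher-type
    # log-density); its Euclidean gradient is the constant vector c.
    return np.array([2.0, 0.0, 0.0])

def langevin_step(x, h):
    # Project drift and noise onto the tangent space at x, take one
    # Euler-Maruyama step of dX = -grad U dt + sqrt(2) dW, then
    # retract back onto the sphere by normalisation.
    g = U_grad(x)
    g_tan = g - np.dot(g, x) * x
    xi = rng.standard_normal(3)
    xi_tan = xi - np.dot(xi, x) * x
    y = x - h * g_tan + np.sqrt(2.0 * h) * xi_tan
    return y / np.linalg.norm(y)

def time_average(phi, x0, h, n_steps, burn_in):
    # Time-averaging estimator of E_mu[phi]: a single long trajectory,
    # averaging phi along it after a burn-in period.
    x, total = x0, 0.0
    for k in range(n_steps):
        x = langevin_step(x, h)
        if k >= burn_in:
            total += phi(x)
    return total / (n_steps - burn_in)

def ensemble_average(phi, x0, h, n_steps, n_traj):
    # Ensemble-averaging estimator: phi at the terminal point of many
    # independent trajectories, averaged across the ensemble.
    vals = []
    for _ in range(n_traj):
        x = x0
        for _ in range(n_steps):
            x = langevin_step(x, h)
        vals.append(phi(x))
    return float(np.mean(vals))

phi = lambda x: x[0]               # a linear functional: first coordinate
x0 = np.array([0.0, 0.0, 1.0])     # arbitrary starting point on S^2
print(time_average(phi, x0, h=0.01, n_steps=20000, burn_in=2000))
print(ensemble_average(phi, x0, h=0.01, n_steps=500, n_traj=200))
```

Both estimators approximate the same integral of phi against the invariant measure; the first-order error bounds described above control how their bias and variance shrink as the step size h decreases.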
Exploitation Route | As planned, we established a productive collaboration between the stochastic numerical analyst Michael Tretyakov and the stochastic geometer Karthik Bharath. Together with our postdocs, we completed one piece of work, which is the starting point for our joint research on geometric integrators for stochastic differential equations on manifolds, with numerous applications in sampling, optimisation and machine learning. |
Sectors | Other |