An Exploration of Langevin Samplers and their Theoretical Guarantees
Lead Research Organisation: Imperial College London
Department Name: Mathematics
Abstract
With the growing importance of generative sampling, I have developed a keen interest in the theoretical guarantees with which such algorithms may be equipped. In particular, my research has focused on different settings for Maximum Likelihood Estimation with Langevin diffusions, treating the learning problem as an optimisation problem and studying its non-asymptotic behaviour. This research produces both guarantees on the convergence rate of the algorithms and a better understanding of how the regularity of the model and target influences the dynamics. Crucially, I am particularly interested in the simple setting, in which the log-density of the model is assumed to have a Lipschitz gradient and to be strongly concave. These assumptions provide enough regularity to guarantee exponential convergence rates, yet have not been fully studied in all contexts.
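Concretely, the setting above can be summarised by the following standard formulation (a textbook statement of the assumptions, not taken verbatim from the thesis): for a target $\pi \propto e^{-U}$ whose potential $U$ is $m$-strongly convex with $L$-Lipschitz gradient, the overdamped Langevin diffusion contracts exponentially towards $\pi$.

```latex
\[
  \mathrm{d}X_t = -\nabla U(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t,
  \qquad \pi \propto e^{-U},
  \qquad m I \preceq \nabla^2 U \preceq L I,
\]
\[
  W_2\bigl(\mathrm{Law}(X_t), \pi\bigr)
  \;\le\; e^{-mt}\, W_2\bigl(\mathrm{Law}(X_0), \pi\bigr),
\]
```

where $W_2$ denotes the Wasserstein-2 distance; it is precisely this explicit dependence of the rate on the regularity constants $m$ and $L$ that the milder SDE literature tends to obscure.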
A wide spectrum of mathematical results may be applied to these problems, but these often lack a clear dependence on the parameters and the regularity of the system under study. Indeed, the study of SDEs is often carried out under much milder assumptions than those I consider, but this often obfuscates the relationship between the regularity of the model and the rate of convergence. The simplicity of our setting has already motivated a wide variety of researchers to study the behaviour of these Langevin samplers more closely and to identify quantitative bounds for the sampling problem at hand. To exploit these robust results, a flurry of research has emerged, to which I hope to contribute actively.
To this end, I have become very interested in exploring annealing techniques and exploiting the growing literature of uniform-in-time bounds for multi-scale systems.
My work began by looking at the Maximum Marginal Likelihood Estimate (MMLE) problem through the lens of Langevin samplers. In particular, for a normalised model, there is an optimisation-based sampling scheme which can be shown to converge to the target MMLE as one anneals the process. Since such algorithms have already been established, my work focused on accelerating them. Thanks to extensive results on underdamped and overdamped Langevin samplers, a continuous-time acceleration was identified and studied, from which it was possible to develop new, non-trivial algorithms with error bounds enjoying better dependence on dimension, step size and iteration count. This work can be seen largely as an extension of other work in the area. However, these results help complete the picture of Langevin samplers for MMLE problems and establish algorithms with provably better rates than comparable algorithms.
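The flavour of these optimisation-based schemes can be illustrated with a minimal sketch: coupled overdamped Langevin dynamics on the parameter and latent variables, where the parameter noise is scaled down by the number of particles so that its marginal concentrates at the MMLE. The toy Gaussian latent-variable model below is purely illustrative (it is not the model studied in the thesis), chosen because its MMLE is simply the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy model (an assumption for this sketch, not the thesis model):
#   z_i ~ N(theta, 1),  x_i | z_i ~ N(z_i, 1)  =>  p_theta(x) = N(theta, 2),
# so the MMLE of theta is simply the sample mean of the data.
n = 200
x = 2.0 + rng.normal(scale=np.sqrt(2.0), size=n)

def grad_theta(theta, z):
    # d/dtheta of sum_i log p_theta(x_i, z_i) = sum_i (z_i - theta)
    return np.sum(z - theta)

def grad_z(theta, z):
    # d/dz_i of log p_theta(x_i, z_i) = (theta - z_i) + (x_i - z_i)
    return (theta - z) + (x - z)

h, n_steps = 0.01, 5000
theta, z = 0.0, np.zeros(n)

for _ in range(n_steps):
    # Coupled overdamped Langevin updates; the 1/n scaling of the drift
    # and noise in theta makes its marginal concentrate at the MMLE.
    theta = theta + h * grad_theta(theta, z) / n \
            + np.sqrt(2.0 * h / n) * rng.normal()
    z = z + h * grad_z(theta, z) + np.sqrt(2.0 * h) * rng.normal(size=n)

print(theta, x.mean())  # theta should settle near the MMLE, i.e. mean(x)
```

Annealing (shrinking the parameter temperature further over time) sharpens this concentration; the sketch keeps a fixed temperature for simplicity.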
My current work focuses on Contrastive Divergence algorithms for the MLE problem, in an attempt to better quantify the error produced by the approximations made by this algorithm, proposed by Hinton. Unlike the previous case, this setting does not assume that the model is normalised, yielding a much more involved analysis. Indeed, Hinton's algorithm makes approximations to yield tractable updates, but does not analyse the resulting error term, as it was observed to be "negligible" in the examples presented. Though many of the results required for our analysis of the error term are available in the large literature on multi-scale systems, Poisson equations and the associated semigroups, we are particularly interested in identifying the dependence of these bounds on the problem setting and assumptions.
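The approximation in question can be made concrete with a short CD-k sketch: the intractable model expectation in the log-likelihood gradient is replaced by an average over a chain run for only k Langevin steps from the data. The one-parameter unnormalised Gaussian below is an illustrative assumption, not the thesis model; it is chosen because the exact MLE is available in closed form for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative unnormalised model (an assumption for this sketch):
#   p_theta(x) ∝ exp(-theta * x^2 / 2),  i.e. N(0, 1/theta),
# so grad_theta U(x) = x^2 / 2 and the exact MLE is 1 / mean(x^2).
data = rng.normal(scale=1.0 / np.sqrt(2.0), size=2000)

def cd_k_gradient(theta, x0, k, h, rng):
    """CD-k estimate of the log-likelihood gradient: approximate the
    model expectation of grad_theta U with k unadjusted Langevin steps
    started from the data, rather than a fully mixed chain."""
    x = x0.copy()
    for _ in range(k):
        # ULA step targeting p_theta; the drift is -theta * x here
        x = x - h * theta * x + np.sqrt(2.0 * h) * rng.normal(size=x.shape)
    # grad log-lik = E_model[grad U] - E_data[grad U], grad U(x) = x^2 / 2
    return np.mean(x**2) / 2.0 - np.mean(x0**2) / 2.0

theta, eta = 1.0, 0.5
for _ in range(500):
    theta += eta * cd_k_gradient(theta, data, k=20, h=0.05, rng=rng)
    theta = max(theta, 1e-3)  # keep the precision parameter positive

print(theta, 1.0 / np.mean(data**2))  # CD-k estimate vs exact MLE
```

The gap between the two printed values is exactly the kind of error term described above: it mixes the finite-k bias of the short chain with the discretisation bias of the Langevin kernel, and quantifying its order in k, h and the regularity constants is the goal of the analysis.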
Through this work I have established, and hope to continue establishing, "quantitative" control on the non-asymptotic averaging regime. I intend to combine these works to provide a more complete picture of Langevin samplers for the MLE optimisation problem in the convex setting.
People

Paul Valsecchi Oliva (Student)
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S023151/1 | | | 31/03/2019 | 29/09/2027 | |
| 2891705 | Studentship | EP/S023151/1 | 30/09/2023 | 29/09/2027 | Paul Valsecchi Oliva |