A Non-Asymptotic Analysis of Stochastic Mirror Descent for Non-Convex Learning

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Many state of the art machine learning techniques depend heavily on the optimisation of some non-convex objective. Often, this procedure is computationally expensive and requires techniques that employ stochastic approximations or regularisation using artificial noise. For instance, there are a variety of techniques in deep learning that use artificial noise applied to the data, parameters or update direction, all of which have been shown to encourage faster convergence and improved generalisation. Though many of these techniques have demonstrated their validity through repeated experimentation and testing, the theoretical framework to validate and understand them is still in its infancy.

Another technique which is used to overcome the aforementioned computational difficulties is the mirror descent framework, which does so by taking advantage of the known geometric properties specific to the problem in consideration. It is highly popular in convex optimisation as in this setting there are many provable advantages to this algorithm. Its stochastic counterpart, stochastic mirror descent, is rapidly gaining popularity in non-convex learning. But again, the theoretical framework in this setting is under-developed.

Through our project, we seek to develop a better understanding of the statistical and computational properties of these techniques in the large-scale learning setting. Specifically, we are interested in quantitively comparing the consistency and generalisation performance of these techniques and seeing how they change as the problem grows more challenging.

Guiding much of our methodology will be the focus on non-asymptotic analysis, that is compared to analyses in which quantities are taken to their infinite limits. We believe this is essential for applications in large-scale learning where only a limited number of iterations can take place and the effect of the number of parameters on performance is of key interest.

Fundamentally, our approach will be based on approximating the iterative optimisation procedure with stochastic processes that evolve continuously in time, specifically diffusion processes. With this, we gain access to the broad suite of powerful techniques for analysing diffusions. This approach has seen a growing popularity in recent years and has already yielded interesting results in the learning setting. The project will also draw heavily from the study of stochastic differential equations as well as high dimensional statistics.

This project falls within the EPSRC Statistics and Applied probability area.

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/R513295/1 01/10/2018 30/09/2023
2444063 Studentship EP/R513295/1 01/10/2020 30/09/2024 Tyler Karim Farghly