General Theory of Implicit Regularization

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

In the era of Big Data, characterized by large, high-dimensional, and distributed datasets, we are increasingly faced with the challenge of establishing scalable methodologies that achieve optimal statistical guarantees under computational constraints. Fundamentally addressing this challenge requires new paradigms. Over the past 50 years, statistical learning theory has relied on the framework of explicit regularization to control the model complexity of estimators. By design, this approach decouples notions of statistical optimality and computational efficiency and, in applications, often leads to expensive model selection procedures. This framework is fundamentally limited in explaining the practical success of modern machine learning paradigms, which are based on running simple gradient-descent methods without any explicit effort to control model complexity.
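
To make the framework concrete, the sketch below (an illustration added here, not part of the project; the synthetic data, the ridge function, and the lambda grid are assumptions) shows explicit regularization in its textbook form: a penalty added to the least-squares objective, with the regularization strength chosen by a separate model-selection loop on held-out data. It is this outer loop that decouples the statistical tuning from the optimization itself and adds to the computational cost.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                      # more parameters than observations
X = rng.standard_normal((n, d))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # Explicit regularization: the penalty lam * ||w||^2 is added to the
    # least-squares objective to control model complexity.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Model selection: sweep the regularization strength and keep the value that
# performs best on held-out data, an outer loop whose cost comes on top of
# fitting the estimator itself.
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]
lams = np.logspace(-4, 2, 13)
val_err = [np.mean((X_val @ ridge(X_tr, y_tr, lam) - y_val) ** 2) for lam in lams]
print("selected lambda:", lams[int(np.argmin(val_err))])
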
Overcoming these limitations calls for investigating the implicit regularization properties of iterative algorithms, namely the bias induced as a by-product of the very choice of optimization routine and tuning parameters. Implicit regularization structurally combines statistics with optimization and has the potential to promote the design of new algorithmic paradigms built around joint notions of statistical and computational optimality. However, several challenges must be overcome to fully realize this potential. This project aims to develop a general theory of implicit regularization that can optimally address fundamental primitives in modern applications (e.g. sparse and low-rank noisy models, decentralized multi-agent learning, and adaptive and robust procedures) and establish novel cross-disciplinary connections with far-reaching consequences. This goal will be achieved by combining non-asymptotic tools for the study of random structures in high-dimensional probability with the general framework of mirror descent from optimization and online learning.
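
By contrast, the following sketch (again an illustration with synthetic numpy data and assumed constants, not the project's methodology) shows the implicit regularization of plain gradient descent: run on an underdetermined least-squares problem from a zero initialization and with no penalty term, the iterates converge to the minimum Euclidean-norm interpolant. Replacing gradient descent by mirror descent with a different mirror map would steer the iterates toward a different interpolant, which is the sense in which the choice of optimization routine itself acts as a regularizer.

import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # underdetermined: infinitely many interpolants
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain gradient descent on the unpenalized least-squares loss, started at zero.
w = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2    # step below 2 / (largest eigenvalue of X'X)
for _ in range(20000):
    w -= step * X.T @ (X @ w - y)

# The limit matches the minimum Euclidean-norm solution: the bias is implicit in
# the choice of algorithm and initialization, with no explicit penalty anywhere.
w_min_norm = np.linalg.pinv(X) @ y
print("training error:", np.linalg.norm(X @ w - y))
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
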
