Asymptotic limits of deep learning architectures

Lead Research Organisation: University of Oxford

Abstract

1 Project description
Recent developments in neural networks have proven very successful in applications ranging from image recognition and segmentation to mastering ancient games like chess and Go. With the help of lightning-fast graphics processing units (GPUs), originally developed to efficiently parallelize matrix operations, neural networks with millions of parameters can now be trained. These advances challenge traditional results of mathematical statistics, such as the need for regularization to prevent over-fitting and the classical bias-variance trade-off, as illustrated by the emergence of the double descent phenomenon. These observations are well-documented in practical experiments, but still lack a solid mathematical explanation, as a direct analysis is often too intricate to reveal the inner workings of these black-box models.
To alleviate these issues, we study asymptotic limits of neural network architectures. For example, we let the depth of a residual neural network tend to infinity and study how the limit relates to the field of stochastic differential equations. In another vein, we let the number of parallel independent models tend to infinity and study the so-called mean-field limit of this aggregation. We have already developed interesting connections and proved preliminary results on the relationship between these architectures and their asymptotic limits.
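The continuous-depth intuition can be sketched in a few lines: a residual block whose update is scaled by 1/depth is exactly an Euler step of an ordinary differential equation, so increasing the depth drives the network towards a continuous-time limit. This is a minimal illustrative sketch; the toy network, the tanh activation and the 1/depth scaling below are hypothetical choices for exposition, not the specific architectures studied in the project.

```python
import numpy as np

def residual_forward(x, weights, depth):
    """Forward pass of a toy residual network:
        x_{k+1} = x_k + (1/depth) * tanh(W_k x_k).
    The 1/depth scaling makes this an Euler discretisation of the ODE
    dX_t = tanh(W(t) X_t) dt on [0, 1], so letting depth -> infinity
    recovers a continuous-depth (neural ODE) limit."""
    for k in range(depth):
        x = x + (1.0 / depth) * np.tanh(weights[k] @ x)
    return x

# Illustration: with constant weights, outputs stabilise as depth grows,
# consistent with a well-defined continuous-depth limit.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.5
x0 = rng.standard_normal(4)

out_shallow = residual_forward(x0, [W] * 8, 8)
out_deep = residual_forward(x0, [W] * 1024, 1024)
print(np.linalg.norm(out_deep - out_shallow))  # shrinks as both depths grow
```

Replacing the deterministic weights with suitably scaled random ones turns the limiting ODE into a stochastic differential equation, which is the regime the project investigates.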

2 Aims and objectives
We aim to understand the practical success of deep learning methods via the study of asymptotic limits of neural network architectures. In this regime, neural networks become simpler objects that have been well studied in the literature, such as kernels, stochastic differential equations or mean-field models. The objective is to develop a mathematical theory linking a particular architecture to its asymptotic limit, through which we aim to explain empirically-observed phenomena such as implicit regularization and the role of endogenous noise in stochastic gradient descent.
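The mean-field mechanism behind this simplification can be illustrated with a toy model: averaging many independently initialised small models makes the aggregate prediction concentrate, by the law of large numbers, around a deterministic limit. The one-layer random-feature model below is a hypothetical stand-in chosen only to make the concentration visible, not one of the project's actual architectures.

```python
import numpy as np

def ensemble_predict(x, n_models, rng):
    """Average the predictions of n independent one-layer random models.
    As n -> infinity the empirical average converges to a deterministic
    mean-field limit: the kind of simpler, well-studied object the
    project aims to connect rigorously to the finite architecture."""
    preds = []
    for _ in range(n_models):
        w = rng.standard_normal(16)                # random input weights
        a = rng.standard_normal(16) / np.sqrt(16)  # random output weights
        preds.append(a @ np.tanh(w * x))
    return np.mean(preds)

# Fluctuations of the ensemble output shrink like 1/sqrt(n):
single = [ensemble_predict(0.5, 1, np.random.default_rng(i)) for i in range(100)]
big = [ensemble_predict(0.5, 400, np.random.default_rng(i)) for i in range(100)]
print(np.std(single), np.std(big))  # the second is markedly smaller
```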

3 Original work
This line of work began only a few years ago and is still in its infancy, despite growing interest from the deep learning community. There is still no unified theory bridging neural networks to their asymptotic counterparts, even for simple architectures, so the work we are pursuing is original and novel.

4 EPSRC research area
By decreasing degree of relevance, this project falls within the following EPSRC research areas: Statistics and applied probability, Artificial intelligence technologies, Mathematical analysis, Theoretical computer science, Numerical analysis, Non-linear systems, Operational research.

5 Collaborators
I have an ongoing collaboration with InstaDeep Ltd, with whom I am working on a practical application of the theoretical work I am doing in Oxford. In particular, we are exploring alternative learning methods for residual neural networks using their continuous-depth limits and tools from stochastic control theory.

Planned Impact

Probabilistic modelling permeates the financial services, healthcare, technology and other service industries crucial to the UK's continuing social and economic prosperity; these sectors are major users of stochastic algorithms for data analysis, simulation, systems design and optimisation. There is a major and growing shortage of experts in this area, and the UK's success in addressing it through cross-disciplinary research and industry expertise in computing, analytics and finance will directly impact the international competitiveness of UK companies and the quality of services delivered by government institutions.
By training highly skilled experts equipped to build, analyse and deploy probabilistic models, the CDT in Mathematics of Random Systems will contribute to:
- sharpening the UK's research lead in this area, and
- meeting the needs of industry across the technology, finance, government and healthcare sectors.

MATHEMATICS, THEORETICAL PHYSICS and MATHEMATICAL BIOLOGY

The explosion of novel research areas in stochastic analysis requires the training of young researchers capable of facing the new scientific challenges and maintaining the UK's lead in this area. The partners are at the forefront of many recent developments and ideally positioned to successfully train the next generation of UK scientists for tackling these exciting challenges.
The theory of regularity structures, pioneered by Hairer (Imperial), has generated a ground-breaking approach to singular stochastic partial differential equations (SPDEs) and opened the way to solving longstanding problems in the physics of random interface growth and quantum field theory, spearheaded by Hairer's group at Imperial. The theory of rough paths, initiated by TJ Lyons (Oxford), is undergoing a renewal spurred by applications in data science and systems control, led by the Oxford group in conjunction with Cass (Imperial). Pathwise methods and infinite dimensional methods in stochastic analysis, with applications to robust modelling in finance and control, have been developed by both groups.
Applications of probabilistic modelling in population genetics, mathematical ecology and precision healthcare are active areas in which our groups have recognized expertise.

FINANCIAL SERVICES and GOVERNMENT

The large-scale computerisation of financial markets and retail finance and the advent of massive financial data sets are radically changing the landscape of financial services, requiring new profiles of experts with strong analytical and computing skills as well as familiarity with Big Data analysis and data-driven modelling, a profile not matched by current MSc and PhD programmes. Financial regulators (Bank of England, FCA, ECB) are investing in analytics and modelling to face this challenge. We will develop a novel training and research agenda adapted to these needs by leveraging the considerable expertise of our teams in quantitative modelling in finance and our extensive experience in partnerships with financial institutions and regulators.

DATA SCIENCE

Probabilistic algorithms, such as stochastic gradient descent and Monte Carlo Tree Search, underlie the impressive achievements of Deep Learning methods. Stochastic control provides the theoretical framework for understanding and designing Reinforcement Learning algorithms. A deeper understanding of these algorithms can pave the way to designing improved algorithms with higher predictability and 'explainable' results, crucial for applications.
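The first of these algorithms is simple to state: stochastic gradient descent replaces the full gradient with a mini-batch estimate, so each update is the true descent direction plus endogenous noise whose scale shrinks with the batch size. The least-squares problem below is a hypothetical worked example chosen for brevity, not a CDT project.

```python
import numpy as np

# Minimal sketch of stochastic gradient descent on a least-squares
# objective. Each step uses the gradient computed on a random
# mini-batch, which injects endogenous noise into the trajectory.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
true_w = rng.standard_normal(5)
y = X @ true_w                       # noiseless labels for illustration

w = np.zeros(5)
lr, batch = 0.05, 16
for step in range(500):
    idx = rng.integers(0, 200, size=batch)              # random mini-batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch     # noisy gradient
    w -= lr * grad

print(np.linalg.norm(w - true_w))  # close to zero: SGD recovers the weights
```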
We will train experts who can blend a deeper understanding of algorithms with knowledge of the application at hand, to go beyond pure data analysis and develop data-driven models and decision-aid tools.
There is high demand for such expertise in the technology, healthcare and finance sectors, and great enthusiasm from our industry partners. Knowledge transfer will be enhanced through internships, co-funded studentships and paths to entrepreneurship.


Studentship Projects

Project Reference: EP/S023925/1 (Start: 01/04/2019, End: 30/09/2027)
Studentship: 2272535, related to EP/S023925/1 (Start: 01/10/2019, End: 17/05/2023), Student: Alain Rossier