Novel Markov chain Monte Carlo methods for high-dimensional statistics.

Lead Research Organisation: University of Oxford

Department Name: Statistics

Abstract

Markov chain Monte Carlo (MCMC) methods are the tools of choice for exploring complex non-standard probability distributions.
These algorithms were introduced over 60 years ago, yet MCMC remains a very active research area as we
now face increasingly difficult challenges. Namely, these algorithms are now expected to work in high-dimensional settings
and in the presence of very large datasets.

The aim of this project is to address these challenges by developing novel Markov chain Monte Carlo (MCMC)
algorithms which scale to high-dimensional scenarios in a data-rich environment. A sharp theoretical analysis of these novel MCMC schemes
will also be provided and they will be demonstrated on a variety of challenging statistical applications.

Much work has recently been done on the analysis of the unadjusted Langevin algorithm in scenarios where the target distributions are log-concave.
However, the log-concavity assumption is very restrictive and the unadjusted Langevin algorithm introduces some undesirable bias.
We will develop novel schemes which provide consistent estimates and will aim to develop a theoretical analysis that bypasses the log-concavity assumption.
In particular, we plan to focus on the development of non-reversible schemes.
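
The bias of the unadjusted Langevin algorithm mentioned above can be illustrated with a minimal sketch. It assumes a standard Gaussian target, chosen only because its behaviour is known in closed form, and contrasts the unadjusted update with the Metropolis-adjusted Langevin algorithm (MALA). MALA is a standard reversible correction and is not one of the non-reversible schemes this project proposes; all function names below are hypothetical.

```python
import numpy as np


def log_target(x):
    # Illustrative assumption: standard Gaussian target, log pi(x) = -||x||^2 / 2.
    return -0.5 * np.dot(x, x)


def grad_log_target(x):
    # Gradient of the log density above.
    return -x


def ula_step(x, step, rng):
    # Unadjusted Langevin: Euler discretisation of the Langevin diffusion,
    # with no accept/reject correction.
    noise = rng.standard_normal(x.shape)
    return x + step * grad_log_target(x) + np.sqrt(2.0 * step) * noise


def log_proposal(b, a, step):
    # Log density (up to a constant) of proposing b from a under the Langevin proposal.
    diff = b - a - step * grad_log_target(a)
    return -np.dot(diff, diff) / (4.0 * step)


def mala_step(x, step, rng):
    # Same proposal as ULA, followed by a Metropolis-Hastings accept/reject step
    # that removes the discretisation bias.
    proposal = ula_step(x, step, rng)
    log_alpha = (log_target(proposal) + log_proposal(x, proposal, step)
                 - log_target(x) - log_proposal(proposal, x, step))
    if np.log(rng.uniform()) < log_alpha:
        return proposal
    return x


def run_chain(step_fn, n_steps, dim, step, seed):
    # Run a chain from the origin and return all visited states.
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    samples = np.empty((n_steps, dim))
    for i in range(n_steps):
        x = step_fn(x, step, rng)
        samples[i] = x
    return samples
```

For this Gaussian target the unadjusted chain with step size h has stationary variance 1 / (1 - h/2) per coordinate instead of 1, so with h = 0.5 the empirical variance of `run_chain(ula_step, ...)` settles near 1.33, whereas the Metropolis-adjusted chain targets the correct variance.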

The longer-term benefits of this project are also closely linked to the RCUK Digital Economy programme. MCMC methods are widely used to analyze complex datasets and can be used, for example, to develop novel collaborative
filtering and topic modelling techniques. It is thus expected that benefits will be experienced in the medium term by the general
public, e.g. the development of more powerful search engines and recommender systems, better credit card scoring techniques, improved
methods for identity fraud detection, etc. Many aspects of computational finance could also readily benefit from them.

Student:

Soufiane Hayou

Period of Study:

Oct 17 - Sep 20

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1929843

Research Topic:

Unclassified

Organisations

University of Oxford (Lead Research Organisation)

People	ORCID iD
Arnaud Doucet (Primary Supervisor)
Soufiane Hayou (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509711/1			01/10/2016	30/09/2021
1929843	Studentship	EP/N509711/1	01/10/2017	30/09/2020	Soufiane Hayou

Key Findings


Description	Deep neural networks have become extremely popular as they achieve state-of-the-art performance on a variety of important applications including language processing and computer vision. The success of these models has motivated the use of increasingly deep networks and stimulated a large body of work to understand their theoretical properties. To cite a few results in this direction, Montufar et al. (2014) have shown that neural networks have exponential expressive power with respect to the depth, while Poole et al. (2017) obtained similar results using a topological measure of expressiveness. This high expressive power, combined with advances in optimization algorithms (Duchi et al. (2011)), has made deep neural networks the model of choice for many tasks.
Our work is divided into three chapters:
- In the first chapter, we explore the theoretical properties of large neural networks. We particularly discuss the concept of 'information propagation' through a neural network, which gives insights on how inputs propagate inside the network. We introduce a large class of 'activation functions' that, according to our analysis, provide better information propagation compared to standard functions.
- In the second chapter, we discuss the problem of training deep neural networks. Jacot et al. (2018) showed that training neural networks can be described with a mathematical tool named the Neural Tangent Kernel (NTK). We bridge the gap between the Neural Tangent Kernel and the initialization of deep neural networks. More precisely, we study the impact of the initialization and the activation function on the NTK as the network becomes very deep. We prove that only an initialization known as the Edge of Chaos leads to efficient training.
- In the third chapter, we derive principled guidelines for neural network compression. Pruning deep neural networks has recently attracted the interest of many researchers. Indeed, reducing the size of deep neural networks while keeping the performance is crucial for the use of such models on devices with limited computational power. While most pruning algorithms rely on pre-trained neural networks (i.e. the algorithm prunes an already trained network), a new direction of pruning neural networks at initialization has shown some promising results (Lee et al. (2018), Wang et al. (2019)). These works suggest pruning the network at initialization and training the resulting sparse network. We give a comprehensive analysis of pruning at initialization and derive principled guidelines in order to avoid structural anomalies in the resulting sparse network.
Exploitation Route	The choice of initialization is crucial for the training of deep neural networks. Our research paper "On the Impact of the Activation Function on Deep Neural Networks Training", published at the International Conference on Machine Learning 2019 (ICML 2019), shows that combining an initialization known as the Edge of Chaos with a class of smooth activation functions leads to efficient training of deep models. We believe this might have a big impact on the use of such models in practice, e.g. in image recognition (self-driving cars, medical imaging, etc.) and language processing (voice recognition, automatic translation, etc.), as it could significantly enhance the performance of such models. Pruning neural networks ("Pruning untrained neural networks: Principles and Analysis", arXiv 2020) is usually a necessary step before implementing models on small devices. Our theory on the pruning of neural networks at initialization provides theoretical guidelines for achieving the best performance from the pruned model. We believe this work will help many practitioners in the design and finalisation of projects that use neural networks.
Sectors	Aerospace, Defence and Marine; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Electronics; Energy; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology; Other
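
As an illustration of the Edge of Chaos initialization mentioned in the key findings, the sketch below uses the characterisation common in the mean-field literature on signal propagation, which the record itself does not spell out and is therefore an assumption here. Pre-activation variances follow the map q -> sigma_b^2 + sigma_w^2 E[phi(sqrt(q) Z)^2] with Z standard normal, and the Edge of Chaos is the set of (sigma_b, sigma_w) for which chi = sigma_w^2 E[phi'(sqrt(q) Z)^2] equals 1 at the fixed point q. The function names are hypothetical, and the expectations are approximated with Gauss-Hermite quadrature.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for expectations under Z ~ N(0, 1).
# An even number of nodes keeps every node away from the ReLU kink at 0.
_NODES, _WEIGHTS = hermegauss(60)
_WEIGHTS = _WEIGHTS / _WEIGHTS.sum()


def gaussian_mean(f, q):
    # Approximate E[f(sqrt(q) * Z)] for Z ~ N(0, 1).
    return float(np.sum(_WEIGHTS * f(np.sqrt(q) * _NODES)))


def variance_map(q, sigma_w, sigma_b, phi):
    # One layer of the assumed mean-field variance recursion.
    return sigma_b ** 2 + sigma_w ** 2 * gaussian_mean(lambda x: phi(x) ** 2, q)


def chi(q, sigma_w, dphi):
    # Assumed Edge of Chaos criterion: chi == 1 at the fixed-point variance q.
    return sigma_w ** 2 * gaussian_mean(lambda x: dphi(x) ** 2, q)


def relu(x):
    return np.maximum(x, 0.0)


def drelu(x):
    # Derivative of ReLU away from 0.
    return (x > 0).astype(float)
```

Under these assumptions, ReLU with sigma_b = 0 and sigma_w^2 = 2 gives `chi(q, np.sqrt(2.0), drelu)` equal to 1 for any q > 0, and `variance_map` leaves q unchanged, while smaller weight variances give chi below 1.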
