Bayesian and Kernel Methods for Nonparametric Statistical Models

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

The advent of high dimensional datasets, both with respect to the data size, i.e. the number of observations, and with respect to the feature space, i.e. the number of quantities measured for each observation, has led to an increased demand for machine learning algorithms that can handle such datasets. One might think of a database storing various traits and genetic markers for thousands or millions of patients, where the goal is to discover relations between these markers and the susceptibility to a certain disease.

The mathematical description of such situations proves challenging in many ways. One may be able to find a suitable statistical model, yet the data size renders inference computationally intractable; alternatively, the statistical model needed to describe the situation is too complex to allow the application of classical inference techniques at all.

Flexible frameworks for approaching these problems arise in both Bayesian and frequentist statistical machine learning. Because exact inference is intractable in complex models, various approximation techniques have been developed in both communities in recent years to tackle such large-scale inference problems. This project will focus on exploring various aspects of scalable machine learning procedures in both the Bayesian and the frequentist paradigm. It will further the theoretical understanding of approximate computation techniques in nonparametric statistical modelling with (deep) Gaussian processes and other (deep) probabilistic models. In particular, we propose to investigate approaches based on variational approximations and their properties. The goal is that this theoretical research, together with equivalent reformulations and a deeper understanding, will reveal new avenues for improved approximate inference algorithms, as well as provide sharp guarantees for their performance. To achieve this, we shall analyse the interface between Bayesian and frequentist kernel methods; in particular, the theory of kernel mean embeddings will be of key interest in the project.
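To illustrate the kernel mean embedding mentioned above, the following is a minimal sketch (not part of the project's software, and assuming only NumPy and a Gaussian RBF kernel with a hypothetical unit lengthscale) of the empirical maximum mean discrepancy (MMD), i.e. the RKHS distance between the mean embeddings of two samples:

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq_dists / (2 * lengthscale**2))

def mmd2_biased(X, Y, lengthscale=1.0):
    # Biased empirical estimate of MMD^2 = ||mu_P - mu_Q||_H^2, where mu_P and
    # mu_Q are the kernel mean embeddings of the distributions generating X and Y.
    Kxx = rbf_kernel(X, X, lengthscale)
    Kyy = rbf_kernel(Y, Y, lengthscale)
    Kxy = rbf_kernel(X, Y, lengthscale)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))  # sample from P
Y = rng.normal(0.5, 1.0, size=(200, 1))  # sample from Q (mean shifted by 0.5)
print(mmd2_biased(X, Y))  # squared RKHS distance between the two embeddings
```

The estimate is a squared RKHS norm, so it is always non-negative, and it is exactly zero when the two samples coincide; choosing the lengthscale in practice is itself a model-selection question.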

The problem will be approached from first principles, using foundational concepts from measure theory, functional analysis and probability theory. Modern methodologies to be applied include information-theoretic concepts and the theory of reproducing kernel Hilbert spaces. To allow for wider applicability of newly developed methodologies, new open-source software libraries will be built, or functionality will be added to existing ones. The research is foundational in nature and has the potential to reach a wide range of applications, since machine learning algorithms and nonparametric modelling are becoming widely adopted in the digital economy, engineering, scientific studies, and healthcare.

This project falls within the EPSRC Mathematical Sciences research area, specifically the field of statistics and applied probability.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513295/1                                   01/10/2018  30/09/2023
2444047            Studentship   EP/R513295/1  01/10/2020  30/09/2023  Veit David Wild