Massively Parallel Bayesian Inference Techniques

Lead Research Organisation: University of Bristol
Department Name: Mathematics

Abstract

A wide range of research areas rely on statistical inference to obtain and confirm results, along with corresponding degrees of certainty. Popular techniques include Hamiltonian Monte Carlo (HMC), Markov chain Monte Carlo (MCMC) and importance sampling. Whilst MCMC and importance sampling are commonly used for statistical inference, being generally fast to perform, they often scale poorly on large models and fail to converge to correct results. HMC, on the other hand, tends to be more popular as it scales better with model size. However, it is often slow to converge, albeit to correct results. We aim to present Bayesian inference methods, largely based on importance sampling, which allow us to arrive at correct results on large models in reasonable time.

Because the setting of this project is so general, its potential impact is broad. In particular, researchers often perform statistical inference via probabilistic programming languages such as Stan or PyMC. One long-term goal of this project is to release a probabilistic programming language which employs our "massively parallel" techniques to perform this inference both quickly and correctly, with a familiar Pythonic interface.

Broadly, these techniques exploit conditional independencies in statistical models to provide an efficient and tractable way of effectively considering an exponential number of samples. In practice this is fast, thanks in large part to the parallelism available when the required operations are performed on GPUs, which is why we refer to our family of methods as "massively parallel" methods. The use of an exponential number of samples (simplifying here somewhat to ignore certain crossovers and redundancies) aligns with a recent result which indicates that as the number of variables in a model increases linearly, the number of samples required for importance sampling to work increases exponentially.
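The idea of effectively considering an exponential number of samples can be illustrated with a deliberately simplified sketch. It assumes a toy model (not taken from the project) with n latent variables that are fully independent of one another, each with a standard normal prior and one Gaussian observation, and it uses the prior as the proposal. Drawing K samples per latent gives K^n possible combinations; because the latents are independent, the average importance weight over all combinations factorises into a product of n per-latent averages. All function names here are hypothetical, and the project's actual methods handle richer dependency structures and run on GPUs.

```python
import itertools
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of a Normal(mean, std) distribution at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def sample_weights(x_obs, k, rng):
    """Draw k samples of one latent z from its N(0, 1) prior (used as the
    proposal) and return the importance weights, which reduce to p(x_obs | z)."""
    weights = []
    for _ in range(k):
        z = rng.gauss(0.0, 1.0)
        weights.append(math.exp(normal_logpdf(x_obs, z, 1.0)))
    return weights


def brute_force_estimate(weight_lists):
    """Average the joint weight over all K**n combinations of samples."""
    n = len(weight_lists)
    k = len(weight_lists[0])
    total = 0.0
    for combo in itertools.product(range(k), repeat=n):
        total += math.prod(weight_lists[i][combo[i]] for i in range(n))
    return total / k ** n


def factorised_estimate(weight_lists):
    """Same quantity, computed as a product of per-latent averages (n * K terms)."""
    return math.prod(sum(w) / len(w) for w in weight_lists)


rng = random.Random(0)
observations = [0.3, -1.2, 0.8, 2.0]  # illustrative data, one value per latent
K = 5
weight_lists = [sample_weights(x, K, rng) for x in observations]

brute = brute_force_estimate(weight_lists)  # sums 5**4 = 625 terms
fast = factorised_estimate(weight_lists)    # touches only 4 * 5 = 20 weights
print(brute, fast)
```

The two estimates agree up to floating-point error, but the factorised version scales linearly in the number of latents rather than exponentially. In this toy case the saving is trivial because nothing is shared between latents; models with partial conditional independence need a more careful summation order.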

This work presents a novel approach that can be applied to a wide range of pre-existing inference methods and has already been shown to lead to favourable results in certain models (particularly models with many conditional independencies between variables, such as those with a largely hierarchical structure). For instance, a massively parallel version of the Reweighted Wake-Sleep (RWS) algorithm (originally designed to train deep generative models) has already been developed and shown to work well in several cases.

Alongside the long-term aim of a probabilistic programming language, the more immediate goals of this project are to continue developing an algorithm for fast and correct Bayesian posterior learning based on importance-weighted posterior moment estimates and to work on bringing the "massively parallel" approach to other inference algorithms.
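As a point of reference for importance-weighted posterior moment estimates, the following is a minimal sketch of ordinary self-normalised importance sampling, not the project's algorithm. It assumes a hypothetical one-dimensional model, z ~ N(0, 1) with x | z ~ N(z, 1), and a wider N(0, 2^2) proposal; for this model the exact posterior is N(x/2, 1/2), which gives something to compare against. The function names and the proposal are illustrative assumptions.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of a Normal(mean, std) distribution at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def posterior_moments(x_obs, num_samples, rng, proposal_std=2.0):
    """Estimate the posterior mean and variance of z given x_obs using
    self-normalised importance sampling with a N(0, proposal_std) proposal."""
    samples = []
    log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(0.0, proposal_std)
        # log weight = log prior + log likelihood - log proposal
        log_w = (
            log_normal_pdf(z, 0.0, 1.0)
            + log_normal_pdf(x_obs, z, 1.0)
            - log_normal_pdf(z, 0.0, proposal_std)
        )
        samples.append(z)
        log_weights.append(log_w)

    # Subtract the maximum before exponentiating for numerical stability.
    max_log_w = max(log_weights)
    weights = [math.exp(lw - max_log_w) for lw in log_weights]
    total = sum(weights)

    mean = sum(w * z for w, z in zip(weights, samples)) / total
    second_moment = sum(w * z * z for w, z in zip(weights, samples)) / total
    return mean, second_moment - mean ** 2


post_mean, post_var = posterior_moments(1.0, 20000, random.Random(0))
print(post_mean, post_var)  # should be close to the exact values 0.5 and 0.5
```

With a single latent variable this works well; the difficulty the project addresses is that the number of samples such an estimator needs grows rapidly as more latent variables are added.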

This project falls within the EPSRC Statistics and Applied Probability research area.

Planned Impact

The COMPASS Centre for Doctoral Training will have the following impact.

Doctoral Students Impact.

I1. Recruit and train over 55 students and provide them with a broad and comprehensive education in contemporary Computational Statistics & Data Science, leading to the award of a PhD. The training environment will be built around a set of multilevel cohorts: a variety of group sizes, within and across year cohort activities, within and across disciplinary boundaries with internal and external partners, where statistics and computation are the common focus, but remaining sensitive to disciplinary needs. Our novel doctoral training environment will have a powerful impact on students, opening their eyes not only to a range of modern technical benefits and opportunities, but also to the power of team-working with people from a range of backgrounds to solve the most important problems of the day. They will learn to apply their skills to achieve impact by working collaboratively with internal and external partners, for example via our Rapid Response Teams, Policy Workshops & Statistical Clinics.

I2. As well as advanced training in computational statistics and data science, our students will be impacted by exposure to, and training in, important cognate topics such as ethics, responsible innovation, equality, diversity and inclusion, policy, effective communication and dissemination, enterprise, impact and consultancy skills. It is vital for our students to understand that their training will enable them to have a powerful impact on the wider world. For example, AI algorithms they develop should not be discriminatory, statistical methodologies should be reproducible, and statistical results should be communicated accurately and comprehensibly to the general public and policymakers.

I3. The students will gain experience via collaborations with academic partners within the University in cognate disciplines, and a wide range of external industrial & government partners. The students will be impacted by the structured training programmes of the UK Academy of Postgraduate Training in Statistics, the Bristol Doctoral College, the Jean Golding Institute, the Alan Turing Institute and the Heilbronn Institute for Mathematical Sciences, which will be integrated into our programme.

I4. Having received excellent training, the students will then have a powerful impact on the world in their future careers, spreading excellence.

Impact on our Partners & ourselves.

I5. Direct impacts will be achieved by students engaging with, and working on projects with, our academic partners, with discipline-specific problems arising in engineering, education, medicine, economics, earth sciences, life sciences and geographical sciences, and our external partners Adarga, the Atomic Weapons Establishment, CheckRisk, EDF, GCHQ, GSK, the Office for National Statistics, Sciex, Shell UK, Trainline and the UK Space Agency. The students will demonstrate a wide range of innovation with these partners and will attract engagement from new partners, and these collaborations will often provide attractive future employment matches for students and partners alike.

Wider Societal Impact

I6. COMPASS will greatly benefit the UK by providing over 55 highly trained PhD graduates in an area that is suffering from extreme, well-known shortages in the national people pipeline. COMPASS CDT graduates will be equipped for jobs in sectors of high economic value and national priority, including data science, analytics, pharmaceuticals, security, energy, communications, government, and indeed all research labs that deal with data. Through their training, they will enable these organisations to make well-informed and statistically principled decisions that will allow them to maximise their international competitiveness and contribution to societal well-being. COMPASS will also impact positively on the wider student community, both now and sustainably into the future.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023569/1      |              |              | 01/04/2019 | 30/09/2027 |
2741375           | Studentship  | EP/S023569/1 | 01/10/2022 | 18/09/2026 | Samuel Bowyer