# Probabilistic programming for Bayesian nonparametrics

Lead Research Organisation: University of Oxford

Department Name: Statistics

### Abstract

My research will be focused on developing and applying statistical machine learning techniques. Statistical machine learning is concerned with both modelling and implementation. From a modelling perspective, I will work on models suitable for heterogeneous data, such as the data that arises in cognitive science. I am also interested in how we can model networks and graphs arising in real-world settings (social networks, ontological graphs, etc.). Bayesian nonparametric models are a large class of such models, and I am fortunate to be working in a department which is at the forefront of research in Bayesian nonparametrics. Training a Bayesian nonparametric model can be complex, both in development time and computation time. On the implementation side of my project, I will research the applicability of probabilistic programming to Bayesian nonparametric models. Probabilistic programming is a rapidly developing area that spans computer science, engineering, statistics and machine learning. Whilst 'generation 1' languages such as Church were primarily for research purposes only, more modern probabilistic languages like Stan, Pyro and Edward offer high-performance probabilistic inference on a range of models. There is substantial uptake and development within industry (a notable example being Uber's recent investment in probabilistic programming). Existing probabilistic languages have been squarely focused on probabilistic inference.
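To make the 'model as program' idea concrete, here is a minimal sketch in plain Python (standard library only, not any particular probabilistic language; the truncation level and parameter values are assumptions of the sketch) of the stick-breaking construction of a Dirichlet process, a canonical Bayesian nonparametric prior:

```python
import random

def stick_breaking(alpha, truncation, seed=0):
    """Approximate weights of a Dirichlet process via stick-breaking.

    Beta(1, alpha) fractions are repeatedly broken off a unit-length
    stick; the pieces are the weights of a random probability measure.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(truncation):
        frac = rng.betavariate(1.0, alpha)  # Beta(1, alpha) draw
        weights.append(remaining * frac)    # break off a piece
        remaining *= 1.0 - frac             # the stick that is left
    weights.append(remaining)  # lump leftover mass into a final atom
    return weights

w = stick_breaking(alpha=3.0, truncation=50)
print(abs(sum(w) - 1.0) < 1e-9)  # prints True: a valid probability vector
```

Smaller `alpha` concentrates mass on fewer atoms. The truncation is purely a simulation device: the underlying measure is infinite-dimensional, which is precisely the feature that standard probabilistic languages struggle to represent.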

Whilst my initial work (see 'Sampling and inference for discrete random probability measures in probabilistic programs' by Bloem-Reddy, Mathieu, Foster, Rainforth, Teh, Ge, Lomeli and Ghahramani) has shown that Bayesian nonparametric models can be expressed in an existing probabilistic programming language (in our case, Turing), there are substantial obstacles to a full use of probabilistic programming for Bayesian nonparametrics. One important area that I will examine is the existence of certain symmetries in the model. Consider Bayesian nonparametrics for sparse networks (see 'Sparse graphs using exchangeable random measures' by Caron and Fox). There is a high degree of symmetry in the model due to the finite edge exchangeability property. However, when naively expressing such a model in a probabilistic language, the symmetry is broken, leading to severely diminished inference performance. Closely related to this is the treatment in probabilistic programming of random variables that vary stochastically in dimension, or (equivalently) variables whose existence is stochastic. The majority of probabilistic languages in current use either explicitly exclude such settings or perform very poorly in them. But these are precisely the settings needed for probabilistic programming to be useful for Bayesian nonparametric models. A second objective of the project is to take probabilistic programming beyond inference. Model checking, model selection, and applications to reinforcement learning and control are all exciting areas for probabilistic programming that I intend to tackle.
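To illustrate what 'variables whose existence is stochastic' means in practice, the following sketch (plain Python with the standard library; the parameter values are arbitrary choices for the illustration) simulates a Chinese restaurant process, in which the number of clusters, and hence the model's dimension, grows at random as observations arrive:

```python
import random

def crp_sample(n_customers, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process.

    Customer i joins existing table k with probability
    counts[k] / (i + alpha), or opens a brand-new table with
    probability alpha / (i + alpha): the set of tables (and any
    parameters attached to them) exists only stochastically.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = customers seated at table k
    assignments = []
    for i in range(n_customers):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts + [alpha]):
            acc += c
            if r < acc:
                break
        if k == len(counts):
            counts.append(1)   # a new table comes into existence
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

tables = crp_sample(100, alpha=2.0)
print(len(set(tables)))  # number of occupied tables; grows roughly like alpha * log(n)
```

A probabilistic language that fixes the set of random variables up front cannot express this program faithfully, which is the obstacle described above.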

In terms of potential impact, the most pressing use case of probabilistic programming is in the field of autonomous vehicles. Recent investment by Uber and Toyota in the area indicates the burning need of industry to make probabilistic programming work at scale on sophisticated models. Probabilistic programming, though, is a very powerful and general tool. Data from networks (social and otherwise) now pervades society, and efficient analysis of such data will be essential for the next generation of policy makers and data analysts. Probabilistic programming will be a major tool in their arsenal.

Thematically, this project falls within the following EPSRC research areas: Artificial intelligence technologies, Information systems, and Statistics and applied probability.


## People

Yee Whye Teh (Primary Supervisor)

Adam Foster (Student)

### Publications

Bloem-Reddy B. (2018) *Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks*

Foster A. (2019) *Variational Bayesian Optimal Experimental Design*

Foster A. (2019) *A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments*, in arXiv e-prints

### Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
---|---|---|---|---|---
EP/N509711/1 | | | 01/10/2016 | 30/09/2021 |
1963632 | Studentship | EP/N509711/1 | 01/10/2017 | 31/03/2021 | Adam Foster

Description | The first component of work conducted during this DPhil was on Bayesian nonparametric statistics and their connection to probabilistic programming. Bayesian nonparametrics are a flexible class of statistical models with many attractive theoretical features, but inference in these models can be challenging. In our work 'Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks', we established a more efficient method for performing inference in a certain class of Bayesian nonparametric models. These were models for network data - for example, they might be applied to analysing connectivity, influence and community structure in social networks. Probabilistic programming allows users to express Bayesian models (including Bayesian nonparametric ones) and perform inference on them automatically. Unfortunately, due to the infinite-dimensional nature of Bayesian nonparametric models, probabilistic languages are not particularly well suited to them. In our work 'Sampling and inference for discrete random probability measures in probabilistic programs' we established methods to work more efficiently with Bayesian nonparametric models (specifically random probability measures) in probabilistic languages. Based on this and other work, a number of probabilistic languages now emphasise Bayesian nonparametrics as a key requirement of a universal probabilistic language (cf. https://pyro.ai/examples/dirichlet_process_mixture.html). Whilst performing inference (learning from existing data) is the primary focus of most statistical research and most probabilistic languages, it is important to consider how the data is obtained in the first place. Specifically, if data is obtained by experimentation, how should we design that experiment? Designing Bayesian optimal experiments is very important if we hope to obtain useful data at reasonable cost, but it can be computationally intractable in all but the simplest cases.
In our work 'Variational Optimal Experiment Design: Efficient Automation of Adaptive Experiments' (workshop paper) and 'Variational Bayesian Optimal Experimental Design' (accepted for spotlight presentation at NeurIPS 2019) we showed that variational methods and amortization can substantially speed up optimal experimental design in many cases. This expanded the scope of models for which one can design Bayesian optimal experiments. This work has far-reaching potential in every scientific discipline that uses Bayesian data analysis: psychology, political science and pharmacology, to name but three. To help with the adoption of our methods, we incorporated them into the probabilistic language Pyro. This means that any researcher using Pyro to conduct their data analysis now has access to a powerful set of tools for designing optimal experiments. To further extend the scope of Bayesian optimal experiments to very large design spaces, we considered an extension of our previous work that learns optimal designs by gradient descent. This was published in 'A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments'. The methodology developed here further expands the scope of models and design problems for which one might feasibly be able to compute the Bayesian-optimal design. There is a strong connection between designing Bayesian-optimal experiments and learning informative representations; recently, we have been working on the connection between these two important fields in machine learning. |
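The central quantity in Bayesian optimal experimental design is the expected information gain (EIG) of a candidate design. As a hedged illustration of the idea - a naive nested Monte Carlo estimator in plain Python, not the variational estimators developed in the papers above; the coin-flip model, the uniform prior and all sample sizes are assumptions of the sketch - one can compare the EIG of two simple designs (how many coin flips to run) like this:

```python
import math
import random

def binom_logpmf(y, n, theta):
    """Log of the Binomial(n, theta) pmf at y (safe at theta = 0)."""
    if theta == 0.0:
        return 0.0 if y == 0 else float("-inf")
    out = math.log(math.comb(n, y)) + y * math.log(theta)
    if n > y:
        out += (n - y) * math.log(1.0 - theta)
    return out

def eig_nested_mc(n_flips, n_outer=500, n_inner=500, seed=0):
    """Nested Monte Carlo estimate of the expected information gain of
    flipping a coin n_flips times, under a Uniform(0, 1) prior on the
    coin's unknown bias theta."""
    rng = random.Random(seed)
    inner = [rng.random() for _ in range(n_inner)]  # prior samples for the marginal
    total = 0.0
    for _ in range(n_outer):
        theta = rng.random()                                   # theta ~ prior
        y = sum(rng.random() < theta for _ in range(n_flips))  # simulated outcome
        log_lik = binom_logpmf(y, n_flips, theta)
        # marginal likelihood p(y) estimated with the inner prior samples
        marg = sum(math.exp(binom_logpmf(y, n_flips, t)) for t in inner) / n_inner
        total += log_lik - math.log(marg)
    return total / n_outer

# A longer experiment should be more informative about theta:
print(eig_nested_mc(2) < eig_nested_mc(10))
```

The inner sum over prior samples is what makes this estimator expensive and biased; replacing it with a learned variational bound is exactly the kind of speed-up the work described above targets.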

Exploitation Route | Optimal experimental design has many potential future applications. We are currently looking at potential applications of our methodology in psychology, political science and pharmacology. The hope would be that researchers can use our methods to design experiments that allow them to reach the same conclusions with less money, time, or human effort expended. |

Sectors | Digital/Communication/Information Technologies (including Software), Government, Democracy and Justice, Pharmaceuticals and Medical Biotechnology |

URL | http://csml.stats.ox.ac.uk/people/foster/ |

Title | Pyro Optimal Experiment Design |

Description | I contributed to the open source probabilistic programming language Pyro. Pyro is a tool that allows researchers to quickly express Bayesian models in Python and perform inference on those models automatically using a wide range of techniques, with PyTorch as the backend. My contribution was the optimal experiment design functionality in Pyro, which takes probabilistic models expressed in Pyro and computes optimal experiment designs for them. |

Type Of Technology | Software |

Year Produced | 2019 |

Open Source License? | Yes |

Impact | This software was used in 'Variational Bayesian Optimal Experimental Design' and 'A Unified Stochastic Gradient Approach to Designing Bayesian-Optimal Experiments'. |

URL | http://docs.pyro.ai/en/stable/contrib.oed.html |