Robust and scalable Bayesian inference under model misspecification

Lead Research Organisation: University of Warwick
Department Name: Statistics

Abstract

Inspired by recent advances in model misspecification, generalised Bayesian inference and approximate inference, I propose to advance the state of the art in computational statistics and probabilistic machine learning. The project will introduce a methodological framework, algorithms, and associated computational and statistical theory for performing robust inference over a universe of potentially misspecified models and their parameters in large-scale settings. My research is motivated by impactful real-world applications of statistical machine learning in the medical and social sciences.

Recent research on model misspecification has examined how to be robust to misspecified likelihoods, misspecified prior beliefs and data outliers, given a chosen model family and some measure of fit or generalisation. This is primarily a post-hoc perspective that attempts to protect and robustify algorithms and inference against natural sources of misspecification. The project attempts to go beyond post-hoc corrections and formally embed any available information about the true data-generating process into our algorithms and mathematical thinking.

For example, the project has looked at a robust inference method applicable to simulator-based models. Inference with such models is challenging because sampling from the model is possible but the likelihood function is unavailable. Furthermore, simulator-based models often describe complicated physical or biological phenomena and hence can easily be misspecified in practice. That is, they provide only a rough approximation of a real-world phenomenon, and when this approximation deviates substantially from the true data-generating mechanism, inference outcomes can be misleading. Model misspecification was recently examined in a series of papers proposing the Bayesian Nonparametric Learning (NPL) framework, which is based on the idea that uncertainty should be placed directly on the data-generating mechanism rather than on a parameter of interest, as is usual in traditional Bayesian methodology. The project's proposed method combines this framework with Maximum Mean Discrepancy (MMD) estimators to provide a robust method suitable for likelihood-free inference. The resulting approach is novel and computationally efficient, and comes with theoretical guarantees.
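
As a rough illustration of how a Maximum Mean Discrepancy estimator can be combined with nonparametric uncertainty on the data-generating mechanism, the sketch below reweights the observed data with Dirichlet (Bayesian bootstrap) weights and, for each reweighting, picks the simulator parameter that minimises a weighted squared MMD. It is a simplified sketch under stated assumptions, not the project's actual algorithm: the toy Gaussian location simulator, the Gaussian kernel with unit lengthscale, the grid search, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


def weighted_mmd2(x, y, weights, lengthscale=1.0):
    """Squared MMD (V-statistic) between weighted points x and equally weighted y."""
    v = np.full(y.shape[0], 1.0 / y.shape[0])
    k_xx = gaussian_kernel(x, x, lengthscale)
    k_yy = gaussian_kernel(y, y, lengthscale)
    k_xy = gaussian_kernel(x, y, lengthscale)
    return float(weights @ k_xx @ weights + v @ k_yy @ v - 2.0 * weights @ k_xy @ v)


def simulate(theta, noise):
    """Hypothetical toy simulator (assumption): a Gaussian location model."""
    return theta + noise


def mmd_posterior_bootstrap(data, theta_grid, n_draws=50, n_sim=100, seed=0):
    """One parameter draw per Dirichlet reweighting of the observed data."""
    rng = np.random.default_rng(seed)
    # Common random numbers: the same noise is reused for every theta.
    noise = rng.standard_normal((n_sim, data.shape[1]))
    sims = [simulate(theta, noise) for theta in theta_grid]
    draws = np.empty(n_draws)
    for b in range(n_draws):
        # Bayesian bootstrap weights stand in for a draw of the data distribution.
        weights = rng.dirichlet(np.ones(data.shape[0]))
        losses = [weighted_mmd2(data, sim, weights) for sim in sims]
        draws[b] = theta_grid[int(np.argmin(losses))]
    return draws


# Usage: 95 observations from N(2, 1) contaminated with 5 outliers at 20.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(2.0, 1.0, size=(95, 1)), np.full((5, 1), 20.0)])
draws = mmd_posterior_bootstrap(data, np.linspace(0.0, 4.0, 41))
print("sample mean:", data.mean(), "median draw:", np.median(draws))
```

In this toy setting the outliers pull the sample mean well away from the location of the bulk of the data, whereas the MMD-based draws are expected to stay close to it, because a bounded kernel gives distant points almost no influence on the discrepancy.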

A different type of model misspecification, widely studied in statistical inference, is measurement error in one of the independent variables. This problem, also called the errors-in-variables or input uncertainty problem, arises often in economics and in the medical and natural sciences, where it is often hard or impossible to measure real-world quantities exactly. Measurement error in the predictor variable can lead to biased parameter estimates and potentially misleading inference outcomes. For example, in many causal inference problems the aim is to estimate the causal effect of one random variable on another, so a biased estimate gives a false picture of how the predictor variable affects the outcome variable. Such causal effect estimation problems arise in the health sciences and epidemiology, where scientists are interested in exposure-outcome relations. This project aims to explore this direction by adapting the NPL framework to Berkson and classical measurement error settings and empirically validating the proposed methodology on real-world applications in health and nutritional sciences.
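
The bias caused by measurement error can be seen in a small simulation. The sketch below assumes a linear outcome model with Gaussian noise (an illustrative choice, not taken from the project) and compares the two settings named above: classical error, where the observed predictor is the true predictor plus noise, and Berkson error, where the true predictor is the observed one plus noise. Under classical error the least-squares slope is shrunk by the factor var(X) / (var(X) + var(U)); under Berkson error in a linear model the slope stays approximately unbiased. All names and parameter values are hypothetical.

```python
import numpy as np


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate_slopes(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, seed=0):
    """Return the fitted slopes under classical and Berkson measurement error."""
    rng = np.random.default_rng(seed)

    # Classical error: we observe w = x + u, but y depends on the true x.
    x = rng.normal(0.0, sd_x, n)
    w = x + rng.normal(0.0, sd_u, n)
    y = beta * x + rng.normal(0.0, 0.5, n)
    classical = ols_slope(w, y)

    # Berkson error: the true x scatters around the recorded value w.
    w_b = rng.normal(0.0, sd_x, n)
    x_b = w_b + rng.normal(0.0, sd_u, n)
    y_b = beta * x_b + rng.normal(0.0, 0.5, n)
    berkson = ols_slope(w_b, y_b)

    return classical, berkson


classical, berkson = simulate_slopes()
attenuation = 1.0 / (1.0 + 1.0)  # sd_x**2 / (sd_x**2 + sd_u**2) for the defaults
print("true slope: 2.0, classical:", classical, "Berkson:", berkson)
print("predicted classical slope:", 2.0 * attenuation)
```

With equal variances for the predictor and the error, the classical-error slope should land near half the true value, which is the kind of misleading exposure-outcome estimate the paragraph above describes.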

Finally, the project explores model misspecification in the context of Distributionally Robust Optimisation (DRO). The goal is to make robust decisions with respect to a variable while accounting for likelihood misspecification through a worst-case analysis. The use of an NPL posterior in place of a standard Bayesian posterior in this setting can potentially result in less conservative decisions in comparison to traditional DRO methods while also accounting for model misspecification.
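
To make the worst-case analysis concrete, the sketch below implements a generic textbook form of DRO rather than the project's NPL-based method: the ambiguity set is a Kullback-Leibler ball of a given radius around the empirical distribution, and the worst-case expected loss is evaluated through its standard dual, the minimum over lambda > 0 of lambda * radius + lambda * log E[exp(loss / lambda)]. The newsvendor loss, the gamma-distributed demand, the grids, and all names are assumptions made for illustration.

```python
import numpy as np


def worst_case_expected_loss(losses, radius, lambdas=None):
    """Worst-case mean loss over a KL ball, via the dual evaluated on a grid.

    Minimising over a finite grid of lambda values gives an upper bound on
    the exact worst case, so the result errs on the conservative side.
    """
    if lambdas is None:
        lambdas = np.logspace(-2, 3, 200)
    losses = np.asarray(losses, dtype=float)
    shift = losses.max()  # subtract the maximum for numerical stability
    values = []
    for lam in lambdas:
        log_mean_exp = np.log(np.mean(np.exp((losses - shift) / lam)))
        values.append(lam * radius + shift + lam * log_mean_exp)
    return float(min(values))


def newsvendor_loss(order, demand, overage=1.0, underage=3.0):
    """Hypothetical loss: cost of ordering too much or too little."""
    too_much = np.maximum(order - demand, 0.0)
    too_little = np.maximum(demand - order, 0.0)
    return overage * too_much + underage * too_little


def robust_decision(demand_samples, order_grid, radius):
    """Order quantity minimising the worst-case expected loss."""
    worst = [
        worst_case_expected_loss(newsvendor_loss(a, demand_samples), radius)
        for a in order_grid
    ]
    best = int(np.argmin(worst))
    return float(order_grid[best]), float(worst[best])


rng = np.random.default_rng(0)
demand = rng.gamma(shape=4.0, scale=5.0, size=500)
grid = np.linspace(0.0, 60.0, 61)
for radius in (0.0, 0.1, 0.5):
    order, value = robust_decision(demand, grid, radius)
    print("radius", radius, "order", order, "worst-case loss", value)
```

A larger radius guards against more severe misspecification of the nominal distribution but yields a more conservative decision; the choice of nominal distribution and ambiguity set is exactly where a different posterior could change how conservative the result is.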

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/T51794X/1       -             -             01/10/2020  30/09/2025  -
2435792            Studentship   EP/T51794X/1  05/10/2020  27/06/2024  Charita Dellaporta