Efficient Bayesian inference and recycling of features in data analysis problems

Lead Research Organisation: University of Bath
Department Name: Mathematical Sciences

Abstract

The project is a collaboration with BT, which collects large amounts of time-series data together with very many potential predictors, e.g. monthly telecommunications data alongside location and weather information. As more data are generated and stored across many fields, statistical analysis of these data sets can give insight into the relationships between the predictors and the response of interest. However, when there are too many predictors, model fitting and prediction become challenging in this high-dimensional setting, especially as many of the predictors may be strongly correlated. Subset selection methods offer an attractive solution: by reducing the number of predictors, they keep the resulting statistical model interpretable. The difficulty lies in identifying the combination of predictors that best explains the response. Furthermore, evaluating every candidate model quickly becomes computationally expensive as the number of predictors grows (p predictors give 2^p possible subsets), so methods that greatly reduce this cost are required.
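The exponential cost of evaluating all candidate models can be made concrete with a minimal sketch of exhaustive best-subset selection. It assumes a Gaussian linear regression scored by BIC, which is one common criterion and not necessarily the one used in this project; the function name `best_subset_bic` and the simulated data are hypothetical.

```python
import itertools

import numpy as np


def best_subset_bic(X, y):
    """Exhaustive best-subset selection for a Gaussian linear model, scored by BIC.

    Every one of the 2**p subsets of the p columns of X is fitted by least
    squares (always with an intercept), so the cost grows exponentially in p.
    Returns the best subset (a tuple of column indices) and the number of
    models evaluated.
    """
    n, p = X.shape
    best_score, best_subset, n_models = np.inf, (), 0
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept column plus the selected predictors.
            cols = np.asarray(subset, dtype=int)
            design = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            # BIC up to an additive constant: n*log(RSS/n) + (#parameters)*log(n).
            score = n * np.log(rss / n) + design.shape[1] * np.log(n)
            n_models += 1
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset, n_models


# Hypothetical usage on simulated data: only columns 0 and 2 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
subset, n_models = best_subset_bic(X, y)
print(subset, n_models)  # n_models is 2**5 = 32
```

With p = 30 predictors the same loop would have to fit 2^30 (about 1.07 × 10^9) models, which is why cheaper search strategies are needed.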

In addition, time-discretised data often display seasonality and complex non-linear relationships that can change over time. It is therefore crucial to track changes in the predictors that may cause changes in the relationship. This can be achieved within the Bayesian framework for parameter estimation, which represents uncertainty and prior beliefs about the predictors as probability distributions. Information from the observed data is then incorporated to update these distributions, so that they reflect changes in the parameters. Traditional Markov chain Monte Carlo (MCMC) methods used in Bayesian inference require the target distribution to be known up to a constant of proportionality; however, such methods are often infeasible for time-dependent models because the likelihood function is intractable. Particle filtering methods address this by providing sequential numerical approximations for dynamic time-series models with complex structures and intractable likelihoods.
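To illustrate the idea, here is a minimal sketch of a bootstrap particle filter, one standard particle filtering scheme and not necessarily the one developed in this project. It assumes a hypothetical model in which a latent log-rate follows a Gaussian random walk and the observed counts are Poisson; the model, the function name `bootstrap_particle_filter` and all parameter values are illustrative assumptions.

```python
import math

import numpy as np


def bootstrap_particle_filter(y, n_particles=1000, sigma=0.1, x0_mean=0.0, x0_sd=1.0, seed=0):
    """Bootstrap particle filter for a hypothetical Poisson random-walk model:

        x_t = x_{t-1} + N(0, sigma^2),   y_t ~ Poisson(exp(x_t)),

    with x_0 ~ N(x0_mean, x0_sd^2). Returns the filtered means E[x_t | y_1:t]
    and an estimate of the log marginal likelihood log p(y_1:T).
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(x0_mean, x0_sd, size=n_particles)
    filtered_means = np.empty(len(y))
    log_lik = 0.0
    for t, y_t in enumerate(y):
        # Propagate particles through the state transition (the bootstrap proposal).
        particles = particles + rng.normal(0.0, sigma, size=n_particles)
        # Log-weights: Poisson log-pmf of y_t with rate exp(x).
        log_w = y_t * particles - np.exp(particles) - math.lgamma(y_t + 1)
        # Subtract the maximum before exponentiating for numerical stability.
        max_log_w = log_w.max()
        w = np.exp(log_w - max_log_w)
        # Incremental likelihood p(y_t | y_1:t-1) is estimated by the mean weight.
        log_lik += max_log_w + math.log(w.mean())
        w /= w.sum()
        filtered_means[t] = np.dot(w, particles)
        # Multinomial resampling to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return filtered_means, log_lik


# Hypothetical usage: simulate a drifting log-rate and Poisson counts, then filter.
rng = np.random.default_rng(1)
T = 100
x = 2.0 + np.cumsum(rng.normal(0.0, 0.1, size=T))
y = rng.poisson(np.exp(x))
means, log_lik = bootstrap_particle_filter(y, x0_mean=2.0)
print(np.mean(np.abs(means - x)), log_lik)
```

Because the Poisson observations are non-Gaussian, the filtering distributions in this model have no closed form, so the filter represents them by weighted samples that are updated one observation at a time. The running log-likelihood estimate is one quantity that could be used to compare candidate models.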

The aim of this project is to combine subset selection methods with appropriate Bayesian particle filtering methodology for dynamic time-series models. Traditional statistical models are often fitted by hand, which is time-consuming and costly, so the project also considers automating model fitting and predictor selection. The developed method will be applied at BT with the aim of improving insight and predictive ability compared with the methods currently used in industry.

This project is relevant to EPSRC because it develops and applies statistical methods that can improve prediction and decision-making in industrial settings. The developed method also aims to reduce the computational cost of the relevant methods currently in use, and so falls within EPSRC's mathematics and data science focus.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/W522090/1 | | | 01/10/2021 | 30/09/2026 |
2671809 | Studentship | EP/W522090/1 | 28/03/2022 | 02/08/2026 | Na Eun KIM