Sequential Monte Carlo in Random Environments

Lead Research Organisation: University of Bristol
Department Name: Mathematics

Abstract

Statistical analysis of data sequences using complex, non-linear stochastic models is now practically feasible due to the availability of sequential Monte Carlo algorithms. These algorithms allow us to tackle the computational problems which arise when attempting to draw conclusions, inform decisions and make predictions on the basis of data sequences gathered from the world around us. However, despite their remarkable popularity, there is currently no precise, rigorous and general notion of long-run efficiency of these algorithms in the context of model calibration and comparison tasks, so there is no way to formally compare the performance of existing algorithms as the length of the data sequences grow, nor any clear way in which to devise new algorithms which are guaranteed to perform well in this regime. The increasing availability of long data records and recent uptake of sequential Monte Carlo in a variety of burgeoning scientific areas provides strong and immediate motivation for investigation of these matters.

The objectives of the proposed research are to (A) develop a new theoretical and methodological framework in which to address notions of long-run efficiency of sequential Monte Carlo algorithms; and thereby (B) devise new algorithms which are guaranteed to remain practically useful as data sequences grow in length. These objectives are to be achieved through investigation of the subtle interplay between aspects of non-linear estimation, long-run data properties and stochastic simulation techniques in the probabilistic setting of a random environment. The ultimate purpose of the research is equip statistical scientists with a powerful suite of computational techniques with which to face the challenges of modern data analysis.

Planned Impact

It is envisaged that short term impacts of the research will be in academic areas where statistical methods are developed and applied. Longer term, societal and economic impacts then potentially arise from scientific developments in application areas.

Firstly, in computational statistics, the successful delivery of work package (A) will provide a new framework in which to understand the theoretical efficiency of SMC algorithms and thus open new avenues of research into their analysis, in turn potentially leading to developments in algorithmic methodology which may impact broadly across the practice of statistics. Secondly, in application areas such as systems biology, macroeconomics and geosciences, SMC methods are receiving increasing attention, and here the new algorithms arising from work package (B) will have potential for immediate application. In particular the new algorithms will be relevant to analysis of chemical kinetics and spread of infectious diseases, potentially impacting researchers in the UK Healthcare and Life Sciences theme via the sub-theme of Furthering Biomedical Understanding. The algorithms will also be relevant to analysis of certain dynamic macroeconomic models, potentially impacting the key challenge area of Developing New Economic Models within the Digital Economy research theme. Lastly new SMC algorithms may aid calibration of complex non-linear models in the geosciences, impacting forecasting and understanding of change in the atmosphere and oceans.
 
Description This project developed important new insights into the long-run stability properties of sequential Monte Carlo methods, the underlying mathematical framework of which involves nonnegative integral operators in random environments, and their application to practical problems arising for example in the fitting of nonlinear stochastic volatility models to financial time series and the design of efficient distributed algorithms to take advantage of high-performance computing architectures.

In terms of new theoretical developments, the research developed a framework in which to study sufficient conditions on the interaction structure of sequential Monte Carlo algorithms to guarantee their time-uniform convergence. In particular, theorems were developed which both gave rigorous justification for existing but previously only heuristically motivated algorithms; shed light on the efficiencies and inefficiencies of recently proposed algorithms from the engineering literature, and put forward practical guidelines informed by theory for the design of algorithms using distributed computers. These findings have been published in the journal Bernoulli and the Annals of Statistics, with two other pre-prints currently under review.

In terms of methodological developments, the research developed a new class of "Forest Resampling" methods which put into practice rigorous guarantees on the stability of algorithms developed through the new theory. This work was published in the journal Statistical Analysis and Data Mining.

In terms of practice and applications, the algorithms have been demonstrated and applied to a variety of statistical models. The first application was to the fitting of stochastic volatility models for financial time series. The second application was to the calibration of statistical models for localization and tracking of moving objects using Bluetooth Received Signal Strength measurements. In both these contexts, the new algorithms were found to outperform existing methods in terms of accuracy for a given computational budget. These findings are in a journal submission currently under review.

An unexpected but significant line of investigation which opened up during the project was an investigation of perfect sampling methods for hidden Markov models. In studying the stability properties of sequential Monte Carlo algorithms, the PI noticed previously unexplored connections to between the ergodic properties of conditional process in hidden Markov models and exact simulation techniques for nonhomogeneous Markov chains. This lead to a publication in the Annals of Applied Probability.

Regarding partnerships, through the PDRA appointed through the grant, Kari Heine, links were made to a group of mathematicians and telecommunication engineers at the Tampere University of Technology, Finland, where Heine had previously been a PhD student. This lead to a joint project with a PhD student, Juha Ala-Luhtala from Tampere, who visited the University of Bristol for three months and worked on the RSS localization application

Following this initial visit, Ala-Luhtala applied for further funding from the Emil Aaltonen foundation and KAUTE foundation, Finland, which lead to a further three-month visit to explore further avenues of research. This collaboration is on-going.
Exploitation Route To date, impact of the research has been academic in nature. The new theoretical framework of nonnegative operators in random environments has been extended and applied by other researchers to explore the scaling properties of sequential Monte Carlo methods. An example of this is the paper "A lognormal central limit theorem for particle approximations of normalizing constants" by Berard, Del Moral and Doucet, published in the Electronic Journal of Probability.

Unexpectedly, the insight developed through the project into the interaction structure of sequential Monte Carlo algorithms has attracted interested from researchers in the Computer Science community working on Probabilistic Programming, for instance arXiv preprints 1404.0099, 1403.0504. In particular, guidelines proposed regarding the bias of some parallel Monte Carlo estimators have been followed and incorporated into new Probabilistic Programming techniques. It is envisaged that further work by Computer Scientists in this area could lead to wider impact through the same guidelines becoming more widespread and recognized as "standard practice".
Sectors Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Environment,Financial Services, and Management Consultancy,Pharmaceuticals and Medical Biotechnology