How local particle filters can be used to solve filtering and smoothing problems in Hidden Markov Models

Lead Research Organisation: University of Bristol
Department Name: Mathematics

Abstract

Abstract
(no more than 4,000 characters inc spaces) Hidden Markov Models (HMM) can be applied widely in real life, from traffic models to different biological and neural models. This fitting with reality has increased the demand of precise and efficient algorithms to solve the filtering and the smoothing problems. The literature offers different methods to work out the filtering and the smoothing distribution of the underlying model, though when the dimension increases two problems are revealed. Firstly, as outspread in the paper by Rebeschini, Van Handel et al. (Can local particle filters beat the curse of dimensionality?, 2015), the particle filter (a well-know algorithm that estimates the filtering distribution) is affected by the curse of dimensionality. Secondly, even if the quantities of interest can be calculated in a close form the growth in the dimension could make the computational cost unfeasible. As explained in detail by Rebeschini, Van Handel et al. (2015), it is possible to prevent this problem by developing local particle filters with a dimension-free error, which has also the advantage of being computationally cheaper.

The starting point of the research is the HMM where the hidden Markov chain is an n-dimensional sequence of 0's and 1's. Given that the state space is the product space of {0, 1} (n-times) , it can be easily recognized that the dimension is 2 to the power of n, meaning that the curse of dimensionality can be an issue. Since the state space is finite, the filtering and the smoothing distributions are available after forward and a backward step. However these operations involve handling matrices with 2 to the n rows and 2 to the n columns that are too expensive from a computational point of view. A possible work around to this is to introduce an error to increase the speed of the algorithm. This approximation can be found by assuming the existence of a local structure inside the model and considering a particular factorization of the observation distribution. The first thing is assuming that the Markov process X, for a fixed time, admits a local structure, meaning that each component is somehow caused by only a set of neighbors. The second assumption is that the process Y for a fixed time is drawn from a distribution G that admits a nondegenerative representation, meaning that exists a positive observation density g which factorizes. The last assumption is the factorization of the initial measure. These assumptions allow to redefine the forward step as working only on matrices with reduced dimension and with a low approximation error. Having ensured that the corresponding implementation works properly, the next aim will be to modify the EM algorithm to also include the cases in which all the parameters of the model (initial distribution, transition kernel, etc.) are unknown. This specific case has an application to traffic models. Indeed, in the temporal evolution of a network of roads each edge can be modeled by a component with a value that can be either congested or not congested. Of course this state of a road is not achievable, therefore only the number of cars for each road is observed. In this specific case the local structure is a reasonable assumption, because the traffic in a road is influenced only by the closest ones. The next step would be to apply the algorithm to this freeway traffic model, using data from the real world.

As natural continuation of the research, it will be interesting to study generalization of this algorithm. So trying to apply the approximation not only to the "ideal" finite state space, but also to more general spaces. If that is possible, the method can be adapted to a huge range of applications.

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/N509619/1 01/10/2016 30/09/2021
1961576 Studentship EP/N509619/1 18/09/2017 31/03/2021 Lorenzo Rimella
 
Description My research is mainly focused on high-dimensional Hidden Markov models (HMM), extremely flexible models which are popular across the Machine Learning community. An HMM is a statistical model which explains patterns in observed data in terms of the evolution over time of a process which is not observed directly.

For example, suppose we are going every day, including weekends, to the British Library and we are considering to use the underground station at King's Cross. We do not know if the underground lines passing through King's Cross are going to be crowded or not, but we observe the number of people entering/exiting King's Cross over time. In this case, the hidden process is the congestion state of the lines (crowded/quiet) and the observed process is the flow of people. The high-dimensional case can be thought of as the case of multiple tube stations and multiple lines.

Given an HMM, the "smoothing" problem is to calculate conditional probabilities over states of the hidden process given all the observations. Unfortunately, in high-dimensional cases, the smoothing problem is too computationally demanding to solve exactly, motivating the need for efficient and accurate approximations.

The first product of my research is a new algorithm called the "Graph Filter-Smoother" which solves the smoothing problem approximately by dividing the high-dimensional problem to multiple lower dimensional ones that can be solved more easily. I have applied the Graph Filter-Smoother to an HMM modelling data from London Underground Oyster Cards, illustrating how forecasts of the flow of people entering and exiting underground stations can be obtained.

The second product of my research is a new class of HMM called "Hidden Markov Neural Networks" (HMNN), which combines HMM with Neural networks (NN). This allows getting an NN that adapt sequentially to the data, meaning that the algorithm can learn from a dataset that is split into multiple sets or simply fits a temporal structure in the data. The procedure was applied to two different scenarios: online learning from the dataset MNIST; prediction of the next frame in a video of a waving flag.
Exploitation Route The findings of my research can be taken forward in multiple ways.

The Graph Filter-Smoother can be applied to different scenarios. In particular, "smart cities" problems can be explored more accurately by developing more sophisticated models which can better predict congestion based on London's tube data. Moreover, other forms of traffic data can be investigated, such as traffic levels on road networks.
Beyond traffic applications, another interesting field can be weather forecasting from atmosphere imaging data, which have a natural spatial structure to which the Graph Filter-Smoother is suited. Remark that an image is made of hundreds of thousands of pixels and so it is naturally high-dimensional, that is why an alternative to the classic approach is needed.
Similarly, HMNN could be used in other sequential application, such as financial time series and natural language processing. At the same time, from a theoretical point of I did not focus on the quality of the approximation used in HMNN, and it might be interesting for the community to know how the error accumulates.

My whole research is available on arXiv and all my codes are open source and available on my Github page (https://github.com/LorenzoRimella) [HMNN's paper was submitted to ICML2020 and so not available at the moment to preserve anonymity]. This means that I am sharing completely my research with the community and anyone could potentially use my findings to take them to the next level.
Sectors Communities and Social Services/Policy,Financial Services, and Management Consultancy,Government, Democracy and Justice,Transport

URL https://arxiv.org/abs/1902.01639
 
Title Graph Filter-Smoother 
Description Graph Filter and Graph Smoother are algorithms for approximate filtering and smoothing in high-dimensional factorial hidden Markov models. The approximation involves discarding, in a principled way, likelihood factors according to a notion of locality in a factor graph associated with the emission distribution. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact It is a low-cost approximation for filtering and smoothing algorithms for HMM. The low-cost allows running the algorithm also when the number of dimensions makes classical approaches unfeasible. 
URL http://github.com/LorenzoRimella/GraphFilter-GraphSmoother