Streaming multi-core sample-based Bayesian Analysis

Lead Research Organisation: University of Liverpool
Department Name: Electrical Engineering and Electronics

Abstract

The aim of this PhD is to develop strategies for implementing state-of-the-art Bayesian techniques in ways that fully exploit the computational power present in today's and tomorrow's (increasingly multi-core) computers. This will build on previous research related to high performance computing, Big Data and Bayesian statistics.
The PhD involves active engagement with Schlumberger, a major supplier of a broad range of services to the oil and gas sector; the specific applications that motivate the research concern the analysis of data related to oil exploration. Drilling provides access to reservoirs from which oil and gas can be extracted. Reaching such reservoirs is increasingly difficult, both physically and economically. The modern solutions that Schlumberger is developing in response to these difficulties involve analysing high-volume streams of data from poorly sampled systems with large degrees of uncertainty. There is a need to extract information and identify events in real time.
Complex (finite element) models exist that can predict what data would be observed if the model's parameters took specific values, and that can also predict how those parameters change over time. This makes it possible to apply a specific state-of-the-art technique, a particle filter, to the data stream (particle filters are reminiscent of genetic algorithms but were developed by statisticians to solve streaming analytics problems). Previous research has demonstrated the utility of this approach, but real-time operation remains challenging.
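
As an illustration only, the following minimal bootstrap particle filter (in Python, with an invented one-dimensional random-walk model and Gaussian likelihood standing in for the finite element models above) shows the predict/weight/resample cycle applied to a data stream:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000                                     # number of particles
    ys = np.cumsum(rng.normal(0, 0.1, 50)) + rng.normal(0, 0.5, 50)  # toy observation stream

    x = rng.normal(0.0, 1.0, N)                  # initial particle population
    for y in ys:
        x = x + rng.normal(0.0, 0.1, N)          # predict: propagate every particle (independent per particle)
        w = np.exp(-0.5 * ((y - x) / 0.5)**2)    # weight: likelihood of the new observation (independent per particle)
        w /= w.sum()                             # normalise the weights
        print(np.sum(w * x))                     # posterior-mean estimate for this time step
        x = x[rng.choice(N, N, p=w)]             # resample: the one step that resists parallelisation

Note that the predict and weight steps act on each particle independently, while resampling couples the whole population; the next paragraphs return to this point.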
Multi-core architectures are increasingly widespread (e.g. in desktop PCs' CPUs, GPUs, Xeon Phis and supercomputing clusters). Given this, the specific tactic to be used to obtain real-time performance is to parallelise the operation of the particle filter and distribute the processing across a number of cores.
Particle filters use the diversity of a population of samples to convey uncertainty. For the majority of its operation, a particle filter processes each particle independently, which makes that majority trivial to parallelise. However, at a specific point in the filter, it becomes necessary to perform a "resampling" step, and a text-book implementation of this resampling step is impossible to parallelise. Previous research has demonstrated that the resampling operation can be described using a divide-and-conquer strategy, making it possible to parallelise the resampling step. More recent work has identified that, if using more cores is to result in faster operation, it is crucial that data locality and pipelining are explicitly considered in the implementation. There is a need for this research to be extended significantly, and the potential advantages motivate this PhD.
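
The published divide-and-conquer resampler is considerably more involved (it must also redistribute particles while respecting data locality), but the core idea, replacing the inherently sequential running sum of weights with a tree of independent partial sums, can be sketched as follows (illustrative Python; prefix_sum_tree is a name invented here):

    import numpy as np

    def prefix_sum_tree(w):
        # Divide-and-conquer cumulative sum: the two recursive calls are
        # independent, so they can run on different cores; only the final
        # shift of the right half couples them.
        n = len(w)
        if n == 1:
            return w.copy()
        left = prefix_sum_tree(w[:n // 2])
        right = prefix_sum_tree(w[n // 2:])
        right += left[-1]
        return np.concatenate([left, right])

    w = np.array([0.1, 0.4, 0.2, 0.3])
    print(prefix_sum_tree(w))    # [0.1 0.5 0.7 1. ] -- matches np.cumsum(w)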

Publications

Parallelising particle filters with deterministic run-time on distributed memory systems. IET 3rd International Conference on Intelligent Signal Processing (ISP 2017), December 2017. https://ieeexplore.ieee.org/document/8361519

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/P510567/1                                   01/10/2016  30/09/2021
1818639            Studentship   EP/P510567/1  01/11/2016  28/01/2021  Alessandro Varsi
 
Description My PhD focuses on improving the performance, in terms of both speed and accuracy, of Particle Filters and SMC Samplers, two algorithms belonging to the class of Sequential Monte Carlo (SMC) methods. These algorithms use the Importance Sampling principle to perform Bayesian inference. In simple terms, they can make real-time predictions about the state of dynamic and static systems. Because of that, the application domain is extremely broad: it includes object tracking, oil exploration, weather forecasting, stock trading, currency trading, medical analysis and any research field in which it is important to collect data and then make predictions from it.
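
As a toy illustration of the Importance Sampling principle (not of the filters themselves), the following Python snippet estimates the mean of a target distribution known only up to a normalising constant, using weighted samples drawn from a different, tractable proposal; all distributions here are invented for the example:

    import numpy as np

    rng = np.random.default_rng(1)

    def log_target(x):
        return -0.5 * ((x - 2.0) / 0.5)**2    # target known only up to a constant

    xs = rng.normal(0.0, 3.0, 100_000)        # samples from a proposal we CAN sample from
    log_proposal = -0.5 * (xs / 3.0)**2       # proposal log-density, also up to a constant

    w = np.exp(log_target(xs) - log_proposal) # importance weights
    w /= w.sum()                              # self-normalise (the unknown constants cancel)
    print(np.sum(w * xs))                     # ~2.0, the mean of the target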

To improve the speed of these algorithms (and, indeed, of any algorithm), we need a parallel implementation that exploits the computational power of modern computers and supercomputers, i.e. one which can run in parallel on multi-core CPUs. Before my PhD started, a parallelisable implementation had only been theorised; its benefits had not yet been demonstrated in practice. In my first year, I implemented the algorithm on a parallel distributed-memory architecture using MPI and showed that SMC methods are indeed parallelisable. This finding led to my first publication (linked in the URL section below), "Parallelising particle filters with deterministic run-time on distributed memory systems", presented at the IET 3rd International Conference on Intelligent Signal Processing (ISP 2017) in December 2017.
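
The published algorithm is more sophisticated, but the flavour of the MPI approach, each rank owning a share of the particles and collectives supplying the few global quantities needed, can be sketched as follows (illustrative mpi4py code; the file name and toy weights are assumptions, and the hard part, parallel resampling, is omitted):

    # Run with e.g.: mpiexec -n 4 python parallel_weights.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    N = 1 << 20                                      # total particles, shared across ranks
    rng = np.random.default_rng(rank)                # independent stream per rank
    x = rng.normal(0.0, 1.0, N // size)              # this rank's share of the particles

    w = np.exp(-0.5 * x**2)                          # local unnormalised weights
    w /= comm.allreduce(w.sum(), op=MPI.SUM)         # one collective yields the global normaliser
    est = comm.allreduce(np.sum(w * x), op=MPI.SUM)  # global weighted estimate
    if rank == 0:
        print(est)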

In my second year, I focused on improving my algorithm further. The key idea was to attack the algorithm's bottleneck in order to achieve an overall improvement. The optimised implementation is now up to 3 times faster than the state-of-the-art implementation, and I am now ready to focus on the final goal of my PhD, described below.

Alternatives to SMC methods already exist and date back to the 1950s: the class of algorithms known as Markov Chain Monte Carlo (MCMC) methods. Although very popular and extensively used, these algorithms are inherently sequential: within a single chain, each sample depends on the previous one, which makes them ill-suited to achieving high performance in terms of speed and accuracy on parallel hardware. Since in my first year I showed in practice that SMC methods are parallelisable, I am now focusing on showing that parallel SMC methods can be an advantageous alternative to MCMC methods. For the comparison I am using the optimised algorithm which I implemented in my second year. The results have confirmed my hypotheses, and I have written a journal paper explaining these final findings, which I am going to submit this week to IEEE Transactions on Signal Processing.
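
To make the contrast concrete, here is a textbook random-walk Metropolis-Hastings sampler (illustrative Python on the same toy target as above, not code from my papers); note that each iteration needs the state produced by the previous one, which is exactly the sequential dependency that parallel SMC methods avoid:

    import numpy as np

    rng = np.random.default_rng(2)

    def log_target(x):
        return -0.5 * ((x - 2.0) / 0.5)**2     # same toy target as the earlier sketch

    x = 0.0
    chain = []
    for _ in range(10_000):
        prop = x + rng.normal(0.0, 0.5)        # the proposal is centred on the CURRENT state...
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop                           # ...so iteration t+1 cannot begin before iteration t ends
        chain.append(x)
    print(np.mean(chain[1000:]))               # ~2.0 after discarding burn-in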
Exploitation Route My research has the potential to change the way people perform "a posteriori" data analysis which, as noted above, is a routine and vital practice in many research fields as well as in industry, medicine and economics.
Sectors Digital/Communication/Information Technologies (including Software); Electronics; Financial Services, and Management Consultancy; Pharmaceuticals and Medical Biotechnology

URL https://ieeexplore.ieee.org/document/8361519