Streaming multi-core sample-based Bayesian Analysis

Lead Research Organisation: University of Liverpool
Department Name: Electrical Engineering and Electronics

Abstract

The aim of this PhD is to develop strategies for implementing state-of-the-art Bayesian techniques in ways that fully exploit the computational power present in today's and tomorrow's (increasingly multi-core) computers. This will build on previous research related to high performance computing, Big Data and Bayesian statistics.
The PhD involves active engagement with Schlumberger. Schlumberger is a major supplier of a broad range of services to the oil and gas sector and, in fact, the specific applications that motivate the research are associated with the analysis of data related to oil exploration. Drilling provides access to reservoirs from which oil and gas can be extracted. Reaching such reservoirs is increasingly difficult physically and economically. The modern solutions that Schlumberger is developing in response to these difficulties involve analysing high volume streams of data from poorly sampled systems with large degrees of uncertainty. There is a need to extract information and identify events in real-time.
Complex (finite element) models exist that can predict what data would be observed if the model's parameters took specific values, and can also predict how those parameters change over time. This makes it possible to apply a specific state-of-the-art technique, a particle filter, to the data stream (particle filters are reminiscent of genetic algorithms but were developed by statisticians to solve streaming analytics problems). Previous research has demonstrated the utility of this approach, but real-time operation remains challenging.
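To make the technique concrete, the following is a minimal sketch of a bootstrap particle filter for a hypothetical one-dimensional random-walk model with noisy observations. The model, noise levels and particle count are illustrative assumptions for exposition only, not the finite element models or settings used in the project:

```python
import math
import random

random.seed(0)

def particle_filter(observations, n_particles=500,
                    process_std=1.0, obs_std=1.0):
    """Minimal bootstrap particle filter for a toy 1-D random-walk model.

    State model:       x_t = x_{t-1} + N(0, process_std^2)
    Observation model: y_t = x_t + N(0, obs_std^2)
    Returns the posterior-mean estimate of x_t at each time step.
    """
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: propagate each particle through the state model.
        particles = [x + random.gauss(0.0, process_std) for x in particles]
        # Weight: likelihood of the observation under each particle.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2)
                   for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Estimate: weighted posterior mean of the particle population.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample: draw a new population in proportion to the weights.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return estimates

# Track a slowly drifting state from noisy measurements.
truth = [0.1 * t for t in range(20)]
obs = [x + random.gauss(0.0, 1.0) for x in truth]
est = particle_filter(obs)
```

Note that the predict and weight steps loop over particles independently, while the resample step couples the whole population; the latter is the step whose parallelisation is discussed below.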
Multi-core architectures are increasingly widespread (e.g. in desktop PCs' CPUs, GPUs, Xeon Phis and super-computing clusters). Given this, the specific tactic to be used to obtain real-time performance is to parallelise the operation of the particle filter and distribute the processing across a number of cores.
Particle filters use the diversity of a population of samples to convey uncertainty. For the majority of the particle filter's operation, each particle is processed independently, which makes most of the algorithm trivial to parallelise. However, at a specific point in the filter, it becomes necessary to perform a "resampling" step. A text-book implementation of this resampling step is impossible to parallelise. However, previous research has demonstrated that the resampling operation can be described using a divide-and-conquer strategy, making it possible to parallelise the resampling step. More recent work has identified that, if using more cores is to result in faster operation, it is crucial that data locality and pipelining are explicitly considered in the implementation. There is a need for this research to be extended significantly, and the potential advantages motivate this PhD.
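The following sketch illustrates why resampling resists naive parallelisation and where a divide-and-conquer formulation helps. Systematic resampling is built on a cumulative sum of the weights; that prefix sum is written serially here, but the same operation has a classic O(log N) divide-and-conquer parallel form. This is a simplified illustration, not the published parallel algorithm from the project:

```python
import random

def prefix_sum(values):
    """Inclusive prefix sum of a list of weights. Serial here, but the
    prefix sum is exactly the sub-problem with a well-known
    divide-and-conquer parallelisation, which is what opens the
    resampling step up to multi-core execution."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out

def systematic_resample(weights):
    """Systematic resampling: one uniform draw gives N evenly spaced
    thresholds, which are compared against the cumulative weights.
    Returns the ancestor index chosen for each offspring particle."""
    n = len(weights)
    cumulative = prefix_sum(weights)
    u0 = random.uniform(0.0, 1.0 / n)
    indices, j = [], 0
    for i in range(n):
        u = u0 + i / n
        # Advance to the first cumulative weight exceeding the threshold.
        while j < n - 1 and cumulative[j] < u:
            j += 1
        indices.append(j)
    return indices

random.seed(1)
w = [0.1, 0.2, 0.6, 0.1]
ancestors = systematic_resample(w)
```

Systematic resampling guarantees that a particle with normalised weight w_i is copied at least floor(N * w_i) times, so the heavily weighted third particle above must appear at least twice among the four offspring.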

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/P510567/1                                   01/10/2016  30/09/2021
1818639            Studentship   EP/P510567/1  01/11/2016  28/01/2021  Alessandro Varsi
 
Description My PhD focuses on improving the speed and accuracy of Particle Filters and SMC Samplers, two algorithms belonging to the same class, Sequential Monte Carlo (SMC) methods. These algorithms use the importance sampling principle to perform Bayesian inference. In simple terms, they make real-time predictions about the state of dynamic and static systems. As a result, the application domain is extremely broad: it includes object tracking, oil exploration, weather forecasting, stock trading, currency trading, medical analysis and any research field in which it is important to collect data and then make predictions from them.
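The importance sampling principle underlying these methods can be shown in a few lines: samples are drawn from a convenient proposal distribution and reweighted by the ratio of target to proposal densities. The specific densities below (a N(3, 1) target and a N(0, 3) proposal) are toy choices for illustration only:

```python
import math
import random

random.seed(2)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            / (sigma * math.sqrt(2 * math.pi)))

def importance_sampling_mean(n=20000):
    """Estimate the mean of a N(3, 1) target using samples drawn from
    a broader N(0, 3) proposal, corrected by importance weights
    w(x) = target(x) / proposal(x), then self-normalised."""
    xs = [random.gauss(0.0, 3.0) for _ in range(n)]
    ws = [normal_pdf(x, 3.0, 1.0) / normal_pdf(x, 0.0, 3.0) for x in xs]
    total = sum(ws)
    return sum(w * x for w, x in zip(ws, xs)) / total

estimate = importance_sampling_mean()  # should be close to the target mean, 3.0
```

SMC methods apply this reweighting sequentially as each new datum arrives, which is what makes them suited to streaming, real-time inference.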

In order to improve the speed of these algorithms (or indeed any algorithm), we need a parallelisable implementation that exploits the computational power of modern computers and supercomputers, i.e. an implementation that can run in parallel on multi-core CPUs. Before my PhD started, such an implementation had only been theorised; its benefits had not yet been demonstrated in practice.

In my first year, I implemented the algorithm on a parallel distributed-memory architecture using MPI and showed that SMC methods are indeed parallelisable. This finding led to my first publication (the link is attached in the URL section), "Parallelising particle filters with deterministic run-time on distributed memory systems", presented at the IET 3rd International Conference on Intelligent Signal Processing (ISP 2017) in December 2017.

In my second year, I focused on improving the algorithm further. The key idea was to target its bottleneck in order to achieve an overall improvement. The optimised implementation is now up to 3 times faster than the state-of-the-art implementation.

In my third year, I have written a paper on my second-year results; the draft is at the final proofreading stage and will be submitted to Signal Processing Letters (SPL) within two weeks. Also in my third year, I redesigned the algorithm using two key optimisations, which sped up performance by a further factor of 10 relative to my second-year results. These results are so encouraging that we have decided to file a patent, which is currently being written.
Exploitation Route My research has the potential to change the way people perform "a posteriori" data analysis which, as noted above, is a routine and vital practice in many research fields, as well as in industry, medicine and economics.
Sectors Digital/Communication/Information Technologies (including Software); Electronics; Financial Services, and Management Consultancy; Pharmaceuticals and Medical Biotechnology

URL https://ieeexplore.ieee.org/document/8361519
 
Description Co-development of Stan 
Organisation Stan
Sector Charity/Non Profit 
PI Contribution We are actively contributing to Stan's code base.
Collaborator Contribution Access to a route to impact.
Impact So far, we have contributed only a small change to the Stan maths library, but it is now in the latest release and is therefore used by 100,000+ researchers.
Start Year 2018
 
Description Joint Study Agreement with IBM 
Organisation IBM
Country United States 
Sector Private 
PI Contribution We are developing next-generation data science techniques that can support both internal activity within IBM and their interactions with the customers.
Collaborator Contribution IBM are providing people, access to large computers and, for example, secondment opportunities.
Impact None as yet.
Start Year 2018