Streaming multi-core sample-based Bayesian Analysis

Lead Research Organisation: University of Liverpool
Department Name: Electrical Engineering and Electronics

Abstract

The aim of this PhD is to develop strategies for implementing state-of-the-art Bayesian techniques in ways that fully exploit the computational power present in today's and tomorrow's (increasingly multi-core) computers. This will build on previous research related to high performance computing, Big Data and Bayesian statistics.
The PhD involves active engagement with Schlumberger. Schlumberger is a major supplier of a broad range of services to the oil and gas sector and, in fact, the specific applications that motivate the research are associated with the analysis of data related to oil exploration. Drilling provides access to reservoirs from which oil and gas can be extracted. Reaching such reservoirs is increasingly difficult physically and economically. The modern solutions that Schlumberger is developing in response to these difficulties involve analysing high volume streams of data from poorly sampled systems with large degrees of uncertainty. There is a need to extract information and identify events in real-time.
Complex (finite element) models exist that can predict what data would be observed if the model's parameters took specific values, and can also predict how those parameters change over time. This makes it possible to apply a specific state-of-the-art technique, a particle filter, to the data stream (particle filters are reminiscent of genetic algorithms but were developed by statisticians to solve streaming analytics problems). Previous research has demonstrated the utility of this approach, but real-time operation remains challenging.
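To make the technique concrete, the following is a minimal sketch of a bootstrap particle filter for a hypothetical one-dimensional random-walk model with noisy observations. The model, noise levels and particle count are illustrative assumptions for exposition only, not the finite element models or settings used in the project:

```python
import math
import random

random.seed(0)

def particle_filter(observations, n_particles=500,
                    process_std=1.0, obs_std=1.0):
    """Minimal bootstrap particle filter for a toy 1-D random-walk model.

    State model:       x_t = x_{t-1} + N(0, process_std^2)
    Observation model: y_t = x_t + N(0, obs_std^2)
    Returns the posterior-mean estimate of x_t at each time step.
    """
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: propagate each particle through the state model.
        particles = [x + random.gauss(0.0, process_std) for x in particles]
        # Weight: likelihood of the observation under each particle.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2)
                   for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Estimate: weighted posterior mean of the particle population.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample: draw a new population in proportion to the weights.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return estimates

# Track a slowly drifting state from noisy measurements.
truth = [0.1 * t for t in range(20)]
obs = [x + random.gauss(0.0, 1.0) for x in truth]
est = particle_filter(obs)
```

Note that the predict and weight steps loop over particles independently, while the resample step couples the whole population; the latter is the step whose parallelisation is discussed below.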
Multi-core architectures are increasingly widespread (e.g. in desktop PCs' CPUs, GPUs, Xeon Phis and super-computing clusters). Given this, the specific tactic to be used to obtain real-time performance is to parallelise the operation of the particle filter and distribute the processing across a number of cores.
Particle filters use the diversity of a population of samples to convey uncertainty. For the majority of the particle filter's operation, each particle is processed independently, which makes most of the algorithm trivial to parallelise. However, at a specific point in the filter, it becomes necessary to perform a "resampling" step. A text-book implementation of this resampling step is impossible to parallelise. However, previous research has demonstrated that the resampling operation can be described using a divide-and-conquer strategy, making it possible to parallelise the resampling step. More recent work has identified that, if using more cores is to result in faster operation, it is crucial that data locality and pipelining are explicitly considered in the implementation. There is a need for this research to be extended significantly, and the potential advantages motivate this PhD.
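The following sketch illustrates why resampling resists naive parallelisation and where a divide-and-conquer formulation helps. Systematic resampling is built on a cumulative sum of the weights; that prefix sum is written serially here, but the same operation has a classic O(log N) divide-and-conquer parallel form. This is a simplified illustration, not the published parallel algorithm from the project:

```python
import random

def prefix_sum(values):
    """Inclusive prefix sum of a list of weights. Serial here, but the
    prefix sum is exactly the sub-problem with a well-known
    divide-and-conquer parallelisation, which is what opens the
    resampling step up to multi-core execution."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out

def systematic_resample(weights):
    """Systematic resampling: one uniform draw gives N evenly spaced
    thresholds, which are compared against the cumulative weights.
    Returns the ancestor index chosen for each offspring particle."""
    n = len(weights)
    cumulative = prefix_sum(weights)
    u0 = random.uniform(0.0, 1.0 / n)
    indices, j = [], 0
    for i in range(n):
        u = u0 + i / n
        # Advance to the first cumulative weight exceeding the threshold.
        while j < n - 1 and cumulative[j] < u:
            j += 1
        indices.append(j)
    return indices

random.seed(1)
w = [0.1, 0.2, 0.6, 0.1]
ancestors = systematic_resample(w)
```

Systematic resampling guarantees that a particle with normalised weight w_i is copied at least floor(N * w_i) times, so the heavily weighted third particle above must appear at least twice among the four offspring.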

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/P510567/1                                   01/10/2016  30/09/2021
1818639            Studentship   EP/P510567/1  01/11/2016  28/01/2021  Alessandro Varsi
 
Description My PhD focuses on improving the speed and accuracy of Particle Filters and SMC Samplers, two algorithms belonging to the same class, Sequential Monte Carlo (SMC) methods. These algorithms use the importance sampling principle to perform Bayesian inference. In simple terms, they make real-time predictions about the state of dynamic and static systems. As a result, the application domain is extremely broad: it includes object tracking, oil exploration, weather forecasting, stock trading, currency trading, medical analysis and any research field in which it is important to collect data and then make predictions from them.
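The importance sampling principle underlying these methods can be shown in a few lines: samples are drawn from a convenient proposal distribution and reweighted by the ratio of target to proposal densities. The specific densities below (a N(3, 1) target and a N(0, 3) proposal) are toy choices for illustration only:

```python
import math
import random

random.seed(2)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            / (sigma * math.sqrt(2 * math.pi)))

def importance_sampling_mean(n=20000):
    """Estimate the mean of a N(3, 1) target using samples drawn from
    a broader N(0, 3) proposal, corrected by importance weights
    w(x) = target(x) / proposal(x), then self-normalised."""
    xs = [random.gauss(0.0, 3.0) for _ in range(n)]
    ws = [normal_pdf(x, 3.0, 1.0) / normal_pdf(x, 0.0, 3.0) for x in xs]
    total = sum(ws)
    return sum(w * x for w, x in zip(ws, xs)) / total

estimate = importance_sampling_mean()  # should be close to the target mean, 3.0
```

SMC methods apply this reweighting sequentially as each new datum arrives, which is what makes them suited to streaming, real-time inference.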

In order to improve the speed of these algorithms (or indeed any algorithm), we need a parallelisable implementation that exploits the computational power of modern computers and supercomputers, i.e. an implementation that can run in parallel on multi-core CPUs. Before my PhD started, such an implementation had only been theorised; its benefits had not yet been demonstrated in practice.

In my first year, I implemented the algorithm on a parallel distributed-memory architecture using MPI and showed that SMC methods are indeed parallelisable. This finding led to my first publication (the link is attached in the URL section), "Parallelising particle filters with deterministic run-time on distributed memory systems", presented at the IET 3rd International Conference on Intelligent Signal Processing (ISP 2017) in December 2017.

In my second year, I focused on improving the algorithm further. The key idea was to target its bottleneck in order to achieve an overall improvement. The optimised implementation is now up to 3 times faster than the state-of-the-art implementation.

In my third year, I have written a paper on my second-year results; the draft is at the final proofreading stage and will be submitted to Signal Processing Letters (SPL) within two weeks. Also in my third year, I redesigned the algorithm using two key optimisations, which sped up performance by a further factor of 10 relative to my second-year results. These results are so encouraging that we have decided to file a patent, which is currently being written.
Exploitation Route My research has the potential to change the way people perform "a posteriori" data analysis which, as noted above, is a routine and vital practice in many research fields, as well as in industry, medicine and economics.
Sectors Digital/Communication/Information Technologies (including Software); Electronics; Financial Services, and Management Consultancy; Pharmaceuticals and Medical Biotechnology

URL https://ieeexplore.ieee.org/document/8361519
 
Description Co-development of Stan 
Organisation Stan
Sector Charity/Non Profit 
PI Contribution We are actively contributing to Stan's code base.
Collaborator Contribution Access to a route to impact.
Impact So far, we have contributed only a small change to the Stan maths library, but it is now in the latest release and is therefore used by 100,000+ researchers.
Start Year 2018
 
Description Joint Study Agreement with IBM 
Organisation IBM
Country United States 
Sector Private 
PI Contribution We are developing next-generation data science techniques that can support both internal activity within IBM and their interactions with the customers.
Collaborator Contribution IBM are providing people, access to large computers and, for example, secondment opportunities.
Impact None as yet.
Start Year 2018