Streaming multi-core sample-based Bayesian Analysis
Lead Research Organisation:
University of Liverpool
Department Name: Electrical Engineering and Electronics
Abstract
The aim of this PhD is to develop strategies for implementing state-of-the-art Bayesian techniques in ways that fully exploit the computational power present in today's and tomorrow's (increasingly multi-core) computers. This will build on previous research related to high performance computing, Big Data and Bayesian statistics.
The PhD involves active engagement with Schlumberger. Schlumberger is a major supplier of a broad range of services to the oil and gas sector and, in fact, the specific applications that motivate the research are associated with the analysis of data related to oil exploration. Drilling provides access to reservoirs from which oil and gas can be extracted. Reaching such reservoirs is increasingly difficult physically and economically. The modern solutions that Schlumberger is developing in response to these difficulties involve analysing high volume streams of data from poorly sampled systems with large degrees of uncertainty. There is a need to extract information and identify events in real-time.
Complex (finite element) models exist that can predict what data will be observed were the model's parameters to take specific values and can also predict how the model's parameters change over time. This makes it possible to apply a specific state-of-the-art technique, a particle filter, to the data stream (particle filters are reminiscent of genetic algorithms but were developed by statisticians to solve streaming analytics problems). Historic research has demonstrated the utility of this approach, but real-time operation remains challenging.
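As an illustration of the technique only (not the finite-element models or data used in the project), the sketch below applies a bootstrap particle filter to a toy one-dimensional linear-Gaussian model; the dynamics, noise levels and particle count are arbitrary choices for the example:

```python
import numpy as np

def bootstrap_particle_filter(observations, n_particles=1000, rng=None):
    """Bootstrap particle filter for a toy scalar model:
    x_t = 0.9 * x_{t-1} + process noise,  y_t = x_t + measurement noise.
    Returns the posterior-mean state estimate at each time step."""
    rng = np.random.default_rng(rng)
    particles = rng.normal(0.0, 1.0, n_particles)   # initial samples from the prior
    estimates = []
    for y in observations:
        # Predict: propagate each particle through the dynamics (independent per particle).
        particles = 0.9 * particles + rng.normal(0.0, 0.5, n_particles)
        # Weight: likelihood of the observation under each particle (unit-variance noise).
        weights = np.exp(-0.5 * (y - particles) ** 2)
        weights /= weights.sum()
        estimates.append(np.sum(weights * particles))  # weighted posterior mean
        # Resample: draw a new population in proportion to the weights.
        particles = rng.choice(particles, size=n_particles, p=weights)
    return np.array(estimates)
```

The predict and weight steps are embarrassingly parallel across particles; the resampling step is the part whose parallelisation this PhD addresses.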
Multi-core architectures are increasingly widespread (e.g. in desktop PCs' CPUs, GPUs, Xeon Phis and supercomputing clusters). Given this, the specific tactic to be used to obtain real-time performance is to parallelise the operation of the particle filter and distribute the processing across a number of cores.
Particle filters use the diversity of a population of samples to convey uncertainty. For the majority of its operation, each particle is processed independently, which makes most of the particle filter trivial to parallelise. However, at a specific point in the filter it becomes necessary to perform a "resampling" step, and a textbook implementation of this step is impossible to parallelise. Previous research has demonstrated that the resampling operation can be described using a divide-and-conquer strategy, making it possible to parallelise the resampling step. More recent work has identified that, if using more cores is to result in faster operation, it is crucial that data locality and pipelining are explicitly considered in the implementation. This research needs to be extended significantly, and the potential advantages motivate this PhD.
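The parallelisation issue can be seen in miniature: a textbook resampler walks the cumulative weight sum one particle at a time, which forces sequential execution, whereas the same step can be expressed as a prefix sum followed by a vectorised search, both of which have well-known parallel implementations. The sketch below illustrates this idea only; it is not the divide-and-conquer, fully-balanced algorithm developed in this work:

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic resampling written entirely with data-parallel primitives.

    np.cumsum is a prefix sum (O(log n) depth in parallel) and
    np.searchsorted is an independent binary search per particle, so
    every step here admits a parallel implementation, in contrast to the
    textbook one-particle-at-a-time loop over the cumulative weights."""
    rng = np.random.default_rng(rng)
    n = len(weights)
    cdf = np.cumsum(weights)            # prefix sum over the weights
    cdf /= cdf[-1]                      # guard against floating-point drift
    # One stratified uniform per output particle, then a vectorised search.
    u = (rng.random() + np.arange(n)) / n
    return np.searchsorted(cdf, u)      # indices of the surviving particles
```

With uniform weights every particle survives exactly once; a dominant weight is duplicated in proportion to its mass.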
People | ORCID iD |
---|---|
Jeyan Thiyagalingam (Primary Supervisor) | |
Alessandro Varsi (Student) | |
Publications
Varsi A (2021) "An O(log2N) Fully-Balanced Resampling Algorithm for Particle Filters on Distributed Memory Architectures", in Algorithms
Varsi A (2020) "A Fast Parallel Particle Filter for Shared Memory Systems", in IEEE Signal Processing Letters
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/P510567/1 | | | 30/09/2016 | 29/09/2021 | |
1818639 | Studentship | EP/P510567/1 | 01/11/2016 | 28/01/2021 | Alessandro Varsi |
Description | My PhD focuses on improving the speed and accuracy of Particle Filters and SMC Samplers, two algorithms belonging to the same class, SMC methods. These algorithms use the so-called Importance Sampling principle to make Bayesian inferences; in simple terms, they make real-time predictions about the state of dynamic and static systems. The application domain is therefore extremely broad: it includes object tracking, oil exploration, weather forecasting, stock and currency trading, medical analysis and any research field in which it is important to collect data and then make predictions. To improve the speed of these algorithms (as with any algorithm), we need to seek a parallelisable implementation that exploits the computational power of modern computers and supercomputers, i.e. an implementation that can run in parallel on multi-core CPUs. Such an implementation had only been theorised before my PhD started; its benefits had not yet been demonstrated in practice. In my first year, I implemented the algorithm on a parallel distributed-memory architecture using MPI and proved that SMC methods are indeed parallelisable. This finding led to my first publication (linked in the URL section), "Parallelising particle filters with deterministic run-time on distributed memory systems", presented at the IET 3rd International Conference on Intelligent Signal Processing (ISP 2017) in December 2017. In my second year, I focused on improving my algorithm further by targeting its bottleneck in order to achieve an overall improvement. The optimised implementation is now up to 3 times faster than the state-of-the-art implementation.
In my third year, I wrote a paper on my second-year results; the draft is at the final proofreading stage and will be submitted to Signal Processing Letters (SPL) within two weeks. Also in my third year, I redesigned my algorithm using two key optimisations that speed up performance by a further factor of 10 relative to my second-year results. These results are so encouraging that we have decided to file a patent, which is currently being drafted. |
Exploitation Route | My research has the potential to change the way people perform "a posteriori" data analysis which, as noted above, is a routine and vital practice in many research fields as well as in industry, medicine and economics. |
Sectors | Digital/Communication/Information Technologies (including Software), Electronics, Financial Services and Management Consultancy, Pharmaceuticals and Medical Biotechnology |
URL | https://ieeexplore.ieee.org/document/8361519 |
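As a small illustration of the Importance Sampling principle mentioned in the description above, the toy example below estimates a posterior mean by weighting samples from the prior with the likelihood; the model (a standard-Gaussian prior and a single unit-variance Gaussian observation at y = 1, giving a posterior mean of 0.5) is invented for this sketch and is not one of the project's applications:

```python
import numpy as np

def self_normalised_is(n_samples=100_000, rng=None):
    """Self-normalised importance sampling, the principle underlying
    SMC methods.  Target: posterior proportional to N(x; 0, 1) times
    N(y=1; x, 1).  Proposal: the prior itself, so the importance
    weights reduce to the likelihood evaluated at each sample."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, 1.0, n_samples)    # draw samples from the proposal (prior)
    w = np.exp(-0.5 * (1.0 - x) ** 2)      # unnormalised weights = likelihood
    return np.sum(w * x) / np.sum(w)       # self-normalised posterior-mean estimate
```

For this conjugate model the exact posterior is N(0.5, 0.5), so the estimate should be close to 0.5.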
Description | Co-development of Stan |
Organisation | Stan |
Sector | Charity/Non Profit |
PI Contribution | We are actively contributing to Stan's code base.
Collaborator Contribution | Access to a route to impact. |
Impact | So far we have contributed only a small change to the Stan maths library, but it is now in the latest release and is therefore used by 100,000+ researchers.
Start Year | 2018 |
Description | Joint Study Agreement with IBM |
Organisation | IBM |
Country | United States |
Sector | Private |
PI Contribution | We are developing next-generation data science techniques that can support both internal activity within IBM and their interactions with the customers. |
Collaborator Contribution | IBM are providing people, access to large computers and, for example, secondment opportunities. |
Impact | None as yet. |
Start Year | 2018 |