Near Real-Time Update Streaming for Distributed Dynamic Graphs

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

Background
Modern demand for analytical information on vast volumes of data has seen a resurgence in research surrounding 'dynamic graph' processing. Dynamic graphs make it possible to explore how data varies over time, as well as to abstract complex problems. Much of this research, however, is concerned with graph analysis after creation; there is little focus on how to efficiently insert new information into a graph, an issue that is magnified when working within a distributed environment.
Project overview and objectives
This project focuses on exactly this issue, looking at effective ways of updating distributed in-memory graphs so that they need not be recreated when new data becomes available. This is important to address, as continually re-ingesting old data greatly degrades system performance and increases costs through additional CPU time and electricity usage.
Following this milestone, the next objective is to investigate ways of streaming data into distributed graphs and updating monitored metrics in near real time. As well as greatly decreasing the computational cost of many algorithms, switching from batch to stream processing would, more importantly, lower update latency, allowing changes and anomalies to be acted upon almost as soon as they occur.
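As a rough illustration of the difference between batch and stream processing described above, the following minimal single-machine sketch maintains one monitored metric (node degree) in two ways: by rebuilding the graph from the full edge list, and by applying each new edge as it arrives. The names `batch_degrees` and `StreamingDegrees` are hypothetical and chosen for illustration only; the sketch assumes an undirected graph, ignores self-loops and deletions, and does not represent the project's actual framework or any distributed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def batch_degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """Batch approach: rebuild the whole graph, then recompute every degree."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u == v:  # assumption: self-loops are ignored
            continue
        adjacency[u].add(v)
        adjacency[v].add(u)
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


class StreamingDegrees:
    """Streaming approach: keep the graph in memory and update per edge."""

    def __init__(self) -> None:
        self.adjacency: Dict[str, Set[str]] = defaultdict(set)
        self.degree: Dict[str, int] = defaultdict(int)

    def add_edge(self, u: str, v: str) -> None:
        # Skip self-loops and edges that are already present.
        if u == v or v in self.adjacency[u]:
            return
        self.adjacency[u].add(v)
        self.adjacency[v].add(u)
        # Only the two endpoints change, so the update touches O(1) entries.
        self.degree[u] += 1
        self.degree[v] += 1


history = [("a", "b"), ("b", "c"), ("a", "b"), ("c", "a")]

stream = StreamingDegrees()
for u, v in history:
    stream.add_edge(u, v)

# A new edge arrives: the streaming structure absorbs it directly,
# whereas the batch function must re-read the entire history.
stream.add_edge("c", "d")
recomputed = batch_degrees(history + [("c", "d")])
```

Both approaches arrive at the same degrees, but the streaming version does a constant amount of work per new edge, while the batch version reprocesses all earlier data each time.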
Finally, as only a limited number of graph frameworks are distributed (and those that are focus on batch processing), there will be several peripheral questions to tackle. For example: What are the most appropriate storage methods for distributed dynamic graphs? What is the best way to maintain their partitions as they mutate and degrade over time? Can the transformation of algorithms from batch to stream be automated or assisted in some manner?
Potential Impact
Graph processing is employed in essential use cases across a variety of industries, from obvious ones such as social network analysis and page ranking to more obscure cases such as studying cancer genomics. The possible cost reductions could, therefore, have a high impact within industrial applications, alongside new implementations for time-dependent use cases such as fraud detection on bank transactions.
Additionally, as there is a deficiency of prior work in this area, the project has the potential to discover new avenues of inquiry, laying the foundations for substantial future research and improvements.
Alignment to EPSRC's strategies and research areas
As the project is, at its core, a distributed processing framework, this research clearly fits within the boundaries of 'ICT Networks & Distributed Systems', one of the largest areas of interest for the EPSRC.
Companies or collaborators involved
This work is to be completed in collaboration with my advisor, Dr. Felix Cuadrado, and Hewlett Packard Enterprise, the CASE Sponsor for my PhD.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N50953X/1      |              |              | 01/10/2016 | 30/09/2021 |
2123705           | Studentship  | EP/N50953X/1 | 01/11/2016 | 30/04/2020 | Benjamin Steer