Near Real-Time Update Streaming for Distributed Dynamic Graphs

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

Background
Modern demand for analytical information on vast volumes of data has seen a resurgence in research surrounding 'dynamic graph' processing. This is due to the ability to explore the variations in data over time, alongside complex problem abstractions. Much of this research, however, is concerned with graph analysis after creation; there is little focus on how to efficiently insert new information into a graph. An issue magnified when attempting to work within a distributed environment.
Project overview and objectives
This project focuses exactly on the issue above, looking at effective ways of updating distributed in-memory graphs and removing the need for recreation when new data becomes available. This is an important issue to address, as the continual ingestion of old data greatly degrades system performance and increases costs through additional CPU time/electricity usage.
Following this milestone, the next objective is to investigate a manner of streaming data into distributed graphs and updating monitored metrics in a near real-time manner. As well as greatly decreasing the computational cost of many algorithms, swapping from batch to stream processing would, more importantly, lower update latency, allowing changes/anomalies to be acted upon almost as soon as they occur.
Finally, as a limited number of graph frameworks are distributed (those that are focusing on batch processing) there will be several peripheral questions to be tackled. For example: what are the most appropriate storage methods for distributed dynamic graphs? what is the best way to maintain their partitions as they mutate and degrade over time? can the transformation of algorithms for batch to stream be automated or assisted in some manner?
Potential Impact
Graph processing is employed in essential use cases across a variety of industries; from obvious social network analysis and page ranking, to more obscure cases such as studying Cancer genomics. The possible cost reductions could, therefore, have a high impact within industrial applications, alongside new implementations for time dependent use cases such as fraud detection on bank transactions.
Additionally, as there is a deficiency of prior work in this area, the project has the potential to discover new avenues of inquest, laying the foundations for substantial future research and improvements.
Alignment to EPSRC's strategies and research areas
Being at its core a Distributed Processing framework, this research clearly fits within the boundaries of 'ICT Networks & Distributed Systems', one of the largest areas of interest for the EPSRC.
Companies or collaborators involved
This work is to be completed in collaboration with my advisor Dr. Felix Cuadrado and Hewlett Packard Enterprises, the CASE Sponsor for my PhD.

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/N50953X/1 01/10/2016 30/09/2021
1803769 Studentship EP/N50953X/1 01/11/2016 30/04/2020 Benjamin Steer
 
Description Defined the model for a temporal property graph, the update semantics for said model (the addition, removal and mutation of vertices and edges)
A practical manner of ingesting streams of graph updates into temporal graph maintained over a distributed set of machines.
A method for performing analysis over distributed temporal graphs. 1: Allowing users to investigate how the model/underlying data has changed over time. 2: allowing users to include time as a first order citizen within analytics such as in temporal shortest path.
A framework for benchmarking streaming graph analytics systems, merging components from pure stream processing benchmarks and traditional static graph benchmarks.
Exploitation Route The current work has focused much on social network analysis and there is still a lot of space to explore in this domain i.e. niche community detection and mutation (extremist groups on twitter). However, there is also much future work exploring other areas such as transport networks and financial ecosystems.
Sectors Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Financial Services, and Management Consultancy,Transport

 
Title Raphtory: A Distributed System for Temporal Graph Analysis 
Description Raphtory is a distributed system that can create in real time a distributed temporal graph from multiple data sources, and perform continuous graph analysis on the managed graph. The software can run standalone on a single machine, or deployed as a set of Docker images in a cluster (AWS and Microsoft Azure have been tested). The released software started as the outcome of a PhD studentship but it has evolved into a collaborative open source project with multiple contributors, 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The Raphtory software project has gained significant interest from researchers and industry. There are four different companies who are evaluating Raphtory for adoption on some of their internal use cases, as well as active research collaborations with four additional institutions. Currently 45 developers have starred the project, and 9 other members who have provided contributions to it. 
URL https://raphtory.github.io/