StreamDG: Streaming Processing of Massive Dynamic Graphs

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

Massive data sets play a central role in most areas of modern life. IT giants such as Amazon evaluate large amounts of customer data in order to provide instant buying recommendations, Facebook maintains a social network of billions of monthly active users, and Google collects data from billions of webpages every day in order to answer more than 70,000 search queries every second. Areas such as healthcare, education, and finance are heavily reliant on Big Data technology.

However, extracting useful information from massive data sets is a challenging task. Since modern massive data sets far exceed the RAM (random access memory) of modern computers, algorithms for processing such data sets are never able to "see" the input in its entirety at any one moment. This is a fundamental restriction that is similar to how we humans process the world around us: Our senses provide us with an almost unlimited amount of information as we walk through life, however, our brains are merely able to store a small summary of our past and most information is lost.

Data streaming algorithms are massive data set algorithms that mimic this behaviour: A data streaming algorithm scans a massive input data set once only and maintains a summary in the computer's RAM that is much smaller than the size of the input. The objective is to maintain summaries that are as small as possible but still represent the input data well enough in order to solve a specific task. This poses many interesting questions that have been the subject of research for multiple decades: Which problems allow the design of small-space streaming algorithms? Are there problems that cannot be solved with small space, and if this is the case, can these problems at least be solved approximately?

This project addresses streaming algorithms for processing massive graphs. Graphs are central objects in computer science and have countless applications. They allow us to describe relations (called edges) between entities (called nodes). For example, a social network graph consists of friendship relations (edges) between pairs of users (nodes). Most research on data streaming algorithms for processing massive graphs assumes that graphs are static objects that never change in structure or size. However, this assumption can rarely be guaranteed in practice. For example, social network graphs change both in structure and size when users join or leave the network or when new friendships are established and existing ones are ended. This observation yields the central questions of this project: Processing dynamic graphs is at least as hard as processing static graphs, but how much harder is it? By how much do summaries have to increase in size?

The latter question was first addressed in a seminal paper in 2012. To the surprise of many data streaming researchers, it was shown that for many important problems, the summaries required for the dynamic setting only need to be marginally larger than those for the static setting, while a substantial increase in size was expected. Today, a multitude of problems are known that behave in a similar way while only very few problems are known that require substantially larger summaries in the dynamic setting.

The aim of this project is to shed light on the space requirements of streaming algorithms for processing massive dynamic graphs. While our current knowledge suggests that most problems do not become substantially harder in the dynamic setting, we believe that this picture is somewhat skewed and that a multitude of key problems are in fact much harder to solve in the dynamic case. To confirm our conjecture, we will design new streaming algorithms and prove impossibility results that show this to be the case. Where streaming dynamic graph processing provably requires too much space to be practically feasible, we will provide alternative models that allow for the design of streaming algorithms with small space.

Publications

10 25 50
 
Description Bristol - University of Pennsylvania 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution Collaboration between the PI and Prof Sanjeev Khanna from the University of Pennsylvania. The PI contributed with new ideas, research work, the writing of joint papers, weekly video-calls, and conducted a research visit at the University of Pennsylvania.
Collaborator Contribution Prof Khanna contributed with new ideas, research work, the writing of the joint papers, and hosted the PI during a research visit.
Impact [Khanna, Konrad, ITCS'22], [Alexandru, Khanna, Konrad, PODS'23]
Start Year 2021