StreamDG: Streaming Processing of Massive Dynamic Graphs
Lead Research Organisation:
University of Bristol
Department Name: Computer Science
Abstract
Massive data sets play a central role in most areas of modern life. IT giants such as Amazon evaluate large amounts of customer data in order to provide instant buying recommendations, Facebook maintains a social network of billions of monthly active users, and Google collects data from billions of webpages every day in order to answer more than 70,000 search queries every second. Areas such as healthcare, education, and finance are heavily reliant on Big Data technology.
However, extracting useful information from massive data sets is a challenging task. Since modern massive data sets far exceed the RAM (random access memory) of modern computers, algorithms for processing such data sets are never able to "see" the input in its entirety at any one moment. This fundamental restriction is similar to how we humans process the world around us: our senses provide us with an almost unlimited amount of information as we walk through life; however, our brains store only a small summary of our past, and most information is lost.
Data streaming algorithms are massive data set algorithms that mimic this behaviour: a data streaming algorithm scans a massive input data set only once and maintains a summary in the computer's RAM that is much smaller than the input itself. The objective is to maintain summaries that are as small as possible while still representing the input data well enough to solve a specific task. This poses many interesting questions that have been the subject of research for multiple decades: Which problems allow the design of small-space streaming algorithms? Are there problems that cannot be solved with small space, and if so, can these problems at least be solved approximately?
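To make this concrete, the following is a minimal illustrative sketch (in Python, with made-up names and a toy input, not code from the project) of a well-known one-pass streaming algorithm: greedy Maximal Matching on an insertion-only edge stream. The summary kept in RAM, namely the matching and its endpoints, uses space proportional to the number of nodes, whereas the stream itself may contain vastly more edges.

```python
# A minimal sketch of a one-pass streaming algorithm on an
# insertion-only edge stream: greedy Maximal Matching. The summary
# kept in RAM (the matching and the set of matched vertices) has size
# O(n), while the stream may contain up to O(n^2) edges.

def greedy_maximal_matching(edge_stream):
    matched = set()   # vertices already covered by the matching
    matching = []     # the small in-RAM summary
    for u, v in edge_stream:
        # Add the edge only if both endpoints are still unmatched.
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Toy example: a stream of friendship edges in a tiny social network.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
print(greedy_maximal_matching(edges))  # [(1, 2), (3, 4)]
```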
This project addresses streaming algorithms for processing massive graphs. Graphs are central objects in computer science and have countless applications. They allow us to describe relations (called edges) between entities (called nodes). For example, a social network graph consists of friendship relations (edges) between pairs of users (nodes). Most research on data streaming algorithms for processing massive graphs assumes that graphs are static objects that never change in structure or size. However, this assumption can rarely be guaranteed in practice. For example, social network graphs change both in structure and size when users join or leave the network or when new friendships are established and existing ones are ended. This observation yields the central questions of this project: Processing dynamic graphs is at least as hard as processing static graphs, but how much harder is it? By how much do summaries have to increase in size?
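As a minimal illustration of the dynamic setting (a sketch with hypothetical names, not taken from the project), each stream element can be modelled as an edge update with a sign: +1 for an insertion and -1 for a deletion. Even a simple statistic such as vertex degrees must then be maintained under both kinds of updates; richer structures such as matchings are considerably harder to maintain once deletions are allowed, which is precisely the difficulty this project studies.

```python
# Sketch of a dynamic (insertion-deletion) graph stream: each update
# is (u, v, sign), where sign is +1 for an edge insertion and -1 for
# a deletion. Here we maintain exact vertex degrees under updates.

from collections import defaultdict

def degree_counts(update_stream):
    deg = defaultdict(int)
    for u, v, sign in update_stream:
        deg[u] += sign
        deg[v] += sign
    return dict(deg)

# A user joins, makes two friends, then one friendship ends:
updates = [(1, 2, +1), (1, 3, +1), (2, 3, +1), (1, 2, -1)]
print(degree_counts(updates))  # {1: 1, 2: 1, 3: 2}
```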
The latter question was first addressed in a seminal paper in 2012. To the surprise of many data streaming researchers, it was shown that for many important problems, the summaries required for the dynamic setting need to be only marginally larger than those for the static setting, even though a substantial increase in size had been expected. Today, a multitude of problems are known to behave in this way, while only very few problems are known to require substantially larger summaries in the dynamic setting.
The aim of this project is to shed light on the space requirements of streaming algorithms for processing massive dynamic graphs. While our current knowledge suggests that most problems do not become substantially harder in the dynamic setting, we believe that this picture is somewhat skewed and that a multitude of key problems are in fact much harder to solve in the dynamic case. To confirm our conjecture, we will design new streaming algorithms and prove impossibility results that show this to be the case. Where streaming dynamic graph processing provably requires too much space to be practically feasible, we will provide alternative models that allow for the design of streaming algorithms with small space.
People
Christian Konrad (Principal Investigator)
Publications
Cezar-Mihail Alexandru (2022) Improved Weighted Matching in the Sliding Window Model
Cezar-Mihail Alexandru (2024) Interval Selection in Sliding Windows
Christian Konrad (2024) Matchings in Low-Arboricity Graphs in the Dynamic Graph Stream Model
Sanjeev Khanna (2022) Optimal Bounds for Dominating Set in Graph Streams
Sanjeev Khanna (2025) Streaming Maximal Matching with Bounded Deletions
| Description | The objective of this work was to advance our understanding of how edge deletions affect the space complexity of streaming algorithms for processing massive graphs. Beyond improving state-of-the-art results for fundamental graph problems in this setting, our key aim was to explore a bounded-deletion model as a remedy in situations where an unbounded number of deletions would otherwise necessitate prohibitively large memory usage. Our research has led to significant advances, as demonstrated through nine publications associated with this project. Below, we highlight some key contributions:

- In [Assadi et al., SODA'25], we fully resolved the pass complexity of insertion-deletion semi-streaming algorithms for computing a constant-factor approximation to Maximum Matching, proving it to be Theta(log log n). This result exponentially improves upon both the previous best-known algorithm and the previous lower bound.
- In [Assadi et al., STOC'24], we established a lower bound showing that insertion-deletion semi-streaming algorithms for computing a Maximal Independent Set require Omega(log log n) passes. This confirms the optimality of the algorithm by [Ahn et al., ICML'15] and fully resolves the problem. The lower bound techniques in these two works ([Assadi et al., SODA'25] and [Assadi et al., STOC'24]) were significantly inspired by our earlier work [Konrad, Naidu, SODA'24].
- In [Khanna et al., arXiv preprint'25], we examined the Maximal Matching and Maximum Matching problems in the bounded-deletion streaming model, where the number of edge deletions is limited by a parameter K. We precisely determined the space complexity of deterministic and randomized algorithms for computing a Maximal Matching, and we resolved the space complexity of algorithms for obtaining a constant-factor approximation to Maximum Matching.
- In [Alexandru, Konrad, ESA'24] and [Alexandru et al., STACS'23], we studied the streaming sliding window model, a highly structured deletion model in which edges are implicitly deleted after a fixed time (see the illustrative sketch following this description). We improved the approximation factor for weighted matching from the previous best (3.5+e) to (3+e), matching the best-known results for the unweighted case. We also initiated the study of Interval Selection in this model, achieving optimal results for unit-length intervals and introducing a novel algorithm for arbitrary-length intervals that goes beyond what is achievable with established techniques.
- In [Khanna, Konrad, ITCS'22], we settled the space complexity of insertion-deletion streaming algorithms for the Minimum Dominating Set problem. We also established a separation between the insertion-deletion and insertion-only settings by giving an optimal and novel space-efficient insertion-only algorithm.

Beyond these core contributions, we advanced the study of insertion-deletion streaming algorithms for estimating the size of a maximum matching in bounded-arboricity graphs [Konrad et al., FSTTCS'24] and provided impossibility results for Interval Selection in this setting [Dark et al., FSTTCS'23]. |
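For intuition, here is a minimal sketch (in Python, with illustrative names, not code from the project) of the sliding window model mentioned above: with window length W, only the most recent W edges of the stream are "alive", and older edges are implicitly deleted. A naive approach stores the entire window; sliding window algorithms aim to use far less space than this.

```python
# Sketch of the sliding-window edge stream model with window length W:
# only the W most recent edges form the current graph; older edges
# expire (are implicitly deleted). This naive version stores the whole
# window explicitly for illustration only.

from collections import deque

def sliding_window_snapshots(edge_stream, W):
    window = deque()
    for edge in edge_stream:
        window.append(edge)
        if len(window) > W:
            window.popleft()  # the oldest edge expires
        yield list(window)    # the graph currently "alive"

for snapshot in sliding_window_snapshots([(1, 2), (2, 3), (3, 4), (4, 1)], W=2):
    print(snapshot)
# [(1, 2)]
# [(1, 2), (2, 3)]
# [(2, 3), (3, 4)]
# [(3, 4), (4, 1)]
```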
| Exploitation Route | Many of the works associated with this project introduce novel algorithmic techniques and ideas, many of which we expect to be built upon to solve related problems. One example where this has already happened is the hierarchical graph construction introduced in [Konrad, Naidu, SODA'24], which played a crucial role in establishing the lower bound constructions in [Assadi et al., STOC'24] and [Assadi et al., SODA'25]. We also anticipate significant future impact from our work on bounded-deletion graph streams [Khanna, Konrad, arXiv preprint'25]. This work is the first to develop algorithms with space complexity that is optimal as a function of the number of edge deletions in the stream, and we anticipate follow-up work that attempts to achieve similar results for other fundamental graph problems. Additionally, this research introduced the hierarchical maximal matching data structure, which is of independent interest and may find applications in other resource-constrained computational models. Our contributions to the sliding window model, particularly in [Alexandru, Konrad, ESA'24], advanced beyond the widely used smooth histogram technique for designing sliding window algorithms for graph problems. We hope to see further applications of our method in future research. |
| Sectors | Digital/Communication/Information Technologies (including Software) |
| Description | Bristol - University of Massachusetts |
| Organisation | University of Massachusetts Amherst |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Due to shared research interests, Dr Konrad made a research visit to Prof Andrew McGregor in 2023. This has led to an ongoing successful collaboration (weekly video calls, joint research, and the writing and publication of a paper). |
| Collaborator Contribution | Prof Andrew McGregor hosted Dr Konrad during a research visit. He and his team contributed research work, the writing of a paper, and a presentation of the paper at a conference. |
| Impact | [Konrad et al., FSTTCS'24] |
| Start Year | 2023 |
| Description | Bristol - University of Pennsylvania |
| Organisation | University of Pennsylvania |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Collaboration between the PI and Prof Sanjeev Khanna of the University of Pennsylvania. The PI contributed new ideas, research work, and the writing of joint papers, held weekly video calls, and conducted three one-week research visits to the University of Pennsylvania. |
| Collaborator Contribution | Prof Khanna contributed new ideas, research work, and the writing of the joint papers, and hosted the PI during research visits. |
| Impact | [Khanna, Konrad, ITCS'22], [Khanna, Konrad, Alexandru, PODS'23], [Khanna, Konrad, Dark, arXiv preprint '25] |
| Start Year | 2021 |
| Description | Bristol - University of Waterloo |
| Organisation | University of Waterloo |
| Country | Canada |
| Sector | Academic/University |
| PI Contribution | Based on our work [Konrad, Naidu, SODA'24], we began collaborating with Sepehr Assadi at the University of Waterloo on related problems. This has led to the two publications [Assadi et al., STOC'24] and [Assadi et al., SODA'25], both inspired by ideas from [Konrad, Naidu, SODA'24]. We contributed research work and the writing of the papers. |
| Collaborator Contribution | Sepehr Assadi and his team contributed to this collaboration with research work and the writing of a research paper. |
| Impact | [Assadi et al., STOC'24], [Assadi et al., SODA'25] |
| Start Year | 2023 |