Efficient Spectral Algorithms for Massive and Dynamic Graphs

Lead Research Organisation: University of Edinburgh

Department Name: Sch of Informatics

Abstract

Spectral graph theory investigates the algebraic properties of matrices associated with graphs. Over the past 20 years, studies in spectral graph theory have successfully overcome fundamental bottlenecks faced by combinatorial algorithms, and have become a major research focus in theoretical computer science, machine learning, and network analysis. In particular, some recent breakthrough results show that spectral techniques can be applied to solve central optimisation and learning problems in nearly-linear time. Designing such highly efficient algorithms is crucial to cope with the emergence of massive graphs and real-time data sets coming from technological, social and biological networks.

In this fellowship I propose three research directions to advance the studies of spectral algorithms and their applications in data science: (1) I propose to advance our understanding of fundamental spectral techniques by studying the spectral properties of different matrices representing directed graphs, and exploring new connections between graphs and other mathematical objects, e.g., manifolds studied in geometry; (2) I propose to investigate new spectral algorithms for two fundamental graph problems in different settings, and further improve the state-of-the-art of the two problems with respect to the algorithms' runtime and performance; (3) spectral algorithms vastly outperform combinatorial algorithms with respect to their runtime in the worst case, however the design of most spectral algorithms usually involve many procedures, which bring the issue of numerical stability for efficient implementations. To address this I propose to develop an open-source algorithmic library for spectral algorithms so that by the end of the fellowship data scientists would be able to use the state-of-the-art algorithms for spectral sparsification and graph clustering in a black box manner.

A successful completion of the fellowship will make a significant step towards understanding the power and limits of the algebraic techniques in designing fast graph algorithms in various settings, and the performance of nearly-linear time spectral algorithms in real world data sets.

Planned Impact

The overall fellowship is to investigate foundational, algorithmic, and applied aspects of spectral methods. By the nature of the proposed work and the tight relationships among the four work packages in the proposal, the fellowship will build connections between a number of active research areas in computer science and mathematics, and the fellowship's outcome would ideally spread to all these fields.

Because of the cross-disciplinary nature of the fellowship, academic researchers in the disciplines directly involved, including algorithms, combinatorial optimisation, machine learning, and numerical linear algebra, will appreciate the technical advances of the fellowship that addresses not only the theoretical foundations but also practical applications of the spectral algorithms for the two fundamental graph problems. The fellowship will further investigate the algorithmic Kadison-Singer problem, and the relationships between graphs and manifolds, whose research outcome will have a significant impact in geometry.

During the fellowship there will be a series of pathways to impact events, including two workshops in algorithmic spectral graph theory, master course on Spectral Algorithms and Their Applications in Data Science to be held at the Alan Turing Institute, as well as the involvement of the Turing Challenge Fortnights. These events provide an ideal platform to not only report the fellowship's progress with the other experts in the area and exchange ideas with researchers in related fields, but also introduce this frontier research field to other early-career researchers and more postgraduate students in the UK.

Finally, the fellowship will produce the first open-source algorithmic library of the state-of-the-art algorithms for spectral sparsification and graph clustering. The algorithmic library's performance with respect to its runtime and numerical stability on real-world data sets will be of great value to researchers in algorithms, machine learning, and numerical linear algebra. In the long term, a large number of end users of spectral algorithms will be also greatly benefited from the algorithmic library.

Funded Value:

£1,205,706

Funded Period:

Jan 20 - May 25

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/T00729X/1

Principal Investigator:

He Sun

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (100%)

Organisations

People	ORCID iD
He Sun (Principal Investigator / Fellow)

Publications

Author Name Title Publication Date Published

|< < 1 2 > >|

10 25 50

Laenen S. (2020) Higher-order spectral clustering of directed graphs in Advances in Neural Information Processing Systems

Mihai Cucuringu (2020) Hermitian matrices for clustering directed graphs: insights and applications

Manghiuc B (2020) Augmenting the Algebraic Connectivity of Graphs

Cucuringu M. (2020) Hermitian matrices for clustering directed graphs: insights and applications in Proceedings of Machine Learning Research

Macgregor P. (2021) Local Algorithms for Finding Densely Connected Clusters in Proceedings of Machine Learning Research

Macgregor P. (2021) Finding Bipartite Components in Hypergraphs in Advances in Neural Information Processing Systems

Manghiuc B.-A. (2021) Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs in Advances in Neural Information Processing Systems

Macgregor P (2022) A Tighter Analysis of Spectral Clustering, and Beyond

Jourdan B (2022) Is the Algorithmic Kadison-Singer Problem Hard?

Macgregor P (2022) A Tighter Analysis of Spectral Clustering, and Beyond

Bernstein A (2022) Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary

Macgregor P (2022) Finding Bipartite Components in Hypergraphs

Jourdan B (2023) Is the Algorithmic Kadison-Singer Problem Hard?

Laenen S (2023) Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs

Macgregor P (2023) Spectral Toolkit of Algorithms for Graphs: Technical Report (1)

Laenen S (2023) Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs

Sun H (2023) Three Hardness Results for Graph Similarity Problems

Macgregor P (2023) Fast Approximation of Similarity Graphs with Kernel Density Estimation

Laenen S. (2023) Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs in Proceedings of Machine Learning Research

Macgregor P (2023) Fast Approximation of Similarity Graphs with Kernel Density Estimation

Jourdan B (2023) Is the Algorithmic Kadison-Singer Problem Hard?

Sauerwald T (2023) The Support of Open Versus Closed Random Walks

Macgregor P (2023) Fast and Simple Spectral Clustering in Theory and Practice

Laenen S (2024) Dynamic Spectral Clustering with Provable Approximation Guarantee

Key Findings
Impact Summary
Collaboration
Software and Technical Products


Description	(1) We have studied directed graph clustering and its applications for learning higher-order structures of clusters. In particular, we demonstrated that these algorithms can be used to uncover many significant patterns of clusters, and the use of our designed algorithms in analysing international trade and migration datasets. (2) We have designed a number of efficient algorithms for graph clustering, and our algorithms' runtime is almost linear in the size of the input graphs. We further gave a theoretical analysis on the performance of our designed algorithms. (3) We have made a significant progress on Weaver's Discrepancy Problem. For the design of efficient algorithms, we show that this problem can be solved in polynomial-time in certain regimes. On its computational complexity, we show that this problem is NP-hard for certain choices of parameters. (4) We have published the first C++ based open-source algorithmic libraries for advanced spectral graph algorithms. Our developed library allows an end user to easily use the state-of-the-art algorithm in our research community as a black box.
Exploitation Route	- The others interested in analysing higher-order information of graphs could use our designed algorithms directly. - Our newly developed algorithms for hierarchical clustering can be used by others to better analyse the hierarchical structure of the input set occurring in practice. - Our newly developed algorithm for finding dense components in hypergraphs could potentially be used in other research areas of computer science, including network analysis, and natural language processing. - Our developed open-source library allows an end user to apply the state-of-the-art algorithms developed in algorithmic spectral graph theory as a black box, and use it in their daily data analysis.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	https://homepages.inf.ed.ac.uk/hsun4/project.html


Description	A sequence of our work show that our designed algorithms can be used to analyse various datasets outside computer science. In addition, our work appearing at ICML'21 shows that some structures between the clusters can be uncovered with local algorithms. Our published STAG, which is the first C++ based open-source algorithmic library for spectral graph algorithms, allows an end user to directly apply several state-of-the art algorithms in spectral graph theory in their daily data analysis.
First Year Of Impact	2023
Sector	Other
Impact Types	Economic Policy & public services


Description	Collaboration on Data Citation
Organisation	University of Padova
Country	Italy
Sector	Academic/University
PI Contribution	I worked with Peter Buneman, Dennis Dosso, and Matteo Lissandrini on efficient ways to measure the impact of datasets, and its computational complexity.
Collaborator Contribution	In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases.
Impact	Can we measure the impact of a database? To appear in the Communications of the ACM
Start Year	2023


Description	Collaboration on Data Citation
Organisation	University of Verona
Country	Italy
Sector	Academic/University
PI Contribution	I worked with Peter Buneman, Dennis Dosso, and Matteo Lissandrini on efficient ways to measure the impact of datasets, and its computational complexity.
Collaborator Contribution	In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases.
Impact	Can we measure the impact of a database? To appear in the Communications of the ACM
Start Year	2023


Description	Dynamic Graph Sparsification
Organisation	ETH Zurich
Country	Switzerland
Sector	Academic/University
PI Contribution	I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution	Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact	Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year	2020


Description	Dynamic Graph Sparsification
Organisation	Rutgers University
Country	United States
Sector	Academic/University
PI Contribution	I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution	Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact	Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year	2020


Description	Dynamic Graph Sparsification
Organisation	Stanford University
Country	United States
Sector	Academic/University
PI Contribution	I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution	Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact	Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year	2020


Description	Dynamic Graph Sparsification
Organisation	University of California, Berkeley
Country	United States
Sector	Academic/University
PI Contribution	I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution	Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact	Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year	2020


Description	Dynamic Graph Sparsification
Organisation	University of Copenhagen
Country	Denmark
Sector	Academic/University
PI Contribution	I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution	Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact	Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year	2020


Description	Dynamic Graph Sparsification
Organisation	University of Michigan
Country	United States
Sector	Academic/University
PI Contribution	I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution	Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact	Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year	2020


Description	Open vs. Closed Random Walks
Organisation	University of Cambridge
Department	Computer Laboratory
Country	United Kingdom
Sector	Academic/University
PI Contribution	We answers several open questions on the open versus closed random walks raised in the research community.
Collaborator Contribution	A closed random walk of length l on an undirected and connected graph G = (V, E) is a random walk that returns to the start vertex at step l, and its properties have been recently related to problems in different mathematical fields, e.g., geometry and combinatorics (Jiang et al., Annals of Mathematics '21) and spectral graph theory (McKenzie et al., STOC '21). For instance, in the context of analyzing the eigenvalue multiplicity of graph matrices, McKenzie et al. show that, with high probability, the support of a closed random walk of length l ? 1 is ?(l^{1/5}) on any bounded-degree graph, and leaves as an open problem whether a stronger bound of ?(l^{1/2}) holds for any regular graph. First, we show that the support of a closed random walk of length l is at least ?(l^{1/2}/vlog n) for any regular or bounded-degree graph on n vertices. Secondly, we prove for every l ? 1 the existence of a family of bounded-degree graphs, together with a start vertex such that the support is bounded by O(l^{1/2}/vlog n). Besides addressing the open problem of McKenzie et al., these two results also establish a subtle separation between closed random walks and open random walks, for which the support on any regular (or bounded-degree) graph is well-known to be ?(l^{1/2}) for all l ? 1. For irregular graphs, we prove that even if the start vertex is chosen uniformly, the support of a closed random walk may still be O(log l). This rules out a general polynomial lower bound in l for all graphs. Finally, we apply our results on random walks to obtain new bounds on the multiplicity of the second largest eigenvalue of the adjacency matrices of graphs.
Impact	The Support of Open versus Closed Random Walks. with Sauerwald T., and Vagnozzi D. (ICALP'23).
Start Year	2022


Title	STAG: Spectral Toolkit of Algorithms for Graphs
Description	STAG is an open-source library of efficient spectral algorithms for graphs. It is the first such algorithmic library mainly written in C++, with a python wrapper around the underlying C++ library for python users. The source code of STAG can be found from GitHub page.
Type Of Technology	Webtool/Application
Year Produced	2023
Open Source License?	Yes
Impact	We have so far implemented the local graph clustering component. Comparing with existing open-source implementation for local graph clustering, STAG connects to a Neo4j database running in the AuraDB cloud service, or a database running locally. With this feature, one can run local graph clustering on the entire Wikipedia graph of more than 20GB with a single PC. It is the first open-source implementation of local graph clustering that connects a graph database.
URL	https://staglibrary.io

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications