📣 Help Shape the Future of UKRI's Gateway to Research (GtR)

We're improving UKRI's Gateway to Research and are seeking your input! If you would be interested in being interviewed about the improvements we're making and to have your say about how we can make GtR more user-friendly, impactful, and effective for the Research and Innovation community, please email gateway@ukri.org.

Efficient Spectral Algorithms for Massive and Dynamic Graphs

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Spectral graph theory investigates the algebraic properties of matrices associated with graphs. Over the past 20 years, studies in spectral graph theory have successfully overcome fundamental bottlenecks faced by combinatorial algorithms, and have become a major research focus in theoretical computer science, machine learning, and network analysis. In particular, some recent breakthrough results show that spectral techniques can be applied to solve central optimisation and learning problems in nearly-linear time. Designing such highly efficient algorithms is crucial to cope with the emergence of massive graphs and real-time data sets coming from technological, social and biological networks.

In this fellowship I propose three research directions to advance the studies of spectral algorithms and their applications in data science: (1) I propose to advance our understanding of fundamental spectral techniques by studying the spectral properties of different matrices representing directed graphs, and exploring new connections between graphs and other mathematical objects, e.g., manifolds studied in geometry; (2) I propose to investigate new spectral algorithms for two fundamental graph problems in different settings, and further improve the state-of-the-art of the two problems with respect to the algorithms' runtime and performance; (3) spectral algorithms vastly outperform combinatorial algorithms with respect to their runtime in the worst case, however the design of most spectral algorithms usually involve many procedures, which bring the issue of numerical stability for efficient implementations. To address this I propose to develop an open-source algorithmic library for spectral algorithms so that by the end of the fellowship data scientists would be able to use the state-of-the-art algorithms for spectral sparsification and graph clustering in a black box manner.

A successful completion of the fellowship will make a significant step towards understanding the power and limits of the algebraic techniques in designing fast graph algorithms in various settings, and the performance of nearly-linear time spectral algorithms in real world data sets.

Planned Impact

The overall fellowship is to investigate foundational, algorithmic, and applied aspects of spectral methods. By the nature of the proposed work and the tight relationships among the four work packages in the proposal, the fellowship will build connections between a number of active research areas in computer science and mathematics, and the fellowship's outcome would ideally spread to all these fields.

Because of the cross-disciplinary nature of the fellowship, academic researchers in the disciplines directly involved, including algorithms, combinatorial optimisation, machine learning, and numerical linear algebra, will appreciate the technical advances of the fellowship that addresses not only the theoretical foundations but also practical applications of the spectral algorithms for the two fundamental graph problems. The fellowship will further investigate the algorithmic Kadison-Singer problem, and the relationships between graphs and manifolds, whose research outcome will have a significant impact in geometry.

During the fellowship there will be a series of pathways to impact events, including two workshops in algorithmic spectral graph theory, master course on Spectral Algorithms and Their Applications in Data Science to be held at the Alan Turing Institute, as well as the involvement of the Turing Challenge Fortnights. These events provide an ideal platform to not only report the fellowship's progress with the other experts in the area and exchange ideas with researchers in related fields, but also introduce this frontier research field to other early-career researchers and more postgraduate students in the UK.

Finally, the fellowship will produce the first open-source algorithmic library of the state-of-the-art algorithms for spectral sparsification and graph clustering. The algorithmic library's performance with respect to its runtime and numerical stability on real-world data sets will be of great value to researchers in algorithms, machine learning, and numerical linear algebra. In the long term, a large number of end users of spectral algorithms will be also greatly benefited from the algorithmic library.

Publications

10 25 50
publication icon
Laenen S. (2020) Higher-order spectral clustering of directed graphs in Advances in Neural Information Processing Systems

publication icon
Cucuringu M. (2020) Hermitian matrices for clustering directed graphs: insights and applications in Proceedings of Machine Learning Research

publication icon
Macgregor P. (2021) Local Algorithms for Finding Densely Connected Clusters in Proceedings of Machine Learning Research

publication icon
Macgregor P. (2021) Finding Bipartite Components in Hypergraphs in Advances in Neural Information Processing Systems

publication icon
Manghiuc B.-A. (2021) Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs in Advances in Neural Information Processing Systems

publication icon
Laenen S. (2023) Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs in Proceedings of Machine Learning Research

 
Description (1) We have studied directed graph clustering and its applications for learning higher-order structures of clusters. In particular, we demonstrated that these algorithms can be used to uncover many significant patterns of clusters, and the use of our designed algorithms in analysing international trade and migration datasets.

(2) We have designed a number of efficient algorithms for graph clustering, and our algorithms' runtime is almost linear in the size of the input graphs. We further gave a theoretical analysis on the performance of our designed algorithms.

(3) We have made a significant progress on Weaver's Discrepancy Problem. For the design of efficient algorithms, we show that this problem can be solved in polynomial-time in certain regimes. On its computational complexity, we show that this problem is NP-hard for certain choices of parameters.

(4) We have published the first C++ based open-source algorithmic libraries for advanced spectral graph algorithms. Our developed library allows an end user to easily use the state-of-the-art algorithm in our research community as a black box.
Exploitation Route - The others interested in analysing higher-order information of graphs could use our designed algorithms directly.

- Our newly developed algorithms for hierarchical clustering can be used by others to better analyse the hierarchical structure of the input set occurring in practice.

- Our newly developed algorithm for finding dense components in hypergraphs could potentially be used in other research areas of computer science, including network analysis, and natural language processing.

- Our developed open-source library allows an end user to apply the state-of-the-art algorithms developed in algorithmic spectral graph theory as a black box, and use it in their daily data analysis.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://homepages.inf.ed.ac.uk/hsun4/project.html
 
Description A sequence of our work show that our designed algorithms can be used to analyse various datasets outside computer science. In addition, our work appearing at ICML'21 shows that some structures between the clusters can be uncovered with local algorithms. Our published STAG, which is the first C++ based open-source algorithmic library for spectral graph algorithms, allows an end user to directly apply several state-of-the art algorithms in spectral graph theory in their daily data analysis.
First Year Of Impact 2023
Sector Other
Impact Types Economic

Policy & public services

 
Description Collaboration on Data Citation 
Organisation University of Padova
Country Italy 
Sector Academic/University 
PI Contribution I worked with Peter Buneman, Dennis Dosso, and Matteo Lissandrini on efficient ways to measure the impact of datasets, and its computational complexity.
Collaborator Contribution In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases.
Impact Can we measure the impact of a database? To appear in the Communications of the ACM
Start Year 2023
 
Description Collaboration on Data Citation 
Organisation University of Verona
Country Italy 
Sector Academic/University 
PI Contribution I worked with Peter Buneman, Dennis Dosso, and Matteo Lissandrini on efficient ways to measure the impact of datasets, and its computational complexity.
Collaborator Contribution In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases.
Impact Can we measure the impact of a database? To appear in the Communications of the ACM
Start Year 2023
 
Description Dynamic Graph Sparsification 
Organisation ETH Zurich
Country Switzerland 
Sector Academic/University 
PI Contribution I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year 2020
 
Description Dynamic Graph Sparsification 
Organisation Rutgers University
Country United States 
Sector Academic/University 
PI Contribution I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year 2020
 
Description Dynamic Graph Sparsification 
Organisation Stanford University
Country United States 
Sector Academic/University 
PI Contribution I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year 2020
 
Description Dynamic Graph Sparsification 
Organisation University of California, Berkeley
Country United States 
Sector Academic/University 
PI Contribution I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year 2020
 
Description Dynamic Graph Sparsification 
Organisation University of Copenhagen
Country Denmark 
Sector Academic/University 
PI Contribution I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year 2020
 
Description Dynamic Graph Sparsification 
Organisation University of Michigan
Country United States 
Sector Academic/University 
PI Contribution I initiate the project during a research visit, and contribute some of the key ideas for the collaboration.
Collaborator Contribution Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary.
Impact Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Start Year 2020
 
Description Open vs. Closed Random Walks 
Organisation University of Cambridge
Department Computer Laboratory
Country United Kingdom 
Sector Academic/University 
PI Contribution We answers several open questions on the open versus closed random walks raised in the research community.
Collaborator Contribution A closed random walk of length l on an undirected and connected graph G = (V, E) is a random walk that returns to the start vertex at step l, and its properties have been recently related to problems in different mathematical fields, e.g., geometry and combinatorics (Jiang et al., Annals of Mathematics '21) and spectral graph theory (McKenzie et al., STOC '21). For instance, in the context of analyzing the eigenvalue multiplicity of graph matrices, McKenzie et al. show that, with high probability, the support of a closed random walk of length l ? 1 is ?(l^{1/5}) on any bounded-degree graph, and leaves as an open problem whether a stronger bound of ?(l^{1/2}) holds for any regular graph. First, we show that the support of a closed random walk of length l is at least ?(l^{1/2}/vlog n) for any regular or bounded-degree graph on n vertices. Secondly, we prove for every l ? 1 the existence of a family of bounded-degree graphs, together with a start vertex such that the support is bounded by O(l^{1/2}/vlog n). Besides addressing the open problem of McKenzie et al., these two results also establish a subtle separation between closed random walks and open random walks, for which the support on any regular (or bounded-degree) graph is well-known to be ?(l^{1/2}) for all l ? 1. For irregular graphs, we prove that even if the start vertex is chosen uniformly, the support of a closed random walk may still be O(log l). This rules out a general polynomial lower bound in l for all graphs. Finally, we apply our results on random walks to obtain new bounds on the multiplicity of the second largest eigenvalue of the adjacency matrices of graphs.
Impact The Support of Open versus Closed Random Walks. with Sauerwald T., and Vagnozzi D. (ICALP'23).
Start Year 2022
 
Title STAG: Spectral Toolkit of Algorithms for Graphs 
Description STAG is an open-source library of efficient spectral algorithms for graphs. It is the first such algorithmic library mainly written in C++, with a python wrapper around the underlying C++ library for python users. The source code of STAG can be found from GitHub page. 
Type Of Technology Webtool/Application 
Year Produced 2023 
Open Source License? Yes  
Impact We have so far implemented the local graph clustering component. Comparing with existing open-source implementation for local graph clustering, STAG connects to a Neo4j database running in the AuraDB cloud service, or a database running locally. With this feature, one can run local graph clustering on the entire Wikipedia graph of more than 20GB with a single PC. It is the first open-source implementation of local graph clustering that connects a graph database. 
URL https://staglibrary.io