Efficient Spectral Algorithms for Massive and Dynamic Graphs
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Informatics
Abstract
Spectral graph theory investigates the algebraic properties of matrices associated with graphs. Over the past 20 years, studies in spectral graph theory have successfully overcome fundamental bottlenecks faced by combinatorial algorithms, and have become a major research focus in theoretical computer science, machine learning, and network analysis. In particular, some recent breakthrough results show that spectral techniques can be applied to solve central optimisation and learning problems in nearly-linear time. Designing such highly efficient algorithms is crucial to cope with the emergence of massive graphs and real-time data sets coming from technological, social and biological networks.
In this fellowship I propose three research directions to advance the studies of spectral algorithms and their applications in data science: (1) I propose to advance our understanding of fundamental spectral techniques by studying the spectral properties of different matrices representing directed graphs, and exploring new connections between graphs and other mathematical objects, e.g., manifolds studied in geometry; (2) I propose to investigate new spectral algorithms for two fundamental graph problems in different settings, and further improve the state-of-the-art of the two problems with respect to the algorithms' runtime and performance; (3) spectral algorithms vastly outperform combinatorial algorithms with respect to their runtime in the worst case, however the design of most spectral algorithms usually involve many procedures, which bring the issue of numerical stability for efficient implementations. To address this I propose to develop an open-source algorithmic library for spectral algorithms so that by the end of the fellowship data scientists would be able to use the state-of-the-art algorithms for spectral sparsification and graph clustering in a black box manner.
A successful completion of the fellowship will make a significant step towards understanding the power and limits of the algebraic techniques in designing fast graph algorithms in various settings, and the performance of nearly-linear time spectral algorithms in real world data sets.
In this fellowship I propose three research directions to advance the studies of spectral algorithms and their applications in data science: (1) I propose to advance our understanding of fundamental spectral techniques by studying the spectral properties of different matrices representing directed graphs, and exploring new connections between graphs and other mathematical objects, e.g., manifolds studied in geometry; (2) I propose to investigate new spectral algorithms for two fundamental graph problems in different settings, and further improve the state-of-the-art of the two problems with respect to the algorithms' runtime and performance; (3) spectral algorithms vastly outperform combinatorial algorithms with respect to their runtime in the worst case, however the design of most spectral algorithms usually involve many procedures, which bring the issue of numerical stability for efficient implementations. To address this I propose to develop an open-source algorithmic library for spectral algorithms so that by the end of the fellowship data scientists would be able to use the state-of-the-art algorithms for spectral sparsification and graph clustering in a black box manner.
A successful completion of the fellowship will make a significant step towards understanding the power and limits of the algebraic techniques in designing fast graph algorithms in various settings, and the performance of nearly-linear time spectral algorithms in real world data sets.
Planned Impact
The overall fellowship is to investigate foundational, algorithmic, and applied aspects of spectral methods. By the nature of the proposed work and the tight relationships among the four work packages in the proposal, the fellowship will build connections between a number of active research areas in computer science and mathematics, and the fellowship's outcome would ideally spread to all these fields.
Because of the cross-disciplinary nature of the fellowship, academic researchers in the disciplines directly involved, including algorithms, combinatorial optimisation, machine learning, and numerical linear algebra, will appreciate the technical advances of the fellowship that addresses not only the theoretical foundations but also practical applications of the spectral algorithms for the two fundamental graph problems. The fellowship will further investigate the algorithmic Kadison-Singer problem, and the relationships between graphs and manifolds, whose research outcome will have a significant impact in geometry.
During the fellowship there will be a series of pathways to impact events, including two workshops in algorithmic spectral graph theory, master course on Spectral Algorithms and Their Applications in Data Science to be held at the Alan Turing Institute, as well as the involvement of the Turing Challenge Fortnights. These events provide an ideal platform to not only report the fellowship's progress with the other experts in the area and exchange ideas with researchers in related fields, but also introduce this frontier research field to other early-career researchers and more postgraduate students in the UK.
Finally, the fellowship will produce the first open-source algorithmic library of the state-of-the-art algorithms for spectral sparsification and graph clustering. The algorithmic library's performance with respect to its runtime and numerical stability on real-world data sets will be of great value to researchers in algorithms, machine learning, and numerical linear algebra. In the long term, a large number of end users of spectral algorithms will be also greatly benefited from the algorithmic library.
Because of the cross-disciplinary nature of the fellowship, academic researchers in the disciplines directly involved, including algorithms, combinatorial optimisation, machine learning, and numerical linear algebra, will appreciate the technical advances of the fellowship that addresses not only the theoretical foundations but also practical applications of the spectral algorithms for the two fundamental graph problems. The fellowship will further investigate the algorithmic Kadison-Singer problem, and the relationships between graphs and manifolds, whose research outcome will have a significant impact in geometry.
During the fellowship there will be a series of pathways to impact events, including two workshops in algorithmic spectral graph theory, master course on Spectral Algorithms and Their Applications in Data Science to be held at the Alan Turing Institute, as well as the involvement of the Turing Challenge Fortnights. These events provide an ideal platform to not only report the fellowship's progress with the other experts in the area and exchange ideas with researchers in related fields, but also introduce this frontier research field to other early-career researchers and more postgraduate students in the UK.
Finally, the fellowship will produce the first open-source algorithmic library of the state-of-the-art algorithms for spectral sparsification and graph clustering. The algorithmic library's performance with respect to its runtime and numerical stability on real-world data sets will be of great value to researchers in algorithms, machine learning, and numerical linear algebra. In the long term, a large number of end users of spectral algorithms will be also greatly benefited from the algorithmic library.
Organisations
- University of Edinburgh (Lead Research Organisation)
- University of Cambridge (Collaboration)
- University of Copenhagen (Collaboration)
- University of Padova (Collaboration)
- University of Verona (Collaboration)
- University of California, Berkeley (Collaboration)
- ETH Zurich (Collaboration)
- Rutgers University (Collaboration)
- University of Michigan (Collaboration)
- Stanford University (Collaboration)
- Max Planck Institutes (Project Partner)
- University of Cambridge (Project Partner)
- The Alan Turing Institute (Project Partner)
- University of Washington (Project Partner)
People |
ORCID iD |
| He Sun (Principal Investigator / Fellow) |
Publications
Laenen S.
(2020)
Higher-order spectral clustering of directed graphs
in Advances in Neural Information Processing Systems
Mihai Cucuringu
(2020)
Hermitian matrices for clustering directed graphs: insights and applications
Manghiuc B
(2020)
Augmenting the Algebraic Connectivity of Graphs
Cucuringu M.
(2020)
Hermitian matrices for clustering directed graphs: insights and applications
in Proceedings of Machine Learning Research
Macgregor P.
(2021)
Local Algorithms for Finding Densely Connected Clusters
in Proceedings of Machine Learning Research
Macgregor P.
(2021)
Finding Bipartite Components in Hypergraphs
in Advances in Neural Information Processing Systems
Manghiuc B.-A.
(2021)
Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs
in Advances in Neural Information Processing Systems
Macgregor P
(2022)
A Tighter Analysis of Spectral Clustering, and Beyond
Jourdan B
(2022)
Is the Algorithmic Kadison-Singer Problem Hard?
Macgregor P
(2022)
A Tighter Analysis of Spectral Clustering, and Beyond
Bernstein A
(2022)
Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary
Macgregor P
(2022)
Finding Bipartite Components in Hypergraphs
Jourdan B
(2023)
Is the Algorithmic Kadison-Singer Problem Hard?
Laenen S
(2023)
Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
Macgregor P
(2023)
Spectral Toolkit of Algorithms for Graphs: Technical Report (1)
Laenen S
(2023)
Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
Sun H
(2023)
Three Hardness Results for Graph Similarity Problems
Macgregor P
(2023)
Fast Approximation of Similarity Graphs with Kernel Density Estimation
Laenen S.
(2023)
Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
in Proceedings of Machine Learning Research
Macgregor P
(2023)
Fast Approximation of Similarity Graphs with Kernel Density Estimation
Jourdan B
(2023)
Is the Algorithmic Kadison-Singer Problem Hard?
Sauerwald T
(2023)
The Support of Open Versus Closed Random Walks
Sauerwald T
(2023)
The Support of Open Versus Closed Random Walks
Macgregor P
(2023)
Fast and Simple Spectral Clustering in Theory and Practice
| Description | (1) We have studied directed graph clustering and its applications for learning higher-order structures of clusters. In particular, we demonstrated that these algorithms can be used to uncover many significant patterns of clusters, and the use of our designed algorithms in analysing international trade and migration datasets. (2) We have designed a number of efficient algorithms for graph clustering, and our algorithms' runtime is almost linear in the size of the input graphs. We further gave a theoretical analysis on the performance of our designed algorithms. (3) We have made a significant progress on Weaver's Discrepancy Problem. For the design of efficient algorithms, we show that this problem can be solved in polynomial-time in certain regimes. On its computational complexity, we show that this problem is NP-hard for certain choices of parameters. (4) We have published the first C++ based open-source algorithmic libraries for advanced spectral graph algorithms. Our developed library allows an end user to easily use the state-of-the-art algorithm in our research community as a black box. |
| Exploitation Route | - The others interested in analysing higher-order information of graphs could use our designed algorithms directly. - Our newly developed algorithms for hierarchical clustering can be used by others to better analyse the hierarchical structure of the input set occurring in practice. - Our newly developed algorithm for finding dense components in hypergraphs could potentially be used in other research areas of computer science, including network analysis, and natural language processing. - Our developed open-source library allows an end user to apply the state-of-the-art algorithms developed in algorithmic spectral graph theory as a black box, and use it in their daily data analysis. |
| Sectors | Digital/Communication/Information Technologies (including Software) |
| URL | https://homepages.inf.ed.ac.uk/hsun4/project.html |
| Description | A sequence of our work show that our designed algorithms can be used to analyse various datasets outside computer science. In addition, our work appearing at ICML'21 shows that some structures between the clusters can be uncovered with local algorithms. Our published STAG, which is the first C++ based open-source algorithmic library for spectral graph algorithms, allows an end user to directly apply several state-of-the art algorithms in spectral graph theory in their daily data analysis. |
| First Year Of Impact | 2023 |
| Sector | Other |
| Impact Types | Economic Policy & public services |
| Description | Collaboration on Data Citation |
| Organisation | University of Padova |
| Country | Italy |
| Sector | Academic/University |
| PI Contribution | I worked with Peter Buneman, Dennis Dosso, and Matteo Lissandrini on efficient ways to measure the impact of datasets, and its computational complexity. |
| Collaborator Contribution | In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases. |
| Impact | Can we measure the impact of a database? To appear in the Communications of the ACM |
| Start Year | 2023 |
| Description | Collaboration on Data Citation |
| Organisation | University of Verona |
| Country | Italy |
| Sector | Academic/University |
| PI Contribution | I worked with Peter Buneman, Dennis Dosso, and Matteo Lissandrini on efficient ways to measure the impact of datasets, and its computational complexity. |
| Collaborator Contribution | In disseminating scientific and statistical data, on-line databases have almost completely replaced traditional paper-based media such as journals and reference works. Given this, can we measure the impact of a database in the same way that we measure an author's or journal's impact? To do this, we need somehow to represent a database as a set of publications, and databases typically allow a large number of possible decompositions into parts, any of which could be treated as a publication. We show that the definition of the h-index naturally extends to hierarchies, so that if a database admits some kind of hierarchical interpretation we can use this as one measure of the importance of a database; moreover, this can be computed as efficiently as one can compute the normal h-index. This also gives us a decomposition of the database that might be used for other purposes such as giving credit to the curators or contributors to the database. We illustrate the process by analyzing three widely used databases. |
| Impact | Can we measure the impact of a database? To appear in the Communications of the ACM |
| Start Year | 2023 |
| Description | Dynamic Graph Sparsification |
| Organisation | ETH Zurich |
| Country | Switzerland |
| Sector | Academic/University |
| PI Contribution | I initiate the project during a research visit, and contribute some of the key ideas for the collaboration. |
| Collaborator Contribution | Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary. |
| Impact | Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). |
| Start Year | 2020 |
| Description | Dynamic Graph Sparsification |
| Organisation | Rutgers University |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | I initiate the project during a research visit, and contribute some of the key ideas for the collaboration. |
| Collaborator Contribution | Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary. |
| Impact | Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). |
| Start Year | 2020 |
| Description | Dynamic Graph Sparsification |
| Organisation | Stanford University |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | I initiate the project during a research visit, and contribute some of the key ideas for the collaboration. |
| Collaborator Contribution | Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary. |
| Impact | Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). |
| Start Year | 2020 |
| Description | Dynamic Graph Sparsification |
| Organisation | University of California, Berkeley |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | I initiate the project during a research visit, and contribute some of the key ideas for the collaboration. |
| Collaborator Contribution | Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary. |
| Impact | Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). |
| Start Year | 2020 |
| Description | Dynamic Graph Sparsification |
| Organisation | University of Copenhagen |
| Country | Denmark |
| Sector | Academic/University |
| PI Contribution | I initiate the project during a research visit, and contribute some of the key ideas for the collaboration. |
| Collaborator Contribution | Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary. |
| Impact | Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). |
| Start Year | 2020 |
| Description | Dynamic Graph Sparsification |
| Organisation | University of Michigan |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | I initiate the project during a research visit, and contribute some of the key ideas for the collaboration. |
| Collaborator Contribution | Designing efficient dynamic graph algorithms against an adaptive adversary is an important goal in the field of dynamic graph algorithms and has witnessed many exciting recent developments. Our work presents the first non-trivial efficient adaptive algorithms for maintaining many sparsifiers against an adaptive adversary. |
| Impact | Fully-Dynamic Graph Sparsifiers Against an Adaptive Adversary. A. Bernstein, J. van den Brand, M. Gutenberg, D. Nanongkai, T. Saranurak, A. Sidford, and H. Sun. Proceedings of the 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). |
| Start Year | 2020 |
| Description | Open vs. Closed Random Walks |
| Organisation | University of Cambridge |
| Department | Computer Laboratory |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | We answers several open questions on the open versus closed random walks raised in the research community. |
| Collaborator Contribution | A closed random walk of length l on an undirected and connected graph G = (V, E) is a random walk that returns to the start vertex at step l, and its properties have been recently related to problems in different mathematical fields, e.g., geometry and combinatorics (Jiang et al., Annals of Mathematics '21) and spectral graph theory (McKenzie et al., STOC '21). For instance, in the context of analyzing the eigenvalue multiplicity of graph matrices, McKenzie et al. show that, with high probability, the support of a closed random walk of length l ? 1 is ?(l^{1/5}) on any bounded-degree graph, and leaves as an open problem whether a stronger bound of ?(l^{1/2}) holds for any regular graph. First, we show that the support of a closed random walk of length l is at least ?(l^{1/2}/vlog n) for any regular or bounded-degree graph on n vertices. Secondly, we prove for every l ? 1 the existence of a family of bounded-degree graphs, together with a start vertex such that the support is bounded by O(l^{1/2}/vlog n). Besides addressing the open problem of McKenzie et al., these two results also establish a subtle separation between closed random walks and open random walks, for which the support on any regular (or bounded-degree) graph is well-known to be ?(l^{1/2}) for all l ? 1. For irregular graphs, we prove that even if the start vertex is chosen uniformly, the support of a closed random walk may still be O(log l). This rules out a general polynomial lower bound in l for all graphs. Finally, we apply our results on random walks to obtain new bounds on the multiplicity of the second largest eigenvalue of the adjacency matrices of graphs. |
| Impact | The Support of Open versus Closed Random Walks. with Sauerwald T., and Vagnozzi D. (ICALP'23). |
| Start Year | 2022 |
| Title | STAG: Spectral Toolkit of Algorithms for Graphs |
| Description | STAG is an open-source library of efficient spectral algorithms for graphs. It is the first such algorithmic library mainly written in C++, with a python wrapper around the underlying C++ library for python users. The source code of STAG can be found from GitHub page. |
| Type Of Technology | Webtool/Application |
| Year Produced | 2023 |
| Open Source License? | Yes |
| Impact | We have so far implemented the local graph clustering component. Comparing with existing open-source implementation for local graph clustering, STAG connects to a Neo4j database running in the AuraDB cloud service, or a database running locally. With this feature, one can run local graph clustering on the entire Wikipedia graph of more than 20GB with a single PC. It is the first open-source implementation of local graph clustering that connects a graph database. |
| URL | https://staglibrary.io |