Unifying community detection using higher-order structures in directed networks

Lead Research Organisation: University of Strathclyde
Department Name: Mathematics and Statistics

Abstract

Community detection is an essential tool in the analysis of complex networks for grouping together similar nodes. This has proven useful in applications as diverse as understanding cell functions; finding groups of genes related to colon cancer in gene co-occurrences networks; uncovering groups of users spreading terrorist propaganda on Twitter; and understanding connections between politicians. In the case of undirected networks, a simple and widely accepted definition of a community or a partition is that nodes within a community are densely connected but only loosely connected to the nodes outside. Finding such communities/partitions has been an area of abundant research, and an increasing number of algorithms to uncover community structures have been proposed. Each makes its own interpretation of the meaning of "densely'" and/or "loosely" connected, and the best way to reveal such a structure.
Detection of partitions/communities when the network is directed is a much more sparsely populated field, with no such consensus about what a community should look like. In part this is because the very nature of a community is strongly application dependent. There are situations where nodes can be expected to belong to the same community but may not be connected at all. The potential of effective community detection in directed networks is great and already it has been used to highlight a lack of racial mixture within relationships of some high school students; derive recommender systems on websites; and to explore the pattern of knowledge transfer between technology fields.
A framework has been proposed based on higher-order representations of networks in terms of graphlets, and they offer an attractive route to community detection. In larger part this is because these higher-order representations of networks produce undirected graphs, called Motif Adjacency Matrices (MAM). With a judicious choice of graphlets, the MAM should exhibit community structures that fit the classic definition stated for undirected networks. One can aim to employ the community detection methods designed for such a situation. These MAMs have proven meaningful in applications such as food webs, to accurately uncover the ecological classes; in transcriptional regulation networks to uncover functionalities of groups of operons; in transportation networks to highlight hub airports in North America; and in neuronal networks, to explain the control of nictation. However efficient algorithms to build these MAM and exploit them, as well as effective validation tools, are still thin on the ground.
Purpose of this project is to investigate more deeply how one can exploit graphlets to produce higher-order representations of networks, with a focus on unifying the application dependent problem of community detection within directed networks.

1. Building higher-order representations. Some efficient methods have been proposed for 3-node graphlets, and certain kinds of 4-node graphlets. These graphlets do not cover the range of communities of interest. Project aims to extend existing work to build MAMs for other graphlets by adapting existing methods and/or proposing new ones, and providing efficient implementations to be made publicly available. Since some key graphlets may occur with very high frequency in a network, MAMs have the potential to be dense, which is a barrier to developing methods to deal with large-scale networks. We propose to develop methods that can work with only partially filled MAMs.
2. Automating graphlet choice. The choice of the right graphlet to use to get a MAM which is consistent with a particular application can be far from straightforward. The possibility of automating this choice will be investigated, for instance by using sweep profile-like procedures or by applying consensus algorithms on multiple MAMs.
3. Determining effective community detection algorithms/definitions.

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/W52394X/1 01/10/2021 30/09/2025
2598017 Studentship EP/W52394X/1 01/10/2021 30/09/2025