Spectral embedding methods and subsequent inference tasks on dynamic multiplex graphs

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

The proposed research centres on the statistical analysis of dynamic multiplex graphs (DMPGs). Mathematically, a graph, also known as a network, is a collection of nodes together with edges between them. Network data are collected in many domains, such as healthcare, biology, and cyber-security, and they are becoming increasingly rich, continuously generating new research questions. In particular, dynamic multiplex networks are increasingly common data structures in real-world applications. In a DMPG, edges can be of different types and can evolve over time. For example, in an enterprise computer network, nodes represent hosts, and edges correspond to connections between them, occurring dynamically over time on different ports. Because of the complexity of such objects, research has only scratched the surface of statistical modelling for DMPGs. Novel statistical methodology is therefore required, and this research intends to bridge this gap by developing realistic statistical models for DMPGs.

The aim of this research proposal is to develop principled and scalable statistical models that represent the full multi-layered complexity of dynamic multiplex graphs. This goal will be achieved by exploiting an array of statistical techniques, ranging from spectral methods to topic modelling. In particular, the proposal focusses on embedding methods, which are techniques for discovering low-dimensional substructure in networks. Such techniques have the added benefit of aiding subsequent inference tasks, such as clustering nodes with similar behaviour. The statistical properties of the novel embedding methods proposed for DMPGs will be carefully assessed, and the methods will be used to improve and extend existing models for clustering, link prediction, and anomaly detection. In addition, the proposed models will be flexible enough to incorporate additional information on nodes and edges, available in the form of covariates. A specific focus will be on incorporating unstructured data, such as text, within the proposed modelling frameworks, combining aspects of network analysis and natural language processing.
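
As a rough illustration of what a spectral embedding of a multiplex graph can look like, the sketch below concatenates the adjacency matrices of the layers column-wise (an "unfolded" matrix) and takes a truncated singular value decomposition. This is only one simple possibility, chosen here for illustration, and is not necessarily the methodology the proposal will develop. The toy graph, the function name `unfolded_spectral_embedding`, and the choice of two dimensions are all assumptions; the layers could equally stand for edge types or for time snapshots.

```python
import numpy as np


def unfolded_spectral_embedding(layers, d):
    """Embed nodes of a multiplex graph into d dimensions (illustrative sketch).

    layers: list of (n x n) adjacency matrices, one per edge type or time point.
    Returns an (n x d) array with one embedding vector per node.
    """
    # Column-wise concatenation of the layers: shape (n, n * number_of_layers).
    unfolded = np.hstack(layers)
    # Truncated SVD: keep the d leading left singular vectors, scaled by the
    # square roots of the corresponding singular values.
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :d] * np.sqrt(s[:d])


# Hypothetical toy data: 10 nodes in two communities of 5, observed on 2 layers.
m = 5
clique = np.ones((m, m)) - np.eye(m)  # layer 1: each community is a clique
ring = np.zeros((m, m))               # layer 2: each community is a ring
for i in range(m):
    ring[i, (i + 1) % m] = 1.0
    ring[(i + 1) % m, i] = 1.0

# No edges between the two communities in either layer.
layer_1 = np.kron(np.eye(2), clique)
layer_2 = np.kron(np.eye(2), ring)

X = unfolded_spectral_embedding([layer_1, layer_2], d=2)

# Nodes in the same community receive (numerically) identical embeddings,
# while nodes in different communities are clearly separated.
within = np.linalg.norm(X[0] - X[1])
between = np.linalg.norm(X[0] - X[m])
print(f"within-community distance:  {within:.3f}")
print(f"between-community distance: {between:.3f}")
```

In this idealised example the communities are perfectly separated in the embedding, so any standard clustering algorithm applied to the rows of `X` would recover them; real network data are noisy, and the statistical behaviour of such embeddings in that setting is exactly the kind of question the proposal sets out to study.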
