Spectral embedding methods and subsequent inference tasks on dynamic multiplex graphs

Lead Research Organisation: Imperial College London

Department Name: Mathematics

Abstract

The proposed research is centred around the statistical analysis of dynamic multiplex graphs (DMPGs). Mathematically, a graph, also known as a network, can be interpreted as a collection of nodes, with edges occurring between them. Network data are collected in many domains, such as healthcare, biology, and cyber-security, and they are becoming increasingly rich, continuously generating new research questions. In particular, dynamic multiplex networks are emerging as increasingly common data structures in real-world applications. In DMPGs, edges can have different types and evolve over time. For example, in an enterprise computer network, nodes could represent hosts, and edges could correspond to connections between them, occurring dynamically over time on different ports. Because of the complexity of such objects, research has only scratched the surface of statistical modelling for DMPGs. The development of novel statistical methodology is therefore required, and this research intends to bridge this gap by developing realistic statistical models for DMPGs.

The aim of this research proposal is to develop principled and scalable statistical models which represent the full multi-layered complexity of dynamic multiplex graphs. This goal will be achieved by exploiting an array of statistical techniques, spanning from spectral methods to topic modelling. In particular, this research proposal focusses on techniques for discovering low-dimensional substructure in networks, known as embedding methods. Such techniques have the added benefit of aiding subsequent inference tasks, such as clustering of nodes with similar behaviour. The statistical properties of novel embedding methods proposed for DMPGs will be carefully assessed, and the proposed methods will be utilised to improve and extend existing models for clustering, link prediction, and anomaly detection. In addition, the proposed models will have the flexibility to encompass additional information on nodes and edges, available in the form of covariates. In particular, this research proposal will focus on incorporating unstructured data, such as text, within the proposed modelling frameworks, combining aspects from network analysis and natural language processing.

Funded Value:

£164,344

Funded Period:

Mar 24 - Aug 26

Funder:

ISPF

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/Y002113/1

Principal Investigator:

Francesco Sanna Passino

Research Subject:

Mathematical sciences (100%)

Research Topic:

Statistics & Appl. Probability (100%)

Organisations

People	ORCID iD
Francesco Sanna Passino (Principal Investigator)	http://orcid.org/0000-0002-4571-6681

Publications

Li X (2024) FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets

Key Findings
Impact Summary
Collaboration


Description	This research project has so far focused on developing new statistical tools to analyse complex network data with multiple edge types that change over time. Such networks are found in many areas, from computer networks to financial markets. The research has so far led to four key achievements, reflected in four preprints (currently under submission) and one published paper:
1. Time-specific and layer-specific node embeddings via Doubly Unfolded Adjacency Spectral Embedding (DUASE) - We have developed DUASE, a method to obtain separate latent representations of nodes over time and over layers, where layers represent different edge types on a graph. This helps us understand patterns in evolving networks, such as how online interactions differ across social media platforms over time. Additionally, this work has contributed to new theory and methodology that provide convergence guarantees for response prediction in latent structure network time series. These developments have been applied to biological learning networks of larval Drosophila, enhancing our understanding of neural processes in this model organism.
2. Network models for multivariate time series - We have introduced a new statistical model called NIRVAR (Network Informed Restricted Vector Autoregression) that combines time series data with an underlying network structure estimated from the observations. This approach is useful for analysing situations where network activity influences trends over time and between different series, such as predicting how connections in a computer network might change based on past behaviour across different nodes. This model has been applied to financial returns, macroeconomic data and bike sharing systems, demonstrating state-of-the-art performance in prediction tasks. The proposed procedure also accommodates time series of matrices, admitting a multiplex analogue.
3. Detecting changes in multiplex network point processes - We have proposed a method for quickly identifying sudden changes in network behaviour, especially in networks with groups or communities. This tool can be particularly useful in cybersecurity, where detecting unusual patterns in real time can help identify potential threats.
4. Financial trend detection with temporal knowledge graphs - We have designed a system called FinDKG that uses large language models to extract dynamic multiplex network data from news articles, and proposed a novel architecture for predicting future links within the network. We used the proposed method to track and predict global financial trends, obtaining excellent performance.
Overall, the research so far advances our ability to analyse and interpret complex networks with multiple edge types, offering practical tools for fields such as biology, transportation, cybersecurity, and finance.
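As a rough illustration of the kind of embedding described in the first achievement, the sketch below arranges the adjacency matrices of a dynamic multiplex graph into one large block matrix and takes a truncated singular value decomposition, giving one set of node representations per layer and one per time point. This is only a minimal sketch under stated assumptions: the record above does not spell out the DUASE construction, so the block layout, the square-root scaling of the singular values, and the function name `doubly_unfolded_embedding` are illustrative choices and should not be read as the authors' implementation or as the API of their released libraries.

```python
import numpy as np


def doubly_unfolded_embedding(adjacency, d):
    """Illustrative doubly unfolded spectral embedding (hypothetical helper).

    adjacency: array of shape (K, T, n, n), one n x n adjacency matrix
               for each of K layers (edge types) and T time points.
    d:         embedding dimension.
    Returns (layer_emb, time_emb) with shapes (K, n, d) and (T, n, d).
    """
    K, T, n, _ = adjacency.shape
    # Build a (K*n) x (T*n) block matrix whose (k, t) block is the
    # adjacency matrix for layer k at time t (assumed layout).
    unfolded = adjacency.transpose(0, 2, 1, 3).reshape(K * n, T * n)
    # Truncated SVD: keep the d largest singular values and vectors.
    U, s, Vt = np.linalg.svd(unfolded, full_matrices=False)
    scale = np.sqrt(s[:d])
    left = U[:, :d] * scale    # rows indexed by (layer, node)
    right = Vt[:d].T * scale   # rows indexed by (time, node)
    return left.reshape(K, n, d), right.reshape(T, n, d)


# Toy example on random graphs (synthetic data, for shape checking only).
rng = np.random.default_rng(0)
K, T, n, d = 2, 3, 20, 2
A = (rng.random((K, T, n, n)) < 0.3).astype(float)
layer_emb, time_emb = doubly_unfolded_embedding(A, d)
print(layer_emb.shape, time_emb.shape)
```

In this sketch, the product of the layer-k embedding with the transposed time-t embedding gives a rank-d approximation of the adjacency matrix for layer k at time t, which is one way to read "separate latent representations of nodes over time and over layers".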
Exploitation Route	The outcomes of this research funding are well-positioned to be widely used by others, thanks to the development of open-access Python libraries for each project. These libraries, released alongside preprints and published papers, ensure that the methodologies are fully reproducible and accessible to both researchers and practitioners in industry and academia. The developed methods are highly general and can be applied to any network that evolves over time with multiple edge types (such as dynamic multiplex networks, temporal knowledge graphs). This flexibility means the tools could support diverse applications, such as:
- Biological research: Analysing time series of brain networks in biology, like those of larval Drosophila, to gain insights into learning and behaviour.
- Cybersecurity: Monitoring dynamic computer networks to detect unusual patterns and potential threats in real time.
- Financial analysis: Leveraging temporal knowledge graphs to predict global market trends using vast streams of unstructured data.
Sectors	Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Financial Services, and Management Consultancy; Transport


Description	The doubly unfolded adjacency spectral embedding (DUASE) proposed in this work has been applied to real-world networks describing geopolitical interactions between countries and financial news, and has been used to analyse the learning circuit of a collection of Drosophila larvae from brain connectivity data. Additionally, Network Informed Restricted Vector Autoregression (NIRVAR) has been applied to financial returns data, macroeconomic time series, and bike sharing systems. The methodologies proposed as part of this research work are general and applicable to a wide range of sectors.


Description	Collaboration with Johns Hopkins University (Professor Carey Priebe)
Organisation	Johns Hopkins University
Country	United States
Sector	Academic/University
PI Contribution	Establishing the collaboration with Professor Carey Priebe and the Department of Applied Mathematics and Statistics at Johns Hopkins University was one of the main objectives of this project proposal. The PI visited Johns Hopkins University twice in the first year of the funding (April 2024 and November 2024), and regular online meetings have been held with the research partner (Professor Carey Priebe) and his research group. These efforts have already led to one preprint, currently under submission to an academic journal, and to multiple other collaborative publications, which will be finalised during the next 12 months.
Collaborator Contribution	The project partner has contributed primarily staff time from Professor Carey Priebe and his research group, and the use of facilities during the PI's visits to Johns Hopkins University. Additionally, the project partner has contributed data for testing the methodologies proposed during the project, and staff time for regular online meetings to discuss progress in the research work.
Impact	The first output of this collaboration is the following preprint: Acharyya, A., Sanna Passino, F., Trosset, M. W., and Priebe, C. E. (2025), "Convergence guarantees for response prediction in latent structure network time series".
Start Year	2024
