CHARMNET - Characterising Models for Networks

Lead Research Organisation: University of Oxford

Department Name: Statistics

Abstract

Networks have emerged as a useful tool for representing and analysing complex data sets. These data sets appear in many contexts: biological networks represent the interplay of agents within a cell; social networks represent interactions between individuals or between social entities, such as websites referring to other websites; and trade networks reflect trade relationships between countries.

Due to the complexity of the data which they represent, networks pose considerable obstacles for analysis. Typically the standard statistical framework of independent observations no longer applies: networks are used to represent the data precisely because the observations are often not independent of each other. While each network itself can be viewed as an observation, usually no independent observations of the whole network are available.

To understand networks, probabilistic models can be employed. The behaviour of networks which are generated from such models can then be studied with tools from applied probability. Even relatively simple models provide challenges in their analysis, with more realistic complex models often out of reach of a rigorous mathematical treatment.

Hence, depending on the network behaviour of interest, it may be reasonable to approximate a complex model with a simpler model. Assessing the error in such an approximation is crucial to determine whether the approximation is suitable. This project will derive characterisations of network models which relate to a common underlying process. This common underlying process will then allow models to be compared by comparing their characterisations.
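
As an illustration of what a characterisation can look like, the following is the classical Stein characterisation of the standard normal distribution. It is offered only as a textbook example in the spirit of the Stein's method work listed under Publications; the abstract does not state which characterisations the project uses for network models, and the symbols Z, W, f and h are introduced here for illustration.

```latex
% Characterisation: Z is standard normal if and only if
\[
  \mathbb{E}\bigl[f'(Z) - Z f(Z)\bigr] = 0
  \quad \text{for all absolutely continuous } f \text{ with } \mathbb{E}\lvert f'(Z)\rvert < \infty .
\]
% Comparison: for a test function h, let f_h solve the Stein equation
\[
  f_h'(w) - w f_h(w) = h(w) - \mathbb{E}\,h(Z).
\]
% Then, for any random variable W,
\[
  \bigl\lvert \mathbb{E}\,h(W) - \mathbb{E}\,h(Z) \bigr\rvert
  = \bigl\lvert \mathbb{E}\bigl[f_h'(W) - W f_h(W)\bigr] \bigr\rvert ,
\]
% so the distance between the two distributions is expressed through
% how far W is from satisfying the characterising identity of Z.
```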

Based on such comparisons, approximate test procedures can be derived by first using the simpler model to obtain the distribution of the test statistic under the null hypothesis and then taking the approximation error into account. In practice, for a given data set, a model would be fitted to the data. This fitting process introduces some variability which in itself will result in some deviations from the model. Using tools from theoretical statistics as well as applied probability, these deviations can again be assessed, with an explicit error term.
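
The idea of an approximate test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's procedure: the simple model is taken to be an Erdős–Rényi random graph, the test statistic is the triangle count, and `approx_error` is a hypothetical user-supplied bound on the distance between the distributions of the statistic under the complex and the simple model. All function names are invented for this sketch, and the variability introduced by fitting the edge probability is not accounted for.

```python
import random
from itertools import combinations


def sample_er_edges(n, p, rng):
    """Draw the edge set of an Erdos-Renyi graph G(n, p) as sorted pairs (a, b), a < b."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}


def triangle_count(n, edges):
    """Count triangles by checking every vertex triple (fine for small n)."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )


def monte_carlo_p_value(n, observed_edges, n_sim=500, approx_error=0.0, seed=0):
    """Upper-tail Monte Carlo p-value for the triangle count under a fitted G(n, p).

    approx_error is a hypothetical bound on the approximation error between the
    complex null model and the simple model; adding it makes the p-value
    conservative with respect to that bound.
    """
    rng = random.Random(seed)
    # Fit the simple model: estimate the edge probability by the edge density.
    p_hat = len(observed_edges) / (n * (n - 1) / 2)
    t_obs = triangle_count(n, observed_edges)
    # Simulate the null distribution of the statistic from the simple model.
    exceed = sum(
        triangle_count(n, sample_er_edges(n, p_hat, rng)) >= t_obs
        for _ in range(n_sim)
    )
    # Monte Carlo p-value with the usual +1 correction.
    p_simple = (1 + exceed) / (n_sim + 1)
    return min(1.0, p_simple + approx_error)


if __name__ == "__main__":
    rng = random.Random(42)
    observed = sample_er_edges(12, 0.3, rng)
    print(monte_carlo_p_value(12, observed, n_sim=200, approx_error=0.02))
```

Adding the error bound to the simulated p-value is one simple way to stay conservative: if the two null distributions of the statistic differ by at most `approx_error` in total variation distance, the tail probabilities differ by at most the same amount.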

The project will exploit the observation that the method for assessing this approximation error is well adapted to analysing so-called graph neural networks, which are emerging as a tool in Artificial Intelligence. Thus the project will yield a new connection between Probability and Artificial Intelligence which will spark ideas beyond the application to network analysis.

The results will be applied to three publicly available sets of networks: protein-protein interaction networks, political blog networks, and World Trade networks. These networks are chosen because of the challenges they pose. To date there is no generally accepted model for protein-protein interaction networks; moreover, the data underlying these networks contain a large number of errors. Political blog data are used as a benchmark; several models have been proposed for these networks, and our approach will allow them to be compared quantitatively. World Trade networks are weighted, directed, dynamic and spatial, and thus illustrate the complexity which our approach will be able to tackle.

Planned Impact

Key questions which this project addresses are:

(1) What is the expected behaviour of complex models for networks? Once the expected behaviour is understood, deviations from it can be exploited to detect anomalies in networks.
(2) How can networks such as infrastructure networks and reporting networks be designed to achieve efficiency and resilience? Understanding the behaviour of models for networks can guide the design of such networks.
(3) How can the interconnectedness of people, things and data be taken into account when drawing statistical conclusions? Tests for assessing models which could include explanatory variables as parameters will be tackled in this project.

Impact will be achieved through lectures, publications, a blogpost, and through existing contacts with

(a) Accenture on anomaly detection in networks
(b) e-Therapeutics, Novo Nordisk and UCB pharma on drug target development and understanding biological disease processes
(c) Legume Technology to improve nitrogen uptake in legumes.

At least two students per year, one undergraduate student and one Master-level student, will be trained in the area of probability, network analysis and AI. The project will also generate outreach events, a blog, and webinars.

Funded Value:

£1,125,315

Funded Period:

Apr 21 - Mar 26

Funder:

EPSRC

Project Status:

Active

Project Category:

Fellowship

Project Reference:

EP/T018445/1

Principal Investigator:

G Reinert

Research Subject:

Mathematical sciences (100%)

Research Topic:

Statistics & Appl. Probability (100%)

Organisations

University of Oxford (Fellow, Lead Research Organisation)

People	ORCID iD
G Reinert (Principal Investigator / Fellow)

Publications

Xu W. (2022) A Kernelised Stein Statistic for Assessing Implicit Generative Models in Advances in Neural Information Processing Systems

Xu W. (2021) A Stein Goodness-of-fit Test for Exponential Random Graph Models in Proceedings of Machine Learning Research

Xu, W (2021) A Stein Goodness-of-test for Exponential Random Graph Models

Xu W (2022) AgraSSt: Approximate Graph Stein Statistics for Interpretable Assessment of Implicit Graph Generators

Xu W. (2022) AgraSSt: Approximate Graph Stein Statistics for Interpretable Assessment of Implicit Graph Generators in Advances in Neural Information Processing Systems

Gaunt R (2021) Bounds for the chi-square approximation of Friedman's statistic by Stein's method

Gaunt R (2023) Bounds for the chi-square approximation of Friedman's statistic by Stein's method in Bernoulli

Gaunt R (2022) Bounds for the chi-square approximation of the power divergence family of statistics in Journal of Applied Probability

Ouyang R (2024) Complex Networks & Their Applications XII - Proceedings of The Twelfth International Conference on Complex Networks and their Applications: COMPLEX NETWORKS 2023 Volume 1

Armbruster S (2023) COVID-19 incidence in the Republic of Ireland: A case study for network-based time series models

Clarkson J (2022) DAMNETS: A Deep Autoregressive Model for Generating Markovian Network Time Series

Clarkson J. (2022) DAMNETS: A Deep Autoregressive Model for Generating Markovian Network Time Series in Proceedings of Machine Learning Research

Stefanos Bennett (2021) Detection and clustering of lead-lag networks for multivariate time series with an application to financial markets

He Y (2021) DIGRAC: Digraph Clustering Based on Flow Imbalance

He Y. (2022) DIGRAC: Digraph Clustering Based on Flow Imbalance in Proceedings of Machine Learning Research

He Y (2022) DIGRAC: Digraph Clustering Based on Flow Imbalance

Barbour A (2021) Estimating the correlation in network disturbance models in Journal of Complex Networks

Pardo-Diaz J (2022) Extracting Information from Gene Coexpression Networks of Rhizobium leguminosarum. in Journal of computational biology : a journal of computational molecular cell biology

Pardo-Diaz J (2022) Generating weighted and thresholded gene coexpression networks using signed distance correlation. in Network science (Cambridge University Press)

He, Y. (2022) GNNRank: Learning global rankings from pairwise comparisons via directed graph neural networks

He Y. (2022) GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks in Proceedings of Machine Learning Research

Temcinas T (2023) Goodness-of-fit via Count Statistics in Dense Random Simplicial Complexes

Cooper J (2022) Intelligent Data Engineering and Automated Learning - IDEAL 2022 - 23rd International Conference, IDEAL 2022, Manchester, UK, November 24-26, 2022, Proceedings

Bennett S (2022) Lead-lag detection and network clustering for multivariate time series with an application to the US equity market

Further Funding

Description	Network Stochastic Processes and Time Series (NeST)
Amount	£6,451,752 (GBP)
Funding ID	EP/X002195/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2023
End	12/2028

Software and Technical Products

Title	AgraSSt: Approximate Graph Stein Statistics for Interpretable Assessment of Implicit Graph Generators
Description	The software provides a Python implementation of model assessment for implicit graph generative models, based on AgraSSt (Approximate Graph Stein Statistics).
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	The software provides a pioneering instance of checking the quality of graph generative models in a general framework based on Stein's method and kernel methods.
URL	https://arxiv.org/abs/2203.03673


Title	R code for Stein goodness-of-fit test for Exponential Random Graph Models
Description	The R software is used to conduct the kernel-based Stein goodness-of-fit test for exponential random graph models, published as "Xu W and Reinert G, A Stein Goodness-of-test for Exponential Random Graph Models", Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:415-423, 2021.
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	The software provides a state-of-the-art implementation for performing goodness-of-fit testing on exponential random graph models.
URL	https://proceedings.mlr.press/v130/xu21b.html

Engagement Activities

Description	Network seminar talk
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	This was an invited seminar talk about our recent progress in Stein's method for network models.
Year(s) Of Engagement Activity	2022


Description	RSS workshop on Stein's method and machine learning
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	This was a workshop on Stein's method and machine learning, jointly organised with Chris Oates from Newcastle.
Year(s) Of Engagement Activity	2021
URL	https://rss.org.uk/training-events/events/events-2021/sections/rss-applied-probability-and-computati...


Description	RSS workshop on Stein's method and machine learning
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This was a tutorial workshop providing a gentle introduction to Stein's method and machine learning. It was hybrid and attracted a large international audience. It was a morning event, followed by a research workshop in the afternoon.
Year(s) Of Engagement Activity	2021
URL	https://rss.org.uk/training-events/events/events-2021/sections/rss-applied-probability-and-computati...


Description	TDA talk
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	This was a talk on credit risk using ideas from TDA and network analysis. It is based on a collaboration with Santander UK.
Year(s) Of Engagement Activity	2021


Description	Time series generation and anomaly detection in high dimensions
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	This is an RSS workshop on Time series generation and anomaly detection in high dimensions which Gesine Reinert co-organised, with Alex Cox (Bath), Hao Ni (UCL) and Kathrin Glau (QMUL). It is an online activity and informs the time series of networks part of the project.
Year(s) Of Engagement Activity	2022
URL	https://rss.org.uk/training-events/events/events-2022/rss-events/time-series-generation-and-anomaly-...


Description	Tutorial lectures on Stein's method
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	This was a series of two lectures introducing graduate students from Oxford and London to Stein's method and its connections with machine learning.
Year(s) Of Engagement Activity	2021


Description	Tutorial on kernel method -- Stein's Method and Machine Learning RSS workshop
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	The lecture provided an introduction to kernel methods that bridged the understanding of audience members from machine learning and applied probability backgrounds. The lecture not only helped to integrate the audiences for a better workshop experience but also ignited various fruitful discussions based on ideas combining Stein's method and kernel methods.
Year(s) Of Engagement Activity	2021
URL	https://rss.org.uk/training-events/events/events-2021/sections/rss-applied-probability-and-computati...
