CHARMNET - Characterising Models for Networks
Lead Research Organisation:
University of Oxford
Department Name: Statistics
Abstract
Networks have emerged as a useful tool for representing and analysing complex data sets. Such data sets arise in many contexts: for example, biological networks represent the interplay of agents within a cell; social networks represent interactions between individuals or social entities, such as websites referring to other websites; and trade networks reflect trade relationships between countries.
Due to the complexity of the data which they represent, networks pose considerable obstacles for analysis. Typically the standard statistical framework of independent observations no longer applies: networks are used to represent the data precisely because the data are often not independent of each other. While each network can itself be viewed as an observation, usually no independent observations of the whole network are available.
To understand networks, probabilistic models can be employed. The behaviour of networks generated from such models can then be studied with tools from applied probability. Even relatively simple models pose challenges for analysis, and more realistic, complex models are often out of reach of a rigorous mathematical treatment.
Hence, depending on the network behaviour of interest, it may be reasonable to approximate a complex model by a simpler model. Assessing the error in such an approximation is crucial to determine whether the approximation is suitable. This project will derive characterisations of network models which relate to a common underlying process. This common underlying process will then allow models to be compared through their characterisations.
Based on such comparisons, approximate test procedures can be derived: the simpler model is first used to obtain the distribution of the test statistic under the null hypothesis, and the approximation error is then taken into account. In practice, for a given data set, a model would be fitted to the data. This fitting process introduces some variability, which in itself results in some deviation from the model. Using tools from theoretical statistics as well as applied probability, these deviations can again be assessed, with an explicit error term.
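As a rough illustration of this two-step test, the sketch below uses the Erdős–Rényi model G(n, p) as the "simpler model" and the triangle count as the test statistic; both choices, the function names and the error bound `eps` are hypothetical stand-ins rather than the project's procedure. It assumes that the Kolmogorov distance between the null distributions of the statistic under the complex and the simpler model is known to be at most `eps` (for instance from a Stein's method bound). Upper-tail probabilities then differ by at most `eps`, so adding it to the Monte Carlo p-value computed under the simpler model gives a conservative p-value, up to Monte Carlo error.

```python
import random
from itertools import combinations


def sample_gnp(n, p, rng):
    """Sample an Erdos-Renyi graph G(n, p) as a list of edges (u, v) with u < v."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def triangle_count(n, edges):
    """Count triangles; each triangle a < b < c is counted once, via edge (a, b)."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return sum(1 for u, v in edges for w in nbrs[u] & nbrs[v] if w > max(u, v))


def conservative_p_value(observed, n, p, eps, n_sims=2000, seed=0):
    """Monte Carlo p-value under G(n, p), inflated by an assumed error bound eps."""
    rng = random.Random(seed)
    sims = [triangle_count(n, sample_gnp(n, p, rng)) for _ in range(n_sims)]
    # Upper-tail Monte Carlo p-value under the simpler model (with +1 correction).
    p_simple = (1 + sum(s >= observed for s in sims)) / (1 + n_sims)
    # Hypothetical assumption: Kolmogorov distance between the statistic's null
    # distributions under the complex and the simpler model is at most eps, so
    # the tail probabilities differ by at most eps and this value is conservative.
    return p_simple, min(1.0, p_simple + eps)
```

For instance, `conservative_p_value(observed=25, n=20, p=0.1, eps=0.01)` returns the Monte Carlo p-value under G(20, 0.1) together with its inflated version, and one would reject at level 0.05 only if the inflated value is at most 0.05. A larger `eps` makes the test less powerful, which is why sharp approximation bounds matter.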
The project will exploit the observation that the method for assessing this approximation error is well suited to analysing so-called graph neural networks, which are emerging as a tool in Artificial Intelligence. The project will thus yield a new connection between Probability and Artificial Intelligence, which will spark ideas beyond the application to network analysis.
The results will be applied to three publicly available network data sets: protein-protein interaction networks, political blog networks and World Trade networks. These networks are chosen because of the challenges they pose. There is to date no generally accepted model for protein-protein interaction networks; moreover, the data underlying these networks contain a large number of errors. Political blog data are used as a benchmark: several models have been proposed for these networks, and our approach will allow them to be compared quantitatively. World Trade networks are weighted, directed, dynamic and spatial, and thus illustrate the complexity which our approach will be able to tackle.
Planned Impact
Key questions which this project addresses are:
(1) What is the expected behaviour of complex models for networks? Once the expected behaviour is understood, deviations from it can be exploited to detect anomalies in networks.
(2) How can networks such as infrastructure networks and reporting networks be designed to achieve efficiency and resilience? Understanding the behaviour of models for networks can guide the design of such networks.
(3) How can the interconnectedness of people, things and data be taken into account when drawing statistical conclusions? Tests for assessing models which could include explanatory variables as parameters will be tackled in this project.
Impact will be achieved through lectures, publications, a blog post, and existing contacts with:
(a) Accenture on anomaly detection in networks
(b) e-Therapeutics, Novo Nordisk and UCB Pharma on drug target development and understanding biological disease processes
(c) Legume Technology to improve nitrogen uptake in legumes.
At least two students per year, one undergraduate and one Master's-level student, will be trained in probability, network analysis and AI. The project will also generate outreach events, a blog, and webinars.
People |
ORCID iD |
G Reinert (Principal Investigator / Fellow) |
Publications
Anastasiou A
(2023)
Stein's Method Meets Computational Statistics: A Review of Some Recent Developments
in Statistical Science
Barbour A
(2021)
Estimating the correlation in network disturbance models
in Journal of Complex Networks
Bennett S
(2022)
Lead-lag detection and network clustering for multivariate time series with an application to the US equity market
in Machine Learning
Clarkson J
(2022)
DAMNETS: A Deep Autoregressive Model for Generating Markovian Network Time Series
in Proceedings of Machine Learning Research
Fathi M
(2022)
Relaxing the Gaussian assumption in shrinkage and SURE in high dimension
in The Annals of Statistics
Fatima A
(2022)
Stein's Method for Poisson-Exponential Distributions
Fischer A
(2022)
Normal approximation for the posterior in exponential families
Fischer A
(2024)
Modified method of moments for generalized Laplace distributions
in Communications in Statistics - Simulation and Computation
Fischer A
(2023)
Stein estimation in a multivariate setting
in arXiv
Gaunt R
(2022)
Bounds for the chi-square approximation of the power divergence family of statistics
in Journal of Applied Probability
Gaunt R
(2023)
Bounds for the chi-square approximation of Friedman's statistic by Stein's method
in Bernoulli
He Y
(2022)
DIGRAC: Digraph Clustering Based on Flow Imbalance
in Proceedings of Machine Learning Research
He Y
(2022)
SSSNET: Semi-Supervised Signed Network Clustering
in Proceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022
He Y
(2022)
GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks
in Proceedings of Machine Learning Research
He Y
(2022)
MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian
in Proceedings of Machine Learning Research
Description | LSE external reviewer |
Geographic Reach | Local/Municipal/Regional |
Policy Influence Type | Participation in a guidance/advisory committee |
Description | AI Hub |
Amount | £10,000,000 (GBP) |
Funding ID | EP/Y007484/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 02/2024 |
End | 01/2029 |
Description | FAIR: Framework for responsible adoption of Artificial Intelligence in the financial seRvices industry |
Amount | £3,166,201 (GBP) |
Funding ID | EP/V056883/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 12/2021 |
End | 11/2026 |
Description | Network Stochastic Processes and Time Series (NeST) |
Amount | £6,451,752 (GBP) |
Funding ID | EP/X002195/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2023 |
End | 12/2028 |
Description | Office for National Statistics Partnership |
Organisation | Office for National Statistics |
Country | United Kingdom |
Sector | Public |
PI Contribution | We are analysing data on direct debits and direct credits at business-sector level. For this purpose we have developed a novel model for time series on networks, which has resulted in a paper and in some conference presentations. Moreover, representatives from the Department for Business and Trade have shown an interest in this work, and we are in the process of extending it to nowcast GDP-like figures. |
Collaborator Contribution | This is a partnership facilitated by the Alan Turing Institute. Together with Mihai Cucuringu, I supervise a PDRA, Anastasia Mantziou. The ONS provided access to a proprietary data set, as well as in-house expertise in biweekly meetings. |
Impact | Mantziou, A., Cucuringu, M., Meirinhos, V., & Reinert, G. (2023). The GNAR-edge model: a network autoregressive model for networks with time-varying edge weights. Journal of Complex Networks, 11(6), cnad039. Multidisciplinary, includes statistics and economics |
Start Year | 2021 |
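To give a flavour of a network autoregressive model for time-varying edge weights, the following is a minimal sketch of a simplified, one-lag edge-level autoregression in the spirit of GNAR-edge: each edge weight depends on its own previous value (parameter `alpha`) and on the average previous weight of the edges sharing a node with it (parameter `beta`), plus Gaussian noise. The specification, the function names and the least-squares fit are illustrative assumptions; the model in Mantziou et al. (2023) is more general and may differ in its details.

```python
import random


def edge_neighbours(edges):
    """For each edge index, list the indices of the other edges sharing a node with it."""
    return [
        [j for j, (c, d) in enumerate(edges) if j != i and {a, b} & {c, d}]
        for i, (a, b) in enumerate(edges)
    ]


def neighbour_mean(values, nbr_idx):
    """Average of the given edge values over neighbouring edges (0.0 if none)."""
    return sum(values[j] for j in nbr_idx) / len(nbr_idx) if nbr_idx else 0.0


def simulate_edge_ar(edges, alpha, beta, n_steps, sigma=1.0, seed=0):
    """Simulate X_t(e) = alpha*X_{t-1}(e) + beta*mean_{e'~e} X_{t-1}(e') + noise."""
    rng = random.Random(seed)
    nbrs = edge_neighbours(edges)
    series = [[0.0] * len(edges)]
    for _ in range(n_steps):
        prev = series[-1]
        series.append([
            alpha * prev[e] + beta * neighbour_mean(prev, nbrs[e]) + rng.gauss(0.0, sigma)
            for e in range(len(edges))
        ])
    return series


def fit_edge_ar(edges, series):
    """Least-squares estimates of (alpha, beta) via the 2x2 normal equations."""
    nbrs = edge_neighbours(edges)
    sxx = sxz = szz = sxy = szy = 0.0
    for prev, cur in zip(series, series[1:]):
        for e in range(len(edges)):
            # x: own lag, z: neighbour-average lag, y: current value.
            x, z, y = prev[e], neighbour_mean(prev, nbrs[e]), cur[e]
            sxx += x * x
            sxz += x * z
            szz += z * z
            sxy += x * y
            szy += z * y
    det = sxx * szz - sxz * sxz
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det
```

For example, simulating on a 5-cycle with `alpha=0.5, beta=0.3` over a few thousand steps and refitting should recover estimates close to these values; keeping `abs(alpha) + abs(beta) < 1` keeps the simulated process stable.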
Description | The Role of Synthetic Data in Financial Systems |
Organisation | Alan Turing Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This is a 5% secondment; we derived methods for analysing networks of financial transactions. |
Collaborator Contribution | The partner provided a link with HSBC; HSBC provided data, use cases, and expertise |
Impact | Internal reports so far; publications are in preparation. |
Start Year | 2022 |
Description | Trustworthy Synthetic Data in Practice |
Organisation | Alan Turing Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This is a collaboration on synthetic data. My main contribution has been the generation of synthetic networks. |
Collaborator Contribution | The partners provided data sets and expertise in generating tabular data |
Impact | Publication: SaGess paper, available on arXiv, with Stratis Limnios as first author (spanning statistics, machine learning and computer science). Dissemination: a data controller meeting in Warwick, attended by data controllers from HSBC, ONS and the Bank of Italy, among others, with a view to assessing black-box methods. |
Start Year | 2022 |
Title | AgraSSt: Approximate Graph Stein Statistics for Interpretable Assessment of Implicit Graph Generators |
Description | The software provides a Python implementation of model assessment for implicit graph generative models, based on AgraSSt (Approximate Graph Stein Statistics). |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | The software provides a pioneering tool for checking the quality of graph generative models within a general framework based on Stein's method and kernel methods. |
URL | https://arxiv.org/abs/2203.03673 |
Title | R code for Stein goodness-of-fit test for Exponential Random Graph Models |
Description | The R software conducts the kernel-based Stein goodness-of-fit test for exponential random graph models, published as "Xu W and Reinert G, A Stein Goodness-of-test for Exponential Random Graph Models" Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:415-423, 2021. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | The software provides the state-of-the-art implementation for performing goodness-of-fit testing on exponential random graph models. |
URL | https://proceedings.mlr.press/v130/xu21b.html |
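Stein-based goodness-of-fit tests such as these rest on a Stein identity: an operator A with E[A f(X)] = 0 for suitable test functions f when X follows the model. For random graphs such operators are commonly built from Glauber dynamics, which repeatedly resamples a single edge from its conditional distribution given the rest of the graph. The minimal sketch below assumes the Erdős–Rényi model G(n, p), where that conditional probability is simply p, and uses the triangle count as a single, hypothetical test function f; for this f the operator reduces to the average over vertex pairs s of (p - x_s) times the number of common neighbours of s. It only illustrates the identity and is not the kernel-based statistic implemented in the software, which optimises over a reproducing kernel Hilbert space of test functions.

```python
import random
from itertools import combinations


def sample_gnp(n, p, rng):
    """Sample an Erdos-Renyi graph G(n, p) as a list of edges (u, v) with u < v."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def stein_operator_triangles(n, edges, p):
    """Glauber-dynamics Stein operator of G(n, p) applied to f = triangle count.

    For a vertex pair s = (u, v), adding the edge uv creates c_s new triangles,
    where c_s is the number of common neighbours of u and v. Averaging
    p * (f(x with s) - f(x)) + (1 - p) * (f(x without s) - f(x)) over all pairs
    simplifies to the mean of (p - x_s) * c_s.
    """
    edge_set = set(edges)
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for u, v in pairs:
        x_s = 1.0 if (u, v) in edge_set else 0.0
        c_s = len(nbrs[u] & nbrs[v])
        total += (p - x_s) * c_s
    return total / len(pairs)


def mean_stein_value(n, p_model, p_data, n_graphs=200, seed=0):
    """Average the operator (built for p_model) over graphs drawn from G(n, p_data)."""
    rng = random.Random(seed)
    values = [stein_operator_triangles(n, sample_gnp(n, p_data, rng), p_model)
              for _ in range(n_graphs)]
    return sum(values) / len(values)
```

With n = 20 and `p_model=0.1`, `mean_stein_value` should be close to 0 when the graphs come from G(20, 0.1). For data from G(n, q) instead, x_s is independent of c_s, so the expectation is (p - q)(n - 2)q^2, about -0.32 for q = 0.3; a value far from 0 signals lack of fit.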
Title | software: GNNs for networks |
Description | PyTorch Geometric Signed Directed is a signed/directed graph neural network extension library for PyTorch Geometric. |
Type Of Technology | Software |
Year Produced | 2023 |
Impact | Interest in our work |
URL | https://github.com/SherylHYX/pytorch_geometric_signed_directed |
Description | Data Controller workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | This was a workshop for data controllers, with considerable industry participation, discussing the use and regulation of generative AI, including synthetic network data. |
Year(s) Of Engagement Activity | 2024 |
Description | Hypergraph Autumn School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | This was a half-day autumn school on hypergraphs which I co-organised. |
Year(s) Of Engagement Activity | 2010,2023 |
URL | https://www.bernoullisociety.org/news/37-general-announcement/371-autumn-school-on-hypergraphs |
Description | Keynote talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Keynote talk titled "Synthetic Networks" at ICT Innovations 2023, the 15th ICT Innovations Conference, Ohrid, North Macedonia. This is a key conference for graduate students in North Macedonia. |
Year(s) Of Engagement Activity | 2023 |
URL | https://ictinnovations.org/ |
Description | Network seminar talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | This was an invited seminar talk about our recent progress in Stein's method for network models. |
Year(s) Of Engagement Activity | 2022 |
Description | Organising Session EO418: statistical machine learning with kernels and nonlinear transformations at CMStatistics 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | The organised session provided a venue for academic presentations and discussion on current developments in machine learning methods based on non-linear transformations, and it helped to initiate future collaborations. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.cmstatistics.org/CMStatistics2023/fullprogramme.php |
Description | OxWIM |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Undergraduate students |
Results and Impact | Tara Trauthwein presented a poster at the event "Beyond the pipeline: Women & Non-Binary People in Mathematics Day". |
Year(s) Of Engagement Activity | 2024 |
URL | https://www.oxwomeninmaths2024.co.uk/ |
Description | RSS workshop on Stein's method and machine learning |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | This was a tutorial workshop providing a gentle introduction to Stein's method and machine learning. It was held in hybrid format and attracted a large international audience. The tutorial took place in the morning and was followed by a research workshop in the afternoon. |
Year(s) Of Engagement Activity | 2021 |
URL | https://rss.org.uk/training-events/events/events-2021/sections/rss-applied-probability-and-computati... |
Description | RSS workshop on Stein's method and machine learning |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | This was a workshop on Stein's method and machine learning, jointly organised with Chris Oates from Newcastle. |
Year(s) Of Engagement Activity | 2021 |
URL | https://rss.org.uk/training-events/events/events-2021/sections/rss-applied-probability-and-computati... |
Description | SNS email list |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Gesine Reinert set up an email list for social network science |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=SNS |
Description | TDA talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | This was a talk on credit risk using ideas from topological data analysis (TDA) and network analysis, based on a collaboration with Santander UK. |
Year(s) Of Engagement Activity | 2021 |
Description | Time series generation and anomaly detection in high dimensions |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | This was an RSS workshop on time series generation and anomaly detection in high dimensions, which Gesine Reinert co-organised with Alex Cox (Bath), Hao Ni (UCL) and Kathrin Glau (QMUL). It was held online and informs the time-series-of-networks part of the project. |
Year(s) Of Engagement Activity | 2022 |
URL | https://rss.org.uk/training-events/events/events-2022/rss-events/time-series-generation-and-anomaly-... |
Description | Turing-ONS workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | This was a workshop between the Turing-ONS team and members from other groups at the ONS and the Department for Business and Trade, as well as from VocaLink. We discussed new ways to nowcast GDP. |
Year(s) Of Engagement Activity | 2024 |
Description | Tutorial lectures on Stein's method |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | This was a series of two lectures introducing graduate students from Oxford and London to Stein's method and its connections with machine learning. |
Year(s) Of Engagement Activity | 2021 |
Description | Tutorial on kernel methods: Stein's Method and Machine Learning RSS workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | The lecture provided an introduction to kernel methods that bridged the understanding of audience members from machine learning and applied probability backgrounds. It not only helped to integrate the audience for a better workshop experience but also sparked fruitful discussions of ideas combining Stein's method and kernel methods. |
Year(s) Of Engagement Activity | 2021 |
URL | https://rss.org.uk/training-events/events/events-2021/sections/rss-applied-probability-and-computati... |