Statistical Network Analysis: Model Selection, Differential Privacy, and Dynamic Structures

Lead Research Organisation: London School of Economics and Political Science

Department Name: Statistics

Abstract

In this proposal we tackle some challenging problems in the following three aspects of statistical network analysis.

1. Jittered resampling for selecting network models

The first and arguably the most important step in statistical modelling is to choose an appropriate model for a given data set. While there exist many data-driven model-selection methods in statistics in general, including those based on data reuse (e.g., bootstrap resampling, cross-validation), their application to network data is problematic. It therefore remains common to choose a network model subjectively. The major difficulty in reusing network data is to mimic the underlying probability mechanisms. The few existing attempts include cross-validation under some specific settings. We propose a new 'bootstrap jittering' or 'jittered resampling' method for selecting an appropriate network model. The method does not impose any specific forms or conditions, and therefore provides a generic tool for network model selection.

2. Edge differential privacy for network data

In network data individuals are typically represented by nodes and their inter-relationships are represented by edges. Therefore network data often contain sensitive individual/personal information. On the other hand, the information of interest in the data should be preserved. Hence the primary concern for data privacy is twofold: (a) to release only a sanitized version of the original network data to protect privacy, and (b) to ensure that the sanitized data preserve the information of interest, so that analysis based on the sanitized data is still meaningful. This is a vibrant research area, as data privacy becomes ever more sensitive and important with the abundance of personal information available in digital format, though the contribution from statistics is still at a preliminary stage. We will adopt the so-called dyadwise randomized response approach. While such a scheme is differentially private, the properties of inference based on the released data are largely unknown. Our initial investigation reveals some attractive features of this approach, suggesting more efficient statistical inference than that based on other data release mechanisms. We will further develop this scheme to handle networks with additional node features/attributes (e.g., social networks with additional information on age, gender, hobby, occupation, etc.).
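As a rough illustration of the dyadwise randomized response idea mentioned above, the following minimal sketch flips each dyad of an undirected 0/1 network independently with probability 1/(1 + e^ε), the textbook randomized response rate for ε-edge differential privacy. The function names, the list-of-lists adjacency representation, and the choice of edge density as the quantity to recover are assumptions made for this example; they are not taken from the proposal.

```python
import math
import random


def flip_probability(epsilon):
    """Flip rate of classical randomized response at privacy level epsilon."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(adj, epsilon, rng):
    """Release a noisy copy of a symmetric 0/1 adjacency matrix.

    Each dyad (i < j) is flipped independently with probability
    flip_probability(epsilon); the diagonal stays zero.
    """
    n = len(adj)
    q = flip_probability(epsilon)
    released = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < q:
                bit = 1 - bit  # flip this dyad
            released[i][j] = released[j][i] = bit
    return released


def debiased_edge_density(released, epsilon):
    """Unbiased estimate of the original edge density from the released network.

    Uses E[released] = q + (1 - 2q) * original, which requires epsilon > 0.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive to invert the flipping")
    n = len(released)
    q = flip_probability(epsilon)
    dyads = n * (n - 1) / 2
    observed = sum(released[i][j] for i in range(n) for j in range(i + 1, n)) / dyads
    return (observed - q) / (1.0 - 2.0 * q)
```

A smaller ε pushes the flip rate towards 1/2, so the released network is more private but the debiased estimate is noisier; this is the privacy versus efficiency trade-off the abstract refers to.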

3. Modelling and forecasting dynamic networks

Most existing statistical inference methods for networks are confined to static network data, though a substantial proportion of real networks are dynamic in nature. Understanding and being able to forecast changes over time are of immense importance for, e.g., monitoring anomalies in internet traffic networks, predicting demand and setting prices in electricity supply networks, managing natural resources using environmental readings from sensor networks, and understanding how news and opinions propagate in online social networks. Unfortunately the development of the foundations for dynamic networks is still in its infancy, and the available modelling and inference tools are sparse. Most available techniques for dealing with dynamic changes are based on the evolution analysis of snapshot networks over time, without really modelling the changes dynamically. Although this reflects the fact that most networks change slowly over time, it does not provide any insight into the dynamics underlying the changes and is almost powerless for prediction, for which it is essential to build appropriate stochastic models that capture dynamic dependence and dynamic changes explicitly. Combining recent developments in tensor decomposition and factor-driven dimension reduction with efficient time series tools such as exponential smoothing and Kalman filters, we will take on this challenge and build new dynamic models.
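To make concrete what modelling dynamic dependence explicitly can look like, here is a minimal sketch of a first-order Markov edge process: an absent edge forms with probability alpha and a present edge dissolves with probability beta, independently across dyads. The model form, the parameter names alpha and beta, and the helper functions are assumptions chosen for illustration; they are not the models the proposal sets out to build.

```python
import random


def step(adj, alpha, beta, rng):
    """Draw the next snapshot of an undirected network from the current one.

    Absent edges form with probability alpha; present edges dissolve with
    probability beta; dyads evolve independently of each other.
    """
    n = len(adj)
    nxt = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] == 1:
                bit = 0 if rng.random() < beta else 1
            else:
                bit = 1 if rng.random() < alpha else 0
            nxt[i][j] = nxt[j][i] = bit
    return nxt


def one_step_forecast(adj, alpha, beta):
    """Probability that each dyad carries an edge at the next time point."""
    n = len(adj)
    return [
        [0.0 if i == j else (1.0 - beta if adj[i][j] == 1 else alpha) for j in range(n)]
        for i in range(n)
    ]


def stationary_edge_probability(alpha, beta):
    """Long-run probability of an edge under this two-state Markov chain."""
    return alpha / (alpha + beta)
```

Unlike a sequence of separately analysed snapshots, a model of this kind gives an explicit conditional distribution for the next network, so forecasts such as `one_step_forecast` follow directly from the fitted parameters.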

Funded Value:

£501,155

Funded Period:

Jun 21 - Aug 24

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/V007556/1

Principal Investigator:

Qiwei Yao

Research Subject:

Mathematical sciences (100%)

Research Topic:

Statistics & Appl. Probability (100%)

Organisations

People	ORCID iD
Qiwei Yao (Principal Investigator)

Publications

Chang J (2022) Testing for unit roots based on sample autocovariances in Biometrika

Chang J (2024) Edge differentially private estimation in the β-model via jittering and method of moments in The Annals of Statistics

Chang J (2023) Modelling matrix time series via a tensor CP-decomposition in Journal of the Royal Statistical Society Series B: Statistical Methodology

Chang J (2024) An autocovariance-based learning framework for high-dimensional functional time series in Journal of Econometrics

Han Y (2023) Simultaneous Decorrelation of Matrix Time Series in Journal of the American Statistical Association

Jiang B (2023) Autoregressive Networks in Journal of Machine Learning Research

Xu X (2021) Day-ahead probabilistic forecasting for French half-hourly electricity loads and quantiles for curve-to-curve regression in Applied Energy

Zhang B (2023) Factor Modeling for Clustering High-Dimensional Time Series in Journal of the American Statistical Association

Zhou Y (2023) Testing for the Markov property in time series via deep conditional generative learning in Journal of the Royal Statistical Society Series B: Statistical Methodology

Key Findings
Impact Summary
Collaboration
Software and Technical Products
Engagement Activities


Description	1. We provide a simple and explicit autoregressive type framework to model and forecast dynamic changes of network data. It facilitates simple and efficient statistical inference and model diagnostic checking. The framework can serve as a basic building block to accommodate various stylized features observed in real network data. 2. A standing challenge in data privacy is the trade-off between the level of privacy and the efficiency of statistical inference. We adopt the so-called dyadwise randomized response approach. While such a scheme is differentially private, the estimation for network parameters shows an interesting phase transition. We further devise a novel adaptive bootstrap procedure to construct uniform inference across different phases.
Exploitation Route	Issues around data privacy attract ever-increasing attention in society. Network data privacy is perceived to be especially difficult because of the special structure and often binary nature of network data. Our work in this direction will strengthen the statistical contribution to this important area. Dynamic modelling and forecasting for network flows directly links time series with the new developments and challenges associated with big data, which is very much needed. Hence the potential beneficiaries include both theoretical and applied network data analysts in statistics and other disciplines such as computer science, social networks, network communication, energy distribution and forecasting, genetic linkages, economics and finance.
Sectors	Communities and Social Services/Policy; Creative Economy; Energy; Environment; Financial Services and Management Consultancy; Manufacturing including Industrial Biotechnology; Security and Diplomacy
URL	https://stats.lse.ac.uk/q.yao/qyao.links/publicationsAll.html


Description	The methods developed in the following paper have been adopted by EDF France for probabilistic forecasting of electricity loads and for risk management. Xu, X., Chen, Y., Goude, Y. and Yao, Q. (2021). Day-ahead Probabilistic Forecasting for French Half-hourly Electricity Loads and Quantiles for Curve-to-Curve Regression. Applied Energy, 301, 117456.
First Year Of Impact	2022
Sector	Energy
Impact Types	Policy & public services


Description	EDF in Paris
Organisation	Électricité de France EDF
Country	France
Sector	Private
PI Contribution	We continue our collaboration on forecasting electricity loads by developing new predictive bands/regions.
Collaborator Contribution	The newly developed predictive bands/regions provide probability forecasts for daily consumption curves. The predictive quantiles at different probability levels deliver insightful information on prospective future scenarios, which is valuable for hedging risks in electricity management.
Impact	One paper, and an R package.
Start Year	2010


Title	HDTSA
Description	An R package, available on CRAN, for statistical inference on high-dimensional time series, including factor modelling, principal component analysis for vector and matrix time series, and inference for unit roots and cointegration.
Type Of Technology	Software
Year Produced	2023
Open Source License?	Yes
Impact	The software is publicly available through CRAN.
URL	https://cran.r-project.org/package=HDTSA


Description	An invited talk at Conference on "Recent Advances in Statistics and Data Science" in Rutgers
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Study participants or study members
Results and Impact	Conference on Recent Advances in Statistics and Data Science with a Celebration of Professors Regina Liu and Cun-Hui Zhang's Special Birthdays
Year(s) Of Engagement Activity	2023
URL	https://statistics.rutgers.edu/news-events/conferences/684-conference-on-recent-advances-in-statisti...


Description	Invited talk at 2023 IMS International Conference on Statistics and Data Science, Lisbon
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Study participants or study members
Results and Impact	The objective of ICSDS is to bring together researchers in statistics and data science from academia, industry, and government in a stimulating setting to exchange ideas on the developments of modern statistics, machine learning, and broadly defined theory, methods, and applications in data science.
Year(s) Of Engagement Activity	2023
URL	https://www.icsds2023.com/


Description	Invited talk at Conference on "Statistical Foundations of Data Science and Applications" in Princeton
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Study participants or study members
Results and Impact	The conference was held in honour of Professor Jianqing Fan's 60th birthday and was attended by over 300 academics, students and people working in industry.
Year(s) Of Engagement Activity	2023
URL	https://fan60.princeton.edu/


Description	Invited talk at the 2023 Kansas Econometrics Workshop, Kansas
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Study participants or study members
Results and Impact	This workshop is part of a yearly series focusing on recent developments in econometric theory and methodology, as well as applications in economics, finance and other applied fields such as data science and statistics. The main purpose of the econometrics workshop series at KU is to promote methodological and theoretical research and applications in modern econometrics, statistics and data science, and to provide a forum for researchers, including Ph.D. students, to come together and interact through discussions and presentations.
Year(s) Of Engagement Activity	2023
URL	https://econometrics.ku.edu/


Description	Invited talk at The OMI Machine Learning in Financial Econometrics, Oxford Man Institute
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Study participants or study members
Results and Impact	The workshop is devoted to the dissemination of cutting-edge ideas in economics and the financial industry using machine learning tools.
Year(s) Of Engagement Activity	2023
URL	https://web.cvent.com/event/78dec7d3-ee2d-4ddb-b14d-b05e782bb209/summary
