Statistical Network Analysis: Model Selection, Differential Privacy, and Dynamic Structures

Lead Research Organisation: London School of Economics and Political Science
Department Name: Statistics

Abstract

In this proposal we tackle some challenging problems in the following three aspects of statistical network analysis.

1. Jittered resampling for selecting network models

The first and arguably the most important step in statistical modelling is to choose an appropriate model for a given data set. While statistics offers many data-driven model-selection methods, including those based on data reuse (e.g., bootstrap resampling, cross-validation), their application to network data is problematic. It therefore remains common to choose a network model subjectively. The major difficulty in reusing network data is to mimic the underlying probability mechanisms. The few existing attempts include cross-validation under some specific settings. We propose a new 'bootstrap jittering' or 'jittered resampling' method for selecting an appropriate network model. The method does not impose any specific forms or conditions, and therefore provides a generic tool for network model selection.

2. Edge differential privacy for network data

In network data individuals are typically represented by nodes and their inter-relationships are represented by edges. Network data therefore often contain sensitive personal information. On the other hand, the information of interest in the data should be preserved. Hence the primary concern for data privacy is twofold: (a) to release only a sanitized version of the original network data to protect privacy, and (b) to ensure that the sanitized data preserve the information of interest, so that analysis based on the sanitized data is still meaningful. This is a vibrant research area, as data privacy becomes ever more sensitive and important with the abundance of personal information available in digital format, though the contribution from statistics is still at a preliminary stage. We will adopt the so-called dyadwise randomized response approach. While such a scheme is differentially private, the properties of inference based on the released data are largely unknown. Our initial investigation reveals some attractive features of this approach, suggesting more efficient statistical inference than that based on other data release mechanisms. We will further develop this scheme to handle networks with additional node features/attributes (e.g., social networks with additional information on age, gender, hobby, occupation, etc.).
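To make the randomized response idea concrete, the following is a minimal sketch rather than the project's actual release mechanism. It assumes an undirected 0/1 network in which every dyad is flipped independently with probability 1 / (1 + e^epsilon), which gives epsilon-edge differential privacy. The function names, the `epsilon` parameter, and the simple density correction are illustrative assumptions.

```python
import math
import random


def randomized_response(adj, epsilon, rng):
    """Release a noisy copy of an undirected 0/1 adjacency matrix.

    Assumption: each dyad is flipped independently with probability
    1 / (1 + e^epsilon), so the likelihood ratio between neighbouring
    networks is at most e^epsilon. Requires epsilon > 0.
    """
    n = len(adj)
    flip_prob = 1.0 / (1.0 + math.exp(epsilon))
    released = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j]
            if rng.random() < flip_prob:
                bit = 1 - bit  # flip this dyad
            released[i][j] = released[j][i] = bit
    return released, flip_prob


def edge_density(adj):
    """Proportion of dyads that carry an edge."""
    n = len(adj)
    dyads = n * (n - 1) // 2
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n)) / dyads


def debiased_density(released, flip_prob):
    """Correct the released density for the known flipping noise.

    E[released density] = flip_prob + (1 - 2 * flip_prob) * true density.
    """
    return (edge_density(released) - flip_prob) / (1.0 - 2.0 * flip_prob)


if __name__ == "__main__":
    rng = random.Random(1)
    n = 100
    truth = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            truth[i][j] = truth[j][i] = int(rng.random() < 0.1)
    noisy, p = randomized_response(truth, epsilon=1.0, rng=rng)
    print("true density:     ", edge_density(truth))
    print("released density: ", edge_density(noisy))
    print("debiased estimate:", debiased_density(noisy, p))
```

The raw released network is much denser than the original when epsilon is small, which illustrates why inference on the released data needs its own theory: the analyst must account for the known noise level rather than treat the sanitized network as the truth.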

3. Modelling and forecasting dynamic networks

Most existing statistical inference methods for networks are confined to static network data, though a substantial proportion of real networks are dynamic in nature. Understanding and being able to forecast the changes over time are of immense importance for, e.g., monitoring anomalies in internet traffic networks, predicting demand and setting prices in electricity supply networks, managing natural resources using environmental readings from sensor networks, and understanding how news and opinion propagate in online social networks. Unfortunately the development of the foundations for dynamic networks is still in its infancy, and the available modelling and inference tools are sparse. Most available techniques for handling dynamic changes are based on evolution analysis of snapshot networks over time, without really modelling the changes dynamically. Although this reflects the fact that most networks change slowly over time, it does not provide any insight into the dynamics underlying the changes and is almost powerless for future prediction, for which it is essential to build appropriate stochastic models that capture dynamic dependence and dynamic changes explicitly. Combining recent developments in tensor decomposition and factor-driven dimension reduction with efficient time series tools such as exponential smoothing and Kalman filters, we will take on this challenge and build new dynamic models.
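As an illustration of what modelling the changes explicitly can look like, the sketch below treats every dyad as an independent two-state Markov chain: an absent edge forms with probability `alpha` and an existing edge dissolves with probability `beta` at each time step. This is a deliberately simple autoregressive-type model chosen for illustration; the parameter names, the independence across dyads, and the homogeneous rates are assumptions, not necessarily the models developed in the project.

```python
import random


def simulate_ar_network(n, steps, alpha, beta, rng):
    """Simulate steps + 1 snapshots of an n-node undirected network.

    Each dyad is a two-state Markov chain started from its stationary
    edge probability alpha / (alpha + beta).
    """
    pi = alpha / (alpha + beta)
    dyads = [(i, j) for i in range(n) for j in range(i + 1, n)]
    state = {d: int(rng.random() < pi) for d in dyads}
    snapshots = [dict(state)]
    for _ in range(steps):
        for d in dyads:
            u = rng.random()
            if state[d] == 0 and u < alpha:
                state[d] = 1  # edge forms
            elif state[d] == 1 and u < beta:
                state[d] = 0  # edge dissolves
        snapshots.append(dict(state))
    return snapshots


def estimate_rates(snapshots):
    """Estimate (alpha, beta) by observed transition frequencies.

    Assumes both absent and present dyads occur at least once.
    """
    formed = absent = dissolved = present = 0
    for prev, curr in zip(snapshots, snapshots[1:]):
        for d, x in prev.items():
            if x == 0:
                absent += 1
                formed += curr[d]
            else:
                present += 1
                dissolved += 1 - curr[d]
    return formed / absent, dissolved / present


def forecast_edge_prob(x_now, alpha, beta, horizon):
    """P(edge present after `horizon` steps | current state x_now)."""
    pi = alpha / (alpha + beta)
    return pi + (1.0 - alpha - beta) ** horizon * (x_now - pi)


if __name__ == "__main__":
    rng = random.Random(7)
    snaps = simulate_ar_network(n=30, steps=40, alpha=0.2, beta=0.3, rng=rng)
    a_hat, b_hat = estimate_rates(snaps)
    print("estimated alpha, beta:", a_hat, b_hat)
    print("3-step forecast for a current edge:",
          forecast_edge_prob(1, a_hat, b_hat, 3))
```

Unlike a sequence of separately analysed snapshots, a model of this kind yields an explicit forecast: the edge probability decays geometrically at rate 1 - alpha - beta from the current state towards the stationary level.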

Publications

Chang J (2023) Modelling matrix time series via a tensor CP-decomposition in Journal of the Royal Statistical Society Series B: Statistical Methodology

Han Y (2023) Simultaneous Decorrelation of Matrix Time Series in Journal of the American Statistical Association

Jiang B (2023) Autoregressive Networks in Journal of Machine Learning Research

Zhang B (2023) Factor Modeling for Clustering High-Dimensional Time Series in Journal of the American Statistical Association

Zhou Y (2023) Testing for the Markov property in time series via deep conditional generative learning in Journal of the Royal Statistical Society Series B: Statistical Methodology

 
Description 1. We provide a simple and explicit autoregressive type framework to model and forecast dynamic changes of network data. It facilitates simple and efficient statistical inference and model diagnostic checking. The framework can serve as a basic building block to accommodate various stylized features observed in real network data.
2. A standing challenge in data privacy is the trade-off between the level of privacy and the efficiency of statistical inference. We adopt the so-called dyadwise randomized response approach. While such a scheme is differentially private, the estimation for network parameters shows an interesting phase transition. We further devise a novel adaptive bootstrap procedure to construct uniform inference across different phases.
Exploitation Route Issues around data privacy attract ever increasing attention in society. Network data privacy is perceived to be especially difficult due to the special structure and often binary nature of network data. Our work in this direction adds to the contribution of statisticians to this important area. Dynamic modelling and forecasting for network flows directly links time series with the new developments and challenges associated with big data, which is very much needed. Hence the potential beneficiaries include both theoretical and applied network data analysts in statistics and other disciplines such as computer science, social networks, network communication, energy distribution and forecasting, genetic linkages, economics and finance.
Sectors Communities and Social Services/Policy; Creative Economy; Energy; Environment; Financial Services, and Management Consultancy; Manufacturing, including Industrial Biotechnology; Security and Diplomacy

URL https://stats.lse.ac.uk/q.yao/qyao.links/publicationsAll.html
 
Description The methods developed in the following paper have been incorporated by EDF France into their probabilistic forecasting of electricity loads and their risk management. Xu, X., Chen, Y., Goude, Y. and Yao, Q. (2021). Day-ahead Probabilistic Forecasting for French Half-hourly Electricity Loads and Quantiles for Curve-to-Curve Regression. Applied Energy, 301, 117456.
First Year Of Impact 2022
Sector Energy
Impact Types Policy & public services

 
Description EDF in Paris 
Organisation Électricité de France EDF
Country France 
Sector Private 
PI Contribution We continue our collaboration on forecasting electricity loads by developing new predictive bands/regions.
Collaborator Contribution The newly developed predictive bands/regions provide probability forecasts for daily consumption curves. The predictive quantiles at different probability levels deliver insightful information on prospective future scenarios, which is valuable for hedging risks in electricity management.
Impact One paper, and an R package.
Start Year 2010
 
Title HDTSA 
Description An R package, available from CRAN, for statistical inference on high-dimensional time series, covering factor modelling, principal component analysis for vector and matrix time series, and inference for unit roots and cointegration.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact The software is publicly available through CRAN.
URL https://cran.r-project.org/package=HDTSA
 
Description An invited talk at Conference on "Recent Advances in Statistics and Data Science" in Rutgers 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact Conference on Recent Advances in Statistics and Data Science with a Celebration of Professors Regina Liu and Cun-Hui Zhang's Special Birthdays
Year(s) Of Engagement Activity 2023
URL https://statistics.rutgers.edu/news-events/conferences/684-conference-on-recent-advances-in-statisti...
 
Description Invited talk at 2023 IMS International Conference on Statistics and Data Science, Lisbon 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The objective of ICSDS is to bring together researchers in statistics and data science from academia, industry, and government in a stimulating setting to exchange ideas on the developments of modern statistics, machine learning, and broadly defined theory, methods, and applications in data science.
Year(s) Of Engagement Activity 2023
URL https://www.icsds2023.com/
 
Description Invited talk at Conference on "Statistical Foundations of Data Science and Applications" in Princeton 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The conference was in honour of Professor Jianqing Fan's 60th birthday and was attended by over 300 academics, students and people working in industry.
Year(s) Of Engagement Activity 2023
URL https://fan60.princeton.edu/
 
Description Invited talk at the 2023 Kansas Econometrics Workshop, Kansas
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact This workshop is part of a series of yearly workshops focusing on recent developments in econometric theory and methodology, as well as applications in economics, finance and other applied fields such as data science and statistics. The main purpose of the econometrics workshop series at KU is to promote methodological and theoretical research and applications in modern econometrics, statistics and data science, and to provide a forum for researchers, including Ph.D. students, to come together and interact through discussions and presentations.
Year(s) Of Engagement Activity 2023
URL https://econometrics.ku.edu/
 
Description Invited talk at The OMI Machine Learning in Financial Econometrics, Oxford Man Institute 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The workshop is devoted to the dissemination of cutting-edge ideas in economics and the financial industry using machine learning tools.
Year(s) Of Engagement Activity 2023
URL https://web.cvent.com/event/78dec7d3-ee2d-4ddb-b14d-b05e782bb209/summary