Change-point detection for high-dimensional time series with nonstationarities

Lead Research Organisation: University of Bristol
Department Name: Mathematics

Abstract

Time series data are encountered in many areas such as finance, economics, medicine, engineering, natural and social sciences. The fundamental objectives of time series analysis are (i) to describe the stochastic structure of the observed time series by identifying and fitting an appropriate model, and (ii) to predict future behaviour by using the information extracted from current and past observations. In practical applications, the assumption of stationarity is commonly made: that the stochastic properties of time series data are invariant over time. However, real-life time series often exhibit nonstationarities and this poses a growing problem, since the use of standard modelling and estimation techniques for stationary processes is inappropriate and may even result in misleading models and forecasts for such data.

Piecewise stationarity is one of the simplest forms of departure from stationarity, where some stochastic properties are modelled as varying over time in a piecewise constant manner. That is, the process is regarded as stationary between any two adjacent structural change-points. Under the assumption of piecewise stationarity, multiple change-point detection provides useful insights with regard to the estimated change-points, as well as enabling prediction of future values. However, a challenge which many areas in modern statistics commonly face is that, due to technological advances, observed datasets are increasingly being recorded in higher dimensions as well as larger volumes.

The abundance of high-dimensional observations over time in many fields calls for new tools in time series analysis. Motivated by routinely observed nonstationarities in large time series data, change-point detection in high-dimensional time series has received steadily growing attention in recent years. Still, there are several challenging research questions which need to be addressed for both theoretical and methodological advances in this area, and the main goal of this proposal is to provide solutions to such open problems.

More specifically, one key objective is to develop a change-point detection methodology which not only detects and locates change-points over time, but also identifies those components of high-dimensional data that undergo the changes. It is readily envisaged that such information will play an important role in interpreting the detected change-points. Also, classical change-point analysis is chiefly concerned with the detection of abrupt, jump-like changes in time-varying stochastic properties. In contrast, the proposed methodology aims at reflecting real-life applications more efficiently by allowing for changes that are smooth and gradual. Finally, the methodology will be equipped with a bootstrap technique for re-sampling high-dimensional time series data, which permits rigorous inference on the detected change-points and thus enables users to draw meaningful conclusions on the structure of the observed time series.

Planned Impact

Firstly, academics working in change-point analysis for high-dimensional time series will benefit from the novel theoretical and methodological development furnished by this proposal. Also, outside this immediate circle of beneficiaries, there are academics working on a wide range of inference and estimation problems concerning high-dimensional time series for whom the successful delivery of Work Package (C) will provide a new re-sampling method that enables rigorous statistical inference and efficient estimation of fundamental quantities involved in modelling such data.

In addition, the proposed research has the potential for generating societal and economic impacts, by facilitating progress across a wide range of disciplines where high-dimensional time series data is collected and analysed. For example, in energy companies, a small improvement in load forecasting can bring in substantial benefits in reducing production costs and increasing trading advantages. Therefore it is strategically important to identify any structural changes in the multi-channel, high-dimensional time series data collected from their customers, prior to employing forecasting models based on the stationarity assumption. Dr. Yannig Goude (an expert researcher-project manager at EDF R&D in Paris) has expressed interest in this proposal, and we intend to instigate a collaborative project where the newly developed methodologies from this proposal will be applied to address the needs in electricity load modelling.

Healthcare is another sector where health, medical and social data of high complexity is increasingly being generated. The proposed methodologies will be applicable to detect clinically important features as change-points in patient records, thus impacting the researchers in the Healthcare Technologies theme via sub-themes "Optimising Treatment" and "Transforming Community Health and Care". Also, the influence of detecting shifts in monetary regimes on improving the quality of inflation forecasts, and thus on macroeconomic policy-making, has been noted by the researchers and the advisers at the Bank of England (Groen et al., 2008). Hence, these methods have close bearing on the grand challenge areas in the Digital Economy theme ("New Economic Models"). We will distribute software implementations of our methodologies, in order to enhance their accessibility and widen the group of potential beneficiaries.

References
Groen, J., Kapetanios, G., and Price, S. (2008), "Real time evaluation of Inflation Report and Greenbook forecasts for inflation and growth," Bank of England Working Paper No. 354.

Publications

 
Description 1. Factor models are popular for modelling large numbers of macroeconomic or financial time series, where a small number of unobservable factors drive the dependence structure among a large number of time series. Data observed in a highly nonstationary environment over a long period are likely to exhibit nonstationarities in their structure. In my collaboration with Prof Barigozzi and Prof Fryzlewicz, I proposed a method for consistently detecting multiple change-points at which such structural breaks take place in high-dimensional time series data governed by factor models. The important finding is that performing factor analysis prior to change-point analysis improves the detectability of the change-points, and some dominant change-points may be regarded as common features or pervasive 'factors' themselves.

2. Multiscale procedures, which scan the same data multiple times for detection and localisation of change-points, are frequently adopted for multiple change-point detection. This, on the one hand, improves the detection power of change-point tests typically designed for single change-point detection. On the other hand, it generates false positives and duplicate estimators which need to be pruned. Together with Prof Kirch, I proposed a generic methodology which approaches the multiple change-point detection problem as a two-stage procedure combining candidate generation and pruning. The proposed pruning almost inherits the good localisation property of the chosen multiscale candidate generation method, while correctly selecting the model (the number of change-points) with high probability.
Exploitation Route The research outcome is available as packages in open source statistical software (https://CRAN.R-project.org/package=factorcpt, https://CRAN.R-project.org/package=mosum).
Sectors Aerospace, Defence and Marine; Creative Economy; Financial Services, and Management Consultancy; Healthcare; Pharmaceuticals and Medical Biotechnology

URL https://www.sciencedirect.com/science/article/pii/S0304407618300915?via%3Dihub
 
Description New challenges in change-point problems
Amount £159,447 (GBP)
Funding ID RPG-2019-390 
Organisation The Leverhulme Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2020 
End 04/2023
 
Description Collaboration with Prof Claudia Kirch at OvGU Magdeburg 
Organisation The Otto-von-Guericke University Magdeburg
Country Germany 
Sector Academic/University 
PI Contribution I visited Prof Claudia Kirch at OvGU Magdeburg in March 2017 and collaborated on a project developing a new, scalable methodology for multiple change-point estimation using a multiscale MOSUM procedure with localised pruning.
Collaborator Contribution Prof Claudia Kirch is an expert in change-point analysis and time series analysis, and has contributed to the theoretical aspect of the paper. Also, we have produced a software package and a journal article dedicated to the software jointly with her (then) PhD student Dr Alexander Meier.
Impact We have an R package (link provided above), a paper dedicated to the open source software and another on the methodological development and theoretical analysis of the proposed method.
Start Year 2016
 
Description Collaboration with Profs Piotr Fryzlewicz and Matteo Barigozzi 
Organisation London School of Economics and Political Science (University of London)
Country United Kingdom 
Sector Academic/University 
PI Contribution We have collaborated on a project where the aim was to develop a comprehensive methodology for change-point analysis under factor models, popularly adopted tools for dimension reduction in high-dimensional time series analysis.
Collaborator Contribution It was an academic collaboration where the research expertise of my collaborators contributed to the methodological development and the theoretical and empirical analysis conducted in the paper.
Impact A publication in the Journal of Econometrics, and an open source package implementing the method.
Start Year 2016
 
Title breakfast: Methods for Fast Multiple Change-Point Detection and Estimation 
Description A developing software suite for multiple change-point detection/estimation (data segmentation) in data sequences. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact The current version implements the Gaussian mean-shift model, in which the data are assumed to be a piecewise-constant signal observed with i.i.d. Gaussian noise. Change-point detection in breakfast is carried out in two stages: (i) computation of a solution path, and (ii) model selection along the path. A variety of solution path and model selection methods are included, which can be accessed individually, or through breakfast 
URL https://CRAN.R-project.org/package=breakfast
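As an illustration only, the two-stage idea described for breakfast, (i) a solution path followed by (ii) model selection along the path, can be sketched in plain Python for the Gaussian mean-shift model. This is not the breakfast package or its interface. The function names, the choice of binary segmentation with CUSUM statistics for the solution path, the threshold rule and the assumption of a known noise level `sigma` are all assumptions made for this sketch.

```python
import math
import random


def prefix_sums(x):
    """Cumulative sums with a leading zero, so sum(x[s:e]) == ps[e] - ps[s]."""
    ps = [0.0]
    for v in x:
        ps.append(ps[-1] + v)
    return ps


def cusum(ps, s, e, b):
    """CUSUM statistic for a mean change after index b within x[s:e]."""
    n_left = b - s + 1
    n_right = e - (b + 1)
    mean_left = (ps[b + 1] - ps[s]) / n_left
    mean_right = (ps[e] - ps[b + 1]) / n_right
    scale = math.sqrt(n_left * n_right / (n_left + n_right))
    return scale * abs(mean_left - mean_right)


def solution_path(x, min_len=2):
    """Stage (i): binary segmentation run to the end.

    Returns (location, cusum) pairs sorted by decreasing CUSUM value, where
    location is the 0-based index of the last point before the candidate change.
    """
    ps = prefix_sums(x)
    path = []
    stack = [(0, len(x))]
    while stack:
        s, e = stack.pop()
        if e - s < 2 * min_len:
            continue
        best_b, best_val = None, -1.0
        for b in range(s + min_len - 1, e - min_len):
            val = cusum(ps, s, e, b)
            if val > best_val:
                best_b, best_val = b, val
        path.append((best_b, best_val))
        stack.append((s, best_b + 1))
        stack.append((best_b + 1, e))
    return sorted(path, key=lambda p: -p[1])


def select_by_threshold(path, n, sigma=1.0, const=1.0):
    """Stage (ii): keep candidates whose CUSUM exceeds const * sigma * sqrt(2 log n).

    The threshold form and the known sigma are assumptions of this sketch.
    """
    threshold = const * sigma * math.sqrt(2.0 * math.log(n))
    return sorted(b for b, val in path if val > threshold)


# Example: a piecewise-constant signal observed with i.i.d. Gaussian noise.
rng = random.Random(0)
signal = [0.0] * 100 + [5.0] * 100 + [0.0] * 100
x = [m + rng.gauss(0.0, 1.0) for m in signal]
print(select_by_threshold(solution_path(x), len(x)))
```

Separating the two stages means the same path can be reused with a different selection rule, for example an information criterion in place of the threshold used here.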
 
Title factorcpt: Simultaneous Change-Point and Factor Analysis 
Description The package implements a two-stage methodology for consistent multiple change-point detection under factor modelling. It performs multiple change-point analysis on the common and idiosyncratic components separately, and thus automatically identifies their origins. The package also implements the Double CUSUM Binary Segmentation algorithm, which is proposed for multiple change-point detection in high-dimensional panel data with breaks in its mean. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The package has been published via the Comprehensive R Archive Network and is available in R, a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques. Thus it has the potential to be adopted in a wide range of problems involving change-point analysis in factor models.
URL https://cran.r-project.org/web/packages/factorcpt/index.html
 
Title hdbinseg: Change-Point Analysis of High-Dimensional Time Series via Binary Segmentation 
Description Binary segmentation methods for detecting and estimating multiple change-points in the mean or second-order structure of high-dimensional time series as described in Cho and Fryzlewicz (2014) and Cho (2016).
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact The software is available as an open source R package from CRAN, so that anyone interested in change-point analysis of the mean or the second-order structure of multivariate, possibly high-dimensional, time series can apply the methods proposed in the referenced papers.
URL https://CRAN.R-project.org/package=hdbinseg
 
Title mosum: Moving Sum Based Procedures for Changes in the Mean 
Description Implementations of MOSUM-based statistical procedures and algorithms for detecting multiple changes in the mean. This comprises the MOSUM procedure for estimating multiple mean changes from Eichinger and Kirch (2018) and the multiscale algorithmic extensions from Cho and Kirch (2019+). 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact The MOSUM procedure implemented in this package provides a complementary approach to change-point analysis, which is relevant in many areas of natural sciences, medicine, economics and signal processing. 
URL https://CRAN.R-project.org/package=mosum
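For readers unfamiliar with moving sum procedures, the following is a minimal Python sketch of a single-bandwidth MOSUM detector for changes in the mean. It is not the mosum package or its interface. The function names, the user-supplied `threshold`, the assumed known noise level `sigma` and the simple local-maximum rule controlled by `eta` are assumptions made for illustration. The statistic compares the sums of the `G` observations on either side of each time point.

```python
import math
import random


def mosum_stat(x, G, sigma=1.0):
    """MOSUM statistic |sum(x[k:k+G]) - sum(x[k-G:k])| / (sigma * sqrt(2G)).

    k is the number of observations to the left of the candidate change.
    Entries where a full window is not available on both sides are None.
    """
    n = len(x)
    ps = [0.0]
    for v in x:
        ps.append(ps[-1] + v)
    stat = [None] * n
    for k in range(G, n - G + 1):
        right = ps[k + G] - ps[k]
        left = ps[k] - ps[k - G]
        stat[k] = abs(right - left) / (sigma * math.sqrt(2.0 * G))
    return stat


def mosum_changepoints(x, G, threshold, sigma=1.0, eta=0.4):
    """Return every k whose statistic exceeds the threshold and is the (first)
    maximiser of the statistic within eta * G points on either side."""
    stat = mosum_stat(x, G, sigma)
    half = max(1, int(eta * G))
    cpts = []
    for k, val in enumerate(stat):
        if val is None or val <= threshold:
            continue
        lo = max(G, k - half)
        hi = min(len(x) - G, k + half)
        window = [stat[j] for j in range(lo, hi + 1)]
        # Accept k only if it is the first index attaining the window maximum.
        if k == lo + window.index(max(window)):
            cpts.append(k)
    return cpts


# Example: two mean changes, after the 100th and the 200th observation.
rng = random.Random(0)
signal = [0.0] * 100 + [5.0] * 100 + [0.0] * 100
x = [m + rng.gauss(0.0, 1.0) for m in signal]
print(mosum_changepoints(x, G=30, threshold=4.0))
```

In this sketch the bandwidth `G` must be smaller than the distance between neighbouring changes for each of them to be isolated. A multiscale procedure would run several bandwidths and then prune the pooled candidates, a step not shown here.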
 
Title unsystation: Stationarity Test Based on Unsystematic Sub-Sampling 
Description The package implements a new method for testing the stationarity of time series, where the test statistic is obtained from measuring and maximising the difference in the second-order structure over pairs of randomly drawn intervals. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The package has been published via the Comprehensive R Archive Network and is available in R, a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques. Thus it has the potential to be adopted in a wide range of problems involving testing the nonstationarity of time series.
URL https://cran.r-project.org/web/packages/unsystation/index.html
 
Description An invited talk at European Meeting of Statisticians 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I have given a talk on 'Simultaneous multiple change-point and factor analysis for high-dimensional time series' in an invited session at the European Meeting of Statisticians 2017, which took place in July 2017 in Helsinki, Finland.
Year(s) Of Engagement Activity 2017
 
Description An invited talk at Joint Statistical Meetings 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I have given a talk titled 'Multiple change-point estimation using MOSUM statistics via localised pruning' to an audience primarily interested in the challenges involved in modern change-point detection problems at the Joint Statistical Meetings 2017, which was held in Baltimore, USA.
Year(s) Of Engagement Activity 2017
 
Description An invited talk at Workshop on 'Goodness-of-fit and change-point problems' 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I have given an invited talk on high-dimensional change-point analysis under factor modelling at a Workshop on 'Goodness-of-fit and change-point problems', Bad Herrenalb, Germany, to an audience consisting of experts in the area of change-point analysis and engaged in active discussions with the domain experts mostly based in Germany.
Year(s) Of Engagement Activity 2017
 
Description Invited talk at ICSA International Conference 2016 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I have given a talk on 'Simultaneous multiple change-point and factor analysis for high-dimensional time series' in an invited session at the International Chinese Statistical Association International Conference on global growth of modern statistics in the 21st century, which took place in Dec 2016 at Shanghai Jiao Tong University.
Year(s) Of Engagement Activity 2016
 
Description Seminar talk at Statistical Laboratory, University of Cambridge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact I have given a seminar on 'Simultaneous multiple change-point and factor analysis for high-dimensional time series' at the Statistical Laboratory, University of Cambridge, and received valuable feedback from the audience, as well as engaging in discussions before and after the talk.
Year(s) Of Engagement Activity 2017
 
Description Seminar talk at University College London 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact I have given a talk on 'Simultaneous multiple change-point and factor analysis for high-dimensional time series', a project supported by the award, to the Stochastic Processes Group in the Department of Statistical Science at University College London.
Year(s) Of Engagement Activity 2017