High-Dimensional Time Series, Common Factors, and Nonstationarity

Lead Research Organisation: London School of Economics and Political Science

Department Name: Statistics

Abstract

In this modern information age, with increasing computing power, it has become commonplace to access and analyze data of unprecedented size and complexity. In many important statistical applications, the number of variables or parameters is now as large as, or even much larger than, the number of observations. Inference under such circumstances is generally acknowledged as an important challenge in contemporary statistics and has recently been a focal point of active research. The Newton Institute in Cambridge staged a large-scale research programme on Statistical Theory and Methods for Complex, High-Dimensional Data from January to June 2008. Against this background, the proposed project is devoted to research on both theory and methodology for analyzing ultra-high-dimensional time series arising from various practical problems. For example, in portfolio optimization and risk management the number of assets concerned is typically in the order of hundreds or thousands. So-called panel data, collected for various applications, consist of p time series of length n, where p is typically larger, or much larger, than n.

For analyzing such large-scale multiple time series, dimension reduction is the key to success. In this project we propose an innovative factor modelling technique which is statistically versatile and computationally effective. In particular, we will conduct the research in several interlocking areas: (i) modelling high-dimensional time series with nonstationary factors; (ii) establishing high-dimensional volatility dynamics based on factors, including high-dimensional daily volatilities using high-frequency data; and (iii) identifying the finite dimensionality of curve time series. The results from (i) will be useful for modelling and forecasting panels of time series arising in economics, business, marketing, sociology, biology, ecology, etc. Area (ii) directly addresses important issues in modern finance such as asset pricing, portfolio allocation and risk management. Curve time series analysis (iii) will find applications in, for example, environmental studies (annual weather record charts, annual pollution charts), finance (daily volatility curves, yield curves) and marketing (sales charts). Freely available software will be developed to implement the new methods.

High-dimensional data analysis is clearly one of the most vibrant research areas in statistics (including biostatistics) and econometrics (including financial econometrics) these days. The novelty of this proposal lies mainly in the new estimation procedure, which converts the problem of estimating latent factors, which may be nonstationary, into a standard eigenanalysis, and is therefore applicable to cases with dimensions in the order of thousands. The idea of handling nonstationarity in the framework of curve time series is also new.

Funded Value:

£331,454

Funded Period:

Jun 10 - May 13

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/H010408/1

Principal Investigator:

Qiwei Yao

Research Subject:

Mathematical sciences (100%)

Research Topic:

Statistics & Appl. Probability (100%)

Organisations

London School of Economics and Political Science (Lead Research Organisation)

People	ORCID iD
Qiwei Yao (Principal Investigator)

Publications

Bathia N (2010) Identifying the finite dimensionality of curve time series in The Annals of Statistics

Cho H (2013) Modeling and Forecasting Daily Electricity Load Curves: A Hybrid Approach in Journal of the American Statistical Association

Lam C (2011) Estimation of latent factors for high-dimensional time series in Biometrika

Lam C (2012) Factor modeling for high-dimensional time series: Inference for the number of factors in The Annals of Statistics

Tao M (2011) Large Volatility Matrix Inference via Combining Low-Frequency and High-Frequency Approaches in Journal of the American Statistical Association

Wu B (2013) Estimation in the presence of many nuisance parameters: Composite likelihood and plug-in likelihood in Stochastic Processes and their Applications

Key Findings
Impact Summary
Software and Technical Products


Description	1. A new, simple and easy-to-use factor model for high-dimensional time series. The factors are defined in terms of the serial correlations among all the component series, which differs from most existing factor models. The estimation reduces to an eigenanalysis of a positive semi-definite matrix. Furthermore, the estimation of both the factor loadings and the number of factors exhibits the so-called "blessing of dimensionality" property. 2. The factor model stated above is further extended to handle high-dimensional volatility processes with both high-frequency and low-frequency data, and to handle some nonstationary time series. 3. Dimension reduction for curve time series. We have made two new contributions: (a) A new notion of dimensionality for curve (or, in general, functional) time series based on their linear dynamical correlation, and a new method to identify the dimensionality. (b) We introduced a concept of the correlation dimension between two curves based on a singular value decomposition in a Hilbert space. It was then used to reduce a curve linear regression to several ordinary scalar linear regressions.
Exploitation Route	The methodology developed in the grant provides simple and easy-to-implement ways of reducing the dimension of high-dimensional time series, leading to efficient modelling and forecasting. Potential applications are wide-ranging, covering areas where multiple time series data are present.
Sectors	Communities and Social Services/Policy; Digital/Communication/Information Technologies (including Software); Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Retail
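
The eigenanalysis mentioned in finding 1 can be illustrated with a minimal sketch. The record does not state which positive semi-definite matrix is used, so the construction below is an assumption: for observations y_t = A x_t + e_t with white noise e_t, it sums S(k) S(k)^T over lags k = 1, ..., k0, where S(k) is the sample lag-k autocovariance matrix of y_t, and takes the leading eigenvectors as the estimated loading space. The choice of the number of factors by minimising ratios of adjacent eigenvalues is likewise an assumption. The function names, the default k0 = 2 and the simulated data are hypothetical and are not taken from the project's software.

```python
import numpy as np


def lagged_autocov(y, k):
    """Sample lag-k autocovariance matrix of an (n, p) array of observations."""
    n = y.shape[0]
    yc = y - y.mean(axis=0)
    # Entry (i, j) averages yc[t + k, i] * yc[t, j] over t.
    return yc[k:].T @ yc[: n - k] / n


def estimate_factor_loadings(y, k0=2, r=None):
    """Estimate the factor loading space by eigenanalysis (illustrative sketch).

    Assumed construction: M = sum_{k=1}^{k0} S(k) S(k)^T, which is positive
    semi-definite. If r is None, it is chosen by minimising the ratio of
    adjacent eigenvalues over the first R candidates (also an assumption).
    """
    n, p = y.shape
    if p < 2:
        raise ValueError("need at least two component series")
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S = lagged_autocov(y, k)
        M += S @ S.T
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
    if r is None:
        R = max(1, min(p - 1, min(n, p) // 2))  # cap the candidate range
        ratios = eigvals[1 : R + 1] / eigvals[:R]
        r = int(np.argmin(ratios)) + 1
    A_hat = eigvecs[:, :r]  # orthonormal basis of the estimated loading space
    x_hat = (y - y.mean(axis=0)) @ A_hat  # estimated factor series
    return A_hat, x_hat, r


# Simulated example (hypothetical data): three AR(1) factors, p = 20 series.
rng = np.random.default_rng(0)
n, p, r_true, burn = 1000, 20, 3, 100
phi = np.array([0.8, -0.7, 0.6])
x = np.zeros((n + burn, r_true))
for t in range(1, n + burn):
    x[t] = phi * x[t - 1] + rng.standard_normal(r_true)
x = x[burn:]
A = rng.standard_normal((p, r_true))
y = x @ A.T + rng.standard_normal((n, p))

A_hat, x_hat, r_hat = estimate_factor_loadings(y)

# Distance between the true and estimated loading spaces (0 means identical).
Q, _ = np.linalg.qr(A)
subspace_dist = np.linalg.norm(Q - A_hat @ (A_hat.T @ Q))
print(r_hat, round(float(subspace_dist), 3))
```

Only the space spanned by the loadings is identifiable in such a model, which is why the example compares column spaces rather than the matrices A and A_hat themselves. Because the matrix being decomposed is p by p and only lagged autocovariances enter it, white noise contributes nothing in expectation, which is one way to read the "serial correlations" wording above.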


Description	The curve regression methodology developed in Cho et al. (2013) has been adopted by EDF for forecasting daily electricity demand.
Sector	Energy
Impact Types	Economic


Title	clr
Description	This is an R package for curve linear regression developed in Haeran Cho, Yannig Goude, Xavier Brossat, Qiwei Yao (2013). Modeling and Forecasting Daily Electricity Load Curves: A Hybrid Approach. Journal of the American Statistical Association, Vol.108, 7-13. It will be available from R soon.
Type Of Technology	Software
Year Produced	2018
Open Source License?	Yes
Impact	Since the publication of Cho et al. (2013), there have been quite a few requests for software implementing the curve linear regression methods. This R package was developed to meet that demand.
URL	https://cran.r-project.org/web/packages/clr/index.html
