# High-Dimensional Time Series, Common Factors, and Nonstationarity

Lead Research Organisation: London School of Economics and Political Science

Department Name: Statistics

### Abstract

In this modern information age, increasing computing power has made it commonplace to access and analyze data of unprecedented size and complexity. In many important statistical applications, the number of variables or parameters is now as large as, or even much larger than, the number of observations. Inference in such circumstances is generally acknowledged as an important challenge in contemporary statistics and has recently been a focus of active research. The Newton Institute in Cambridge staged a large-scale research programme on Statistical Theory and Methods for Complex, High-Dimensional Data from January to June 2008. Against this background, the proposed project is devoted to research on both theory and methodology for analyzing ultra-high-dimensional time series, which arise in various practical problems. For example, in portfolio optimization and risk management the number of assets concerned is typically in the order of hundreds or thousands. So-called panel data, collected for various applications, consist of p time series of length n where, typically, p is larger or much larger than n.

For analyzing such large-scale multiple time series, dimension reduction is key to success. In this project we propose an innovative factor modelling technique which is statistically versatile and computationally effective. In particular, we will conduct research in several interlocking areas, including: (i) modelling high-dimensional time series with nonstationary factors; (ii) establishing high-dimensional volatility dynamics based on factors, including high-dimensional daily volatilities using high-frequency data; and (iii) identifying the finite dimensionality of curve time series. The results from (i) will be useful for modelling and forecasting panels of time series arising in economics, business, marketing, sociology, biology, ecology, etc. Area (ii) directly addresses important issues in modern finance such as asset pricing, portfolio allocation and risk management. Curve time series analysis (iii) will find applications in, for example, environmental studies (annual weather record charts, annual pollution charts), finance (daily volatility curves, yield curves) and marketing (sales charts). Freely available software will be developed to implement the new methods.

High-dimensional data analysis is clearly one of the most vibrant research areas in statistics (including biostatistics) and econometrics (including financial econometrics) these days. The novelty of this proposal lies mainly in the new estimation procedure, which transforms the problem of estimating latent factors, which may be nonstationary, into a standard eigenanalysis and is therefore applicable to cases with dimensionality in the order of thousands. The idea of handling nonstationarity in the framework of curve time series is also new.
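To make the eigenanalysis idea concrete, one way to write it down in the simpler stationary case is sketched below. The notation ($y_t$, $A$, $x_t$, $\varepsilon_t$, $k_0$, $M$) is assumed here for illustration and is not taken from the abstract; the nonstationary extension targeted by the project is not covered by this sketch.

```latex
% Assumed factor model: p observed series driven by r << p latent factors.
y_t = A x_t + \varepsilon_t, \qquad t = 1, \dots, n,
% y_t: p x 1 observations, A: p x r loadings, x_t: r x 1 factors,
% eps_t: white noise, assumed uncorrelated with the factors at all lags.

% For lags k >= 1 the noise drops out of the autocovariance:
\Sigma_y(k) = \operatorname{Cov}(y_{t+k}, y_t) = A \, \Sigma_x(k) \, A^{\top}.

% Pooling lags 1..k_0 gives a positive semi-definite matrix
M = \sum_{k=1}^{k_0} \Sigma_y(k) \, \Sigma_y(k)^{\top},
% whose eigenvectors for the r nonzero eigenvalues span the column space of A
% (under suitable rank conditions).
```

Under these assumptions the factor loading space is recovered from the leading eigenvectors of a single $p \times p$ matrix, which is why the approach remains computationally feasible when $p$ is in the thousands.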

### People

Qiwei Yao (Principal Investigator)

### Publications

Bathia N (2010) *Identifying the finite dimensionality of curve time series* in The Annals of Statistics

Cho H (2013) *Modeling and Forecasting Daily Electricity Load Curves: A Hybrid Approach* in Journal of the American Statistical Association

Lam C (2011) *Estimation of latent factors for high-dimensional time series* in Biometrika

Lam C (2012) *Factor modeling for high-dimensional time series: Inference for the number of factors* in The Annals of Statistics

Tao M (2011) *Large Volatility Matrix Inference via Combining Low-Frequency and High-Frequency Approaches* in Journal of the American Statistical Association

Wu B (2013) *Estimation in the presence of many nuisance parameters: Composite likelihood and plug-in likelihood* in Stochastic Processes and their Applications

Description | 1. A simple, new and easy-to-use factor model for high-dimensional time series. The factors are defined in terms of the serial correlations among all the component series, which differs from most existing factor models. Estimation reduces to an eigenanalysis of a positive semi-definite matrix. Furthermore, the estimation of both the factor loadings and the number of factors exhibits the so-called "blessing of dimensionality" property. 2. The factor model stated above is further extended to handle high-dimensional volatility processes with both high-frequency and low-frequency data, and to handle some nonstationary time series. 3. Dimension reduction for curve time series. We have made two new contributions: (a) a new notion of dimensionality for curve (or, in general, functional) time series based on their linear dynamical correlation, and a new method to identify that dimensionality; (b) a concept of the correlation dimension between two curves, based on a singular value decomposition in a Hilbert space, which was then used to reduce a curve linear regression to several ordinary scalar linear regressions. |

Exploitation Route | The methodology developed in the grant provides simple and easy-to-implement ways to reduce the dimension of high-dimensional time series, leading to efficient modelling and forecasting. The potential applications are wide-ranging, covering any area where multiple time series data are present. |

Sectors | Communities and Social Services/Policy; Digital/Communication/Information Technologies (including Software); Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Retail |
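The eigenanalysis described in finding 1 above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the project's software: it assumes a stationary model in which the observations are a loading matrix times a few latent factors plus white noise, pools squared lagged autocovariance matrices into one positive semi-definite matrix, and picks the number of factors by minimising the ratio of adjacent eigenvalues. The function names (`lagged_autocov`, `estimate_factors`, `simulate`), the default of five lags, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def lagged_autocov(y, k):
    """Sample lag-k autocovariance matrix of an (n, p) array y."""
    n = y.shape[0]
    yc = y - y.mean(axis=0)
    # Sum over t of yc[t + k] yc[t]^T, divided by n.
    return yc[k:].T @ yc[: n - k] / n


def estimate_factors(y, k0=5, r=None):
    """Estimate loadings, factors and the number of factors (illustrative)."""
    p = y.shape[1]
    # Pool lags 1..k0 into one positive semi-definite p x p matrix.
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S = lagged_autocov(y, k)
        M += S @ S.T
    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = np.linalg.eigh(M)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    if r is None:
        # Ratio rule (an assumption of this sketch): choose the index
        # minimising lambda_{i+1} / lambda_i among the first p // 2.
        R = p // 2
        ratios = eigvals[1 : R + 1] / eigvals[:R]
        r = int(np.argmin(ratios)) + 1
    A_hat = eigvecs[:, :r]                # orthonormal loading estimate
    x_hat = (y - y.mean(axis=0)) @ A_hat  # factor estimates from centred data
    return A_hat, x_hat, r


def simulate(n=1000, p=20, seed=0):
    """Toy data: two AR(1) factors, Gaussian loadings, white noise."""
    rng = np.random.default_rng(seed)
    phi = np.array([0.8, -0.8])
    x = np.zeros((n, 2))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal(2)
    A = rng.standard_normal((p, 2))
    y = x @ A.T + rng.standard_normal((n, p))
    return y, A


y, A_true = simulate()
A_hat, x_hat, r_hat = estimate_factors(y)
print("estimated number of factors:", r_hat)
```

Only the column space of the loading matrix is identifiable, so `A_hat` should be compared with the true loadings through the subspace they span rather than entry by entry.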

Description | The curve regression methodology developed in Cho et al. (2013) has been adopted by EDF for forecasting daily electricity demand. |

Sector | Energy |

Impact Types | Economic |

Title | clr |

Description | This is an R package for curve linear regression, as developed in Haeran Cho, Yannig Goude, Xavier Brossat and Qiwei Yao (2013), Modeling and Forecasting Daily Electricity Load Curves: A Hybrid Approach, Journal of the American Statistical Association, Vol. 108, 7-13. It will be available from R soon. |

Type Of Technology | Software |

Year Produced | 2018 |

Open Source License? | Yes |

Impact | Since the publication of Cho et al (2013), there have been quite a few requests for the software which implements the curve linear regression methods. The development of this R package is to cater for the demand. |

URL | https://cran.r-project.org/web/packages/clr/index.html |