EPSRC Project Summary: New Methods for Network Time Series Analysis

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

The rapidly increasing availability of multivariate time series data with explicit or implicit network structure has resulted in heightened interest in network time series models, both for forecasting and for inferring network structure. Such models have been used to forecast time series across a diverse array of research areas, from epidemiology to meteorology to social media networks. For example, the wind speed at a given location may depend both on past observations at that location and on those at its geographic neighbours, with varying lags and effect sizes. An accurate wind speed model could help decide where to site new wind turbines when it is not cost-effective to collect data at every candidate location over long periods. The recently developed generalised network autoregressive (GNAR) model offers a flexible and highly parsimonious approach to modelling such data: each series may depend on its own past values (an autoregressive component) and on the past values of its neighbours across multiple covariate networks.
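To make the GNAR idea concrete, the following minimal sketch (assuming NumPy) simulates and fits the simplest case: a GNAR(1, [1]) model with a single network and one "global" autoregressive coefficient shared by all nodes, X_{i,t} = α X_{i,t-1} + β Σ_{q∈N(i)} ω_{i,q} X_{q,t-1} + u_{i,t}, where N(i) is the set of neighbours of node i. The weights ω_{i,q} = 1/|N(i)| are an assumed, simple choice, fitting is by pooled ordinary least squares, and the function names are hypothetical illustrations rather than the API of any existing GNAR software.

```python
import numpy as np


def row_normalised_weights(adj):
    """Neighbour weights omega_{i,q} = 1/|N(i)| for q in N(i); isolated nodes get zero rows."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    # Avoid dividing by zero for nodes without neighbours.
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)


def simulate_gnar1(adj, alpha, beta, n_times, sigma=1.0, seed=0):
    """Simulate X_{i,t} = alpha*X_{i,t-1} + beta*sum_q omega_{i,q} X_{q,t-1} + u_{i,t}."""
    rng = np.random.default_rng(seed)
    w = row_normalised_weights(adj)
    n_nodes = w.shape[0]
    x = np.zeros((n_times, n_nodes))
    for t in range(1, n_times):
        x[t] = alpha * x[t - 1] + beta * (w @ x[t - 1]) + sigma * rng.standard_normal(n_nodes)
    return x


def fit_gnar1(x, adj):
    """Pooled least-squares estimates (alpha_hat, beta_hat) from a (T, N) array x."""
    w = row_normalised_weights(adj)
    own_lag = x[:-1]            # X_{i,t-1}, shape (T-1, N)
    nbr_lag = x[:-1] @ w.T      # entry [t, i] = sum_q omega_{i,q} X_{q,t}
    design = np.column_stack([own_lag.ravel(), nbr_lag.ravel()])
    response = x[1:].ravel()    # X_{i,t}, aligned row-major with the design
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


if __name__ == "__main__":
    # A 4-node ring; |alpha| + |beta| < 1 keeps this simulated process stationary.
    ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
    x = simulate_gnar1(ring, alpha=0.4, beta=0.3, n_times=5000)
    print(fit_gnar1(x, ring))  # estimates should typically be close to [0.4, 0.3]
```

However many nodes the network has, this model has only two parameters, which illustrates the parsimony described above; richer GNAR variants add further lags, neighbour stages and networks.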

My research aims to extend the GNAR modelling framework and to develop new methods for network time series analysis in several areas. First, I would develop novel algorithms for inferring GNAR network structure in the absence of any prior network, allowing any multivariate time series data set to be treated as a network time series. This work would build on existing research on structure learning for Bayesian networks. Second, the GNAR model may be extended to incorporate node-specific exogenous time series regressors, which should yield better forecasts and useful inferences when informative explanatory variables are available. Third, my research will attempt to generalise network time series models to tensor-valued time series. In epidemiology, for example, this would allow parsimonious modelling of network time series in which each location (node) has multiple associated time series, such as case numbers, meteorological conditions and other factors relevant to disease transmission.
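One hypothetical way to add a node-specific exogenous regressor is to append it as an extra column in the least-squares design of a GNAR(1, [1]) model. The sketch below assumes a single covariate Y_{i,t} per node, entering at lag 0 with a coefficient λ shared by all nodes: X_{i,t} = α X_{i,t-1} + β Σ_q ω_{i,q} X_{q,t-1} + λ Y_{i,t} + u_{i,t}. This is an illustrative formulation, not necessarily the one the project will adopt; the weights ω_{i,q} = 1/|N(i)| and all function names are assumptions.

```python
import numpy as np


def neighbour_weights(adj):
    """Weights omega_{i,q} = 1/|N(i)| for neighbours q of i; isolated nodes get zero rows."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)


def simulate_gnarx1(adj, y, alpha, beta, lam, seed=0):
    """Simulate X_{i,t} = alpha*X_{i,t-1} + beta*(W X_{t-1})_i + lam*Y_{i,t} + u_{i,t}.

    y has shape (T, N): one hypothetical exogenous series per node.
    """
    rng = np.random.default_rng(seed)
    w = neighbour_weights(adj)
    n_times, n_nodes = y.shape
    x = np.zeros((n_times, n_nodes))
    for t in range(1, n_times):
        x[t] = (alpha * x[t - 1] + beta * (w @ x[t - 1])
                + lam * y[t] + rng.standard_normal(n_nodes))
    return x


def fit_gnarx1(x, y, adj):
    """Pooled least-squares estimates of (alpha, beta, lam)."""
    w = neighbour_weights(adj)
    design = np.column_stack([
        x[:-1].ravel(),          # own lag X_{i,t-1}
        (x[:-1] @ w.T).ravel(),  # weighted neighbour lag sum_q omega_{i,q} X_{q,t-1}
        y[1:].ravel(),           # exogenous regressor Y_{i,t}, aligned with X_{i,t}
    ])
    coef, *_ = np.linalg.lstsq(design, x[1:].ravel(), rcond=None)
    return coef


if __name__ == "__main__":
    ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
    y = np.random.default_rng(42).standard_normal((3000, 4))
    x = simulate_gnarx1(ring, y, alpha=0.4, beta=0.3, lam=0.8)
    print(fit_gnarx1(x, y, ring))  # should typically be close to [0.4, 0.3, 0.8]
```

Because the exogenous term only adds one column, the model stays parsimonious; lagged covariates or several covariates per node would simply add further columns.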

Finally, I will examine applications of deep learning to large network time series data sets, using an initial network lifting preprocessing step to detrend and spatially decorrelate the data. Of particular interest are extensions of 'hybrid' deep learning architectures such as the recently developed Gaussian Process Long Short-Term Memory (GP-LSTM) model. GP-LSTM uses an LSTM recurrent neural network to embed input sequences, on which the kernel of a Gaussian process is then defined, and performs inference in a highly scalable fashion. As well as achieving state-of-the-art performance in time series forecasting tasks, GP-LSTM makes it straightforward to quantify the uncertainty in the predictions of neural networks, which are traditionally 'opaque'. Furthermore, to my knowledge, using a network lifting scheme to feed data into such deep learning models has not yet been examined in the machine learning literature.
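The detrending and decorrelating role of lifting can be illustrated with a single predict/update step. The sketch below is loosely modelled on lifting schemes for irregularly spaced data and assumes NumPy: one node r is removed, its value is predicted from its neighbours, and the prediction error is kept as a "detail" coefficient; the neighbours' values are then updated so that the integral-weighted sum Σ_j I_j x_j over the remaining nodes is preserved. The choice of node to remove, the neighbour set, the prediction weights and the node "integrals" I_j are all supplied by the caller here and are assumptions; a full transform would repeat this step node by node, and nothing here shows how the coefficients would be fed to a deep learning model.

```python
import numpy as np


def lift_one_node(x, integrals, r, neighbours, weights):
    """One predict/update lifting step removing node r (a sketch, not a full transform).

    x: value at each node; integrals: positive "size" I_j of each node;
    neighbours: indices of r's neighbours; weights: prediction weights summing to 1.
    Returns (detail d_r, updated values, updated integrals). Entries at index r are
    left untouched and should be treated as removed.
    """
    x = np.asarray(x, dtype=float).copy()
    integrals = np.asarray(integrals, dtype=float).copy()
    nb = np.asarray(neighbours)
    a = np.asarray(weights, dtype=float)

    # Predict: the detail is the part of x_r the neighbours fail to explain.
    d = x[r] - a @ x[nb]
    # Redistribute the removed node's integral among its neighbours.
    integrals[nb] += a * integrals[r]
    # Update: b_j = I_r * I_j / sum_k I_k^2 (new integrals) preserves sum_j I_j x_j.
    b = integrals[r] * integrals[nb] / np.sum(integrals[nb] ** 2)
    x[nb] += b * d
    return d, x, integrals


if __name__ == "__main__":
    values = [1.0, 2.0, 10.0, 4.0, 5.0]   # node 2 is a local spike
    d, x_new, i_new = lift_one_node(values, [1.0] * 5, 2, [1, 3], [0.5, 0.5])
    keep = [0, 1, 3, 4]
    print(d, round(float(i_new[keep] @ x_new[keep]), 10))  # prints 7.0 22.0
```

On a locally linear signal such as [1, 2, 3, 4, 5] the same step gives a detail of 0, which is the sense in which lifting removes local trend and leaves coefficients that mostly reflect unexplained, spatially local variation.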

It is my hope that research in these areas will make novel contributions to the field of network time series analysis, namely methodological tools for forecasting multivariate time series with highly parsimonious models that exploit network structure. This project falls within the EPSRC Statistics and Applied Probability research area.

________________________________________

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships and placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. This research will address key questions arising from real-world problems. It will be published in leading statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023151/1      |              |              | 01/04/2019 | 30/09/2027 |
2283002           | Studentship  | EP/S023151/1 | 01/10/2019 | 30/03/2024 | James Wei