Modelling for time series data with latent structure

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Many processes in engineering and the physical sciences generate large time series datasets with
embedded latent structure; geophysical, industrial and economic data streams are common examples.
Such structure can be exploited to extract a representation of the data
that facilitates downstream tasks such as forecasting, regime classification and outlier detection.
While developments in statistical science and machine learning have been highly successful for a
range of tasks on non-time series data, methods for time series data have been less studied. This
project, which falls within the EPSRC statistics and applied probability research area, aims to
develop and analyse algorithms for the detection of latent structures in large time series data. We
seek to extend and adapt a number of existing tools in statistics and machine learning to the novel
setting of time series data in order to achieve this.
As a first example, we aim to develop algorithms for the extraction of latent structures in network
time series data using clustering methods. Clustering techniques have become very popular in the
last decade, and show potential to substantially improve on latent structure detection compared to
traditional approaches based on linear factor modelling. Instances of latent structures in network
time series data include lead-lag relationships and system states. Lead-lag relationships refer to the
correlation between two time series shifted in time relative to one another and are often found in
physical multivariate time series data. System states refer to similarities in network snapshots across
time and can be used to capture regimes in such data.
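The lead-lag definition above can be illustrated with a minimal sketch. It assumes the simplest possible estimator: scan a range of candidate shifts and keep the one with the largest absolute Pearson correlation. The function names (`lagged_correlation`, `estimate_lead_lag`) and the synthetic data are hypothetical and chosen for illustration only; this is not the project's clustering-based methodology.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    if lag > 0:
        a, b = x[:-lag], y[lag:]
    elif lag < 0:
        a, b = x[-lag:], y[:lag]
    else:
        a, b = x, y
    return float(np.corrcoef(a, b)[0, 1])


def estimate_lead_lag(x, y, max_lag):
    """Return the shift in [-max_lag, max_lag] with the largest |correlation|.

    A positive shift means x leads y; a negative shift means y leads x.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = np.array([lagged_correlation(x, y, int(lag)) for lag in lags])
    best = int(np.argmax(np.abs(corrs)))
    return int(lags[best]), float(corrs[best])


# Synthetic example (assumed data): y is a noisy copy of x delayed by 3 steps.
rng = np.random.default_rng(0)
n, true_lag = 500, 3
x = rng.standard_normal(n)
y = np.empty(n)
y[true_lag:] = x[:-true_lag]
y[:true_lag] = rng.standard_normal(true_lag)
y += 0.5 * rng.standard_normal(n)

lag, corr = estimate_lead_lag(x, y, max_lag=10)
print(f"estimated lag: {lag}, correlation at that lag: {corr:.2f}")
```

Applying such an estimator to every pair of series gives a matrix of pairwise lead-lag scores, which is one possible input for a clustering step on network time series.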
As a second example, we aim to develop algorithms for learning latent structures in panels of time
series data through machine learning models for lower-dimensional data representation. The multitask
and meta-learning literatures provide frameworks for learning low-dimensional representations
of data through the sharing of information between time series tasks. These frameworks can be used
to learn a joint data representation by leveraging between-task similarities and thus enhance
performance on the tasks of interest. The flexibility of these frameworks lies in the generality of
tasks that can be considered. These frameworks, in the context of modelling a system of time series
variables, can be used for leveraging between-variable similarity as well as across-time similarity.
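As a hedged illustration of sharing information between time series tasks, the sketch below treats each series in a panel as one task. It fits an autoregressive model per series, extracts a shared low-dimensional basis from the stacked coefficients with a truncated SVD, and then refits each series inside that basis. This linear construction is a deliberately simple stand-in, not the multitask or meta-learning models the project proposes; all function names and the simulated panel are assumptions made for the example.

```python
import numpy as np


def lagged_design(x, p):
    """Design matrix with rows [x[t-1], ..., x[t-p]] and targets x[t]."""
    n = len(x)
    X = np.column_stack([x[p - j : n - j] for j in range(1, p + 1)])
    return X, x[p:]


def fit_shared_representation(panel, p, k):
    """Learn a k-dimensional basis shared by the AR(p) coefficients of all series."""
    # Step 1: independent least-squares AR(p) fit for each series (task).
    per_series = [np.linalg.lstsq(*lagged_design(x, p), rcond=None)[0] for x in panel]
    W = np.column_stack(per_series)  # shape (p, n_series)
    # Step 2: the top-k left singular vectors span the shared subspace.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    B = U[:, :k]
    # Step 3: refit each series using only the k shared features.
    codes = []
    for x in panel:
        X, y = lagged_design(x, p)
        codes.append(np.linalg.lstsq(X @ B, y, rcond=None)[0])
    return B, np.array(codes)


def simulate_ar(coefs, n, rng, burn_in=100):
    """Simulate an AR process with Gaussian noise, discarding a burn-in period."""
    p = len(coefs)
    x = np.zeros(n + burn_in)
    eps = rng.standard_normal(n + burn_in)
    for t in range(p, n + burn_in):
        x[t] = coefs @ x[t - p : t][::-1] + eps[t]
    return x[burn_in:]


# Assumed toy panel: eight AR(2) series whose coefficients share one direction.
rng = np.random.default_rng(1)
direction = np.array([0.5, -0.3])
scales = np.linspace(0.4, 1.2, 8)
panel = np.array([simulate_ar(c * direction, 2000, rng) for c in scales])

B, codes = fit_shared_representation(panel, p=2, k=1)
alignment = abs(B[:, 0] @ direction) / np.linalg.norm(direction)
print(f"alignment between learned basis and true direction: {alignment:.3f}")
```

In this toy setting each series is summarised by a single code instead of two coefficients; the same pattern extends to sharing across time windows as well as across variables.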
This research can lead to immediate impact in a large number of applications. Time series data is
ubiquitous and arises in a myriad of applications, ranging from financial data analysis to biomedical
research. We aim to develop time series methods that can be applied to general time series datasets.
One potential application of our research is financial and economic time series
modelling. The low signal-to-noise ratio, presence of outliers and non-stationarity often found in
financial and economic datasets provide a challenging test-bed for time series methods. The
possible applications of our time series analysis methods align well with the emerging interests in
finance and economic data science at The Alan Turing Institute. Collaborations with industry
partners via The Alan Turing Institute could allow us to test our methodology on real-world
applications.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. The research will be relevant, addressing the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023151/1      |              |              | 01/04/2019 | 30/09/2027 |
2247853           | Studentship  | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Stefanos Bennett