Statistical Analysis of Time Series Data

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

This project focuses on the statistical analysis of potentially high-dimensional time-series data, such as those that arise in economics or physics, on the development of algorithms for analyzing such data, and on the generation of synthetic data. Many time series are known to exhibit stylised features that violate the assumptions of classical statistical methods; one example is non-stationarity, where the statistical properties of the underlying time series change over time. A goal of the project is to develop new methodology that permits efficient statistical analysis in light of these concerns. High-dimensional data poses new challenges from both a computational and a statistical perspective, known in the literature as the curse of dimensionality. Efficient analysis of large data sets with many variables is likely to be relevant in many domains, particularly in economic settings where the number of relevant variables may be very large. With regard to synthetic data generation, in many applications and industry collaborations, access to real data (which is often sensitive) is a major challenge and a barrier in the research pipeline. Access to synthetic data that retains the structural properties of the real data could significantly facilitate the research process and interaction with industry. Examples of potential applications include financial transaction networks and limit order book data, with potential impact in areas such as fraud detection and financial market regulation.

Many financial and economic time series have underlying hidden factors, which may lead the economic system to behave unexpectedly. Such hidden factors could be a small set of economic indicators or entities that lead many of the other indicators or entities, so that idiosyncratic shocks to them could have a significant effect on the whole system. This project also aims to reveal such hidden factors, and hence to aid the assessment of the systemic risk that arises from them, a topic of central interest to external partners such as the Bank of England. The methods used to tackle this problem go beyond traditional approaches from the econometrics literature and draw on unsupervised and supervised machine learning tools and on network analysis. The project will also help to develop these two areas further, and we hope our findings will be of independent interest to both communities. We expect our work to be of interest to UK policy makers, who would be able to better understand and quantify the risk exposures of UK entities or domestic sectors to external global factors.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. This research will be relevant, addressing the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S023151/1                                   01/04/2019  30/09/2027
2247868            Studentship   EP/S023151/1  01/10/2019  30/09/2023  Jason Clarkson