Streaming Estimation for the Spectral Density

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Recent technological innovations have greatly increased both data generation and data collection capabilities in many application areas, particularly through advances in real-time information capture. Classical approaches to analysis and modelling require the data to be stored and read before it can be processed. Depending on the speed at which the data is collected, such algorithms and models can require a prohibitive amount of memory and computational power. Furthermore, in real-world applications the data-generating process may change while the data is being collected. This exposes another shortcoming of classical algorithms: they usually assume that the generating process does not change, and are therefore unsuited to the task once that assumption is relaxed.

This project falls within the EPSRC research areas of Digital Signal Processing and Statistics and Applied Probability. We aim to develop methodology designed to process high-frequency data while retaining a level of adaptability that allows us to deal with changes in the underlying random process. These methods will enable the analysis of time series as they are being observed, and thus give us the ability to react to changes in real time. This is particularly useful in areas such as cyber-security, where anomalous behaviour deviating from the norm needs to be detected and investigated as soon as possible. Such a problem requires maintaining the best possible up-to-date estimate of what the norm is, while also being able to judge whether any given set of data points is anomalous.

In this project, we plan to develop methodology specifically aimed at estimating the spectral density of a time series as it evolves during the data collection process. Because the spectral density captures the vast majority of the information about a random data-generating process, these algorithms would enable us to track changes in both the long-term seasonality and the short-term trends. Areas of current focus are non-parametric change-point detection and multivariate time series analysis. We plan to extend the latter by incorporating graph prediction and network analysis. To date, seasonality has rarely been used to analyse the underlying structure of a network. Our work looks to address this gap in the literature and to develop new tools for network analysis. To maximise the impact of our work, we will also develop open-source software - with documentation and demonstrations - that we will share online.
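
To make the idea of tracking a spectral density as data arrives more concrete, the following is a minimal sketch only, and not the methodology developed in this project. It assumes a univariate series and uses a generic technique: an exponentially weighted average of Hann-windowed block periodograms. The class name StreamingSpectrum and the parameters block_size, forgetting and fs are hypothetical choices made for illustration.

```python
import numpy as np


class StreamingSpectrum:
    """Exponentially weighted average of block periodograms.

    Hypothetical illustration: memory use is fixed at one block of
    samples plus one spectrum, regardless of how long the stream runs.
    """

    def __init__(self, block_size=256, forgetting=0.9, fs=1.0):
        self.block_size = block_size
        self.forgetting = forgetting  # closer to 1 means slower adaptation
        self.fs = fs                  # sampling frequency (assumed known)
        self.window = np.hanning(block_size)
        self.freqs = np.fft.rfftfreq(block_size, d=1.0 / fs)
        self.estimate = None          # current spectral density estimate
        self._buffer = []

    def update(self, x):
        """Consume one observation; refresh the estimate when a block fills."""
        self._buffer.append(float(x))
        if len(self._buffer) < self.block_size:
            return self.estimate
        block = np.asarray(self._buffer)
        self._buffer = []
        # Remove the block mean and taper to reduce spectral leakage.
        block = (block - block.mean()) * self.window
        # Windowed periodogram, normalised by the window energy
        # (correct up to a constant scale factor).
        pgram = np.abs(np.fft.rfft(block)) ** 2
        pgram /= self.fs * np.sum(self.window ** 2)
        if self.estimate is None:
            self.estimate = pgram
        else:
            lam = self.forgetting
            # Old blocks are down-weighted geometrically, so the estimate
            # can follow changes in the underlying process.
            self.estimate = lam * self.estimate + (1.0 - lam) * pgram
        return self.estimate
```

In this sketch the forgetting factor trades stability for adaptability: values near 1 average over many past blocks and give a smoother estimate, while smaller values let the estimate respond faster after a change in the generating process.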

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships and placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. This research will be relevant, addressing the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open-source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S023151/1                                   01/04/2019  30/09/2027
2602530            Studentship   EP/S023151/1  02/10/2021  30/08/2025  Shahriar Kazi