From Stochastic Analysis to applications in Statistics and Machine Learning

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute


The aim of this project is to use recent insights from stochastic analysis to develop new tools for making inferences from sequential observations. Sequentially ordered data are ubiquitous in modern science, often occurring as time series, location series, or more generally as sequentially observed samples of structured objects. Making inferences from such observations is challenging. One issue is the size of the domain (e.g. the set of sequences or paths): even storing such data can be challenging. Another issue is that deriving guarantees on learning falls outside the usual realm of statistical learning theory, since the data space one works with (a space of paths) is not even locally compact.

One tool from stochastic analysis that has been applied recently is the so-called signature of a path. It gives a graded description of a path (an element of an infinite-dimensional space) as a series of tensors. This description can be used as a feature map and can intuitively be described as an ordered version of the sample moments. It completely describes a sequence or path and is invariant under time reparametrization. Moreover, it is universal and characteristic: linear functionals of the signature approximate functionals of paths, and the expected value of the signature characterises the probability distribution of the path.
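To make the "ordered moments" picture concrete, the following is a minimal sketch, assuming a piecewise-linear path and truncating the signature at level 2. The function name `signature_level2` is hypothetical and chosen for illustration only; it is not taken from the project or from any particular library. The update rule is Chen's identity for appending one linear segment.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array-like of shape (n_points, d), the vertices in order.
    Returns (s1, s2) with shapes (d,) and (d, d).
    Illustrative sketch only: truncation at level 2 is an assumption.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)        # level 1: total increment
    s2 = np.zeros((d, d))   # level 2: iterated integrals
    for inc in np.diff(path, axis=0):
        # Chen's identity: append a straight segment with increment `inc`.
        s2 += np.outer(s1, inc) + np.outer(inc, inc) / 2.0
        s1 += inc
    return s1, s2

# Two paths with the same endpoints but a different order of moves.
right_then_up = [[0, 0], [1, 0], [1, 1]]
up_then_right = [[0, 0], [0, 1], [1, 1]]

# The same path as right_then_up, sampled with an extra point on the
# first segment (a reparametrization of the same curve).
resampled = [[0, 0], [0.5, 0], [1, 0], [1, 1]]

for name, p in [("right_then_up", right_then_up),
                ("up_then_right", up_then_right),
                ("resampled", resampled)]:
    s1, s2 = signature_level2(p)
    print(name, s1, s2.tolist())
```

Level 1 is identical for all three paths, since it only records the total increment. Level 2 distinguishes the two orderings of the moves, while the resampled path has the same level-2 term as the original, which illustrates the invariance under reparametrization in this simple case.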

Signatures have already found many applications in stochastic analysis, e.g. in rough path theory and regularity structures, but their use in statistics and machine learning is more recent. Promising results exist in deep learning, topological data analysis (so-called persistence paths) and kernel learning. The potential impact ranges from new theoretical insights in stochastic analysis and statistical learning to completely new ways of dealing with complex, sequentially structured data (such as the data used daily in the financial industry, healthcare, etc.).

One challenge we face is the combinatorial explosion of coordinates in the signature, which has so far made it difficult to apply to high-dimensional paths as they appear in real-world data. Some methods, such as kernelization or random projections/hashing, have been explored, and one focus of this project is to develop new ways to circumvent or mitigate such scalability problems. Another focus is to emphasize the non-parametric aspect of this approach, that is, to develop a better understanding of how well functionals of paths can be approximated and to prove statistical learning guarantees.
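The scale of this explosion can be illustrated with a short count. For a path in d dimensions, level k of the signature is a tensor with d^k coordinates, so a signature truncated at a given depth has d + d^2 + ... + d^depth coordinates. The sketch below only performs this count; the function name `truncated_signature_size` is hypothetical.

```python
def truncated_signature_size(d, depth):
    """Number of coordinates in levels 1..depth of the signature
    of a d-dimensional path (level k contributes d**k coordinates)."""
    return sum(d ** k for k in range(1, depth + 1))

# Growth with the path dimension at a fixed truncation depth of 4.
for d in (2, 10, 100):
    print(d, truncated_signature_size(d, 4))
```

Even at a modest depth, the count grows polynomially in d with degree equal to the depth, which is why storing all coordinates explicitly quickly becomes impractical for high-dimensional paths.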

This project falls within the EPSRC Mathematical Sciences research area.

There will be no companies or collaborators involved.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513295/1                                   01/10/2018  30/09/2023
2099757            Studentship   EP/R513295/1  01/10/2018  30/09/2021  Patric Bonnier