Building a robust Data Science toolset via Computational Rough Paths: Localised Regression based on the Signature Method

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

The context of this project is that Rough Paths Theory provides a convenient and effective way to describe streamed data while filtering out the noise generated by the data sampling process. This makes it particularly valuable as a tool to use alongside the rapidly developing toolset of Data Science, especially when the underlying data are complex, multimodal and evolving, and neither stationary nor regularly sampled.
The primary benefit of the approach is that the transform provided by the Signature removes an infinite-dimensional group of symmetries (the reparametrisations of time) that would otherwise often cause profound difficulties for the learning process. The Signature Method is a non-parametric way of extracting characteristic features from data. It allows one to summarise the information contained in a data stream as a set of essential features, creating a favourable and promising framework for Machine Learning tasks.
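As an illustration of how the Signature turns a stream into features, the sketch below computes the signature truncated at depth 2 of a piecewise-linear path using only NumPy. It is a minimal sketch, not the project's implementation: the function name `signature_depth2` and the choice of depth 2 are assumptions made for brevity (the full signature is an infinite sequence of iterated integrals). Each linear segment with increment `delta` has signature `(delta, delta ⊗ delta / 2)`, and Chen's identity is used to concatenate segments.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    Hypothetical helper for illustration. `path` has shape (n_points, d);
    returns (level1, level2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity: append one linear segment, whose own signature
        # is (delta, outer(delta, delta) / 2). level2 is updated first so
        # that it uses the level1 accumulated *before* this segment.
        level2 += np.outer(level1, delta) + np.outer(delta, delta) / 2.0
        level1 += delta
    return level1, level2

# Two samplings of the same trajectory give the same features.
coarse = [[0, 0], [1, 0], [1, 1]]
fine = [[0, 0], [0.5, 0], [1, 0], [1, 1]]
l1, l2 = signature_depth2(coarse)
m1, m2 = signature_depth2(fine)
print(np.allclose(l1, m1) and np.allclose(l2, m2))  # True
```

Adding an extra sample point along a straight segment leaves the features unchanged, which is a small instance of the symmetry-removal property described above; the antisymmetric part of `level2`, `(l2[0, 1] - l2[1, 0]) / 2`, is the signed (Lévy) area swept out by the path.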
The applications of this methodology are widespread. One of the largest banks is currently using this approach to rethink the pricing of its derivatives portfolio, as recently shown in (Arribas, 2018). The method has also been used in psychiatry to analyse self-reported mood data and thereby distinguish between diagnostic groups, as reported in (Arribas, Kate, Goodwin, & Lyons, 2017). Furthermore, technology used in mobile phones to translate finger movements into Chinese characters has been developed using the Signature Method (Zecheng, Zenghui, Lianwen, Ziyong, & Shuye, 2016).
My own expertise combines mathematical foundations with a strong ability to compute. My goal is to build on these initial strengths to further the effective use of Rough Paths Theory in Data Science. There are many problems where one would like to predict the outcome for an individual from a large collection of histories of individuals, and in many of these examples there is no natural metric of similarity. One of the advantages of this approach is that it avoids introducing such metrics prematurely. I will therefore begin this project by trying to build robust, principled ways to perform Localised Regression using Signatures in moderate dimensions. Developing a robust, mathematically principled approach, accompanied by packages for scikit-learn and TensorFlow, would be a great personal outcome.
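The project does not specify which form of Localised Regression it will use, so the following is only one possible reading, sketched under stated assumptions: each individual's history is first mapped to a fixed-length feature vector (for instance a flattened truncated signature), and a prediction at a query point is made by local linear regression, i.e. least squares weighted by a Gaussian kernel centred on the query. The function name `local_linear_predict`, the Gaussian kernel and the `bandwidth` parameter are all hypothetical choices; note that the Euclidean distance between feature vectors used here is itself an assumed similarity measure.

```python
import numpy as np

def local_linear_predict(features, targets, query, bandwidth=1.0):
    """Hypothetical local linear regression at a single query point.

    features: (n, p) array, e.g. flattened truncated signatures (assumption).
    targets:  (n,) array of outcomes.
    query:    (p,) feature vector of the new individual.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    query = np.asarray(query, dtype=float)
    # Gaussian kernel weights: nearby histories count more (assumed kernel).
    sq_dist = np.sum((features - query) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Centre the design at the query so the fitted intercept is the prediction.
    design = np.hstack([np.ones((features.shape[0], 1)), features - query])
    sqrt_w = np.sqrt(weights)
    # Weighted least squares via rescaling rows by sqrt(weight).
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], targets * sqrt_w, rcond=None)
    return coef[0]

x = np.linspace(0.0, 1.0, 11)[:, None]
y = 2.0 * x[:, 0] + 1.0
print(local_linear_predict(x, y, [0.5], bandwidth=0.2))  # approximately 2.0
```

Because the fit is linear around the query, it reproduces exactly linear relationships regardless of the bandwidth; the bandwidth only controls how local the fit is when the relationship is nonlinear.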

This project falls within the EPSRC Mathematical Sciences research area.

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/R513295/1      |              |              | 01/10/2018 | 30/09/2023 |
2100087           | Studentship  | EP/R513295/1 | 01/10/2018 | 31/03/2022 | Cristopher Salvi