Signature transformation of paths from rough analysis

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Brief description of the context of the research including potential impact
The mathematical notion of a path captures the concept of a continuously time-ordered sequence of values. These objects and their generalisations occur widely throughout both pure and applied mathematics. For example, the analysis of the sample paths of stochastic processes forms a significant part of stochastic analysis, while time series analysis is an established tool in modern statistics. Abstract paths are inherently infinite-dimensional objects, and it is desirable to seek low-dimensional summaries that capture features of interest. A mathematically principled approach to this has gained prominence in recent years: the (path) signature transform. Traditional methods are based on sampling. The signature transform instead describes a path through its effect on smooth non-linear controlled differential equations driven by that path.
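For reference, here is the standard textbook definition, which the project description itself does not state. For a path $X:[a,b]\to\mathbb{R}^d$ of bounded variation, the signature is the sequence of iterated integrals below. For paths of finite $p$-variation with $p \ge 2$, these integrals are not classically defined, and the higher levels must instead be supplied by a rough path lift. One concrete sense in which the signature records the path's effect on controlled differential equations is that the solution of a linear equation $dY_t = \sum_{i=1}^d A_i Y_t \, dX^i_t$ is a linear function of $S(X)_{a,b}$. The second line is the factorial bound on each level, taking the Euclidean (Hilbert-Schmidt) norm on $(\mathbb{R}^d)^{\otimes n}$.

$$
S(X)_{a,b} = \bigl(1,\, S^{1}_{a,b},\, S^{2}_{a,b},\, \dots\bigr),
\qquad
S^{n}_{a,b} = \int_{a<t_1<\cdots<t_n<b} dX_{t_1}\otimes\cdots\otimes dX_{t_n} \in (\mathbb{R}^d)^{\otimes n},
$$

$$
\bigl\|S^{n}_{a,b}\bigr\| \le \frac{\|X\|_{1\text{-var},[a,b]}^{\,n}}{n!}.
$$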
Representing paths by their signatures also offers several computational advantages. First, the signature captures all the non-linearity of the underlying path. Any continuous real-valued function on a compact set of paths of finite $p$-variation can be approximated arbitrarily well by a linear functional of the path signature, under mild conditions such as augmenting the path with time. Learning a function of a path therefore reduces to a linear regression on its signature. Second, the norm of the level-$n$ signature term decays factorially in $n$. Higher-order terms therefore tend to be very small and can be discarded by truncating the signature. This makes the truncated signature transform a natural and tractable finite-dimensional representation of paths.
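The following is a minimal NumPy sketch of a truncated signature computation. It is not the project's implementation. It assumes the path is given as discrete samples joined by straight lines (piecewise linear interpolation). A straight segment with increment $v$ has signature $(1, v, v^{\otimes 2}/2!, v^{\otimes 3}/3!, \dots)$. Segments are combined with Chen's identity, which is the truncated tensor product of their signatures. The function names `segment_signature`, `tensor_product` and `signature` are hypothetical illustrations, not an existing library API.

```python
import numpy as np


def segment_signature(v, depth):
    """Signature of a straight segment with increment v, truncated at `depth`.

    Level k equals v^{(x)k} / k!, built recursively as level_{k-1} (x) v / k.
    """
    levels = [np.ones(())]  # level 0 is the scalar 1
    for k in range(1, depth + 1):
        levels.append(np.multiply.outer(levels[-1], v) / k)
    return levels


def tensor_product(a, b, depth):
    """Truncated tensor-algebra product: level k is sum_{i+j=k} a_i (x) b_j.

    Applied to two signatures this is Chen's identity, i.e. it gives the
    signature of the concatenation of the two paths.
    """
    return [
        sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
        for k in range(depth + 1)
    ]


def signature(path, depth):
    """Truncated signature of the piecewise linear path through the rows of `path`.

    `path` has shape (n_points, d); entry k of the result has shape (d,) * k.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    sig = segment_signature(np.zeros(d), depth)  # identity element (1, 0, 0, ...)
    for v in np.diff(path, axis=0):
        sig = tensor_product(sig, segment_signature(v, depth), depth)
    return sig


# Usage: an L-shaped path in R^2 (right, then up), truncated at depth 3.
sig = signature([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]], depth=3)
levy_area = 0.5 * (sig[2][0, 1] - sig[2][1, 0])  # a linear functional of level 2
print(levy_area)  # → 0.5
```

The Lévy area in the usage line is an example of a non-linear feature of the path, the signed area it sweeps out, that is nevertheless a linear functional of the signature. Level $k$ is stored as a dense array with $d^k$ entries, which is one practical reason the signature is truncated at a low depth. For a piecewise linear path of length $L$, the norm of level $k$ is at most $L^k/k!$, which is the factorial decay described above.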

Aims and objectives
The goal of this research is to leverage the properties of the signature transform in several areas related to time-series analysis and data science, such as (1) optimal transport, (2) sequence clustering, (3) natural language processing (NLP), and (4) reinforcement learning (RL).

Novelty of the research methodology
The methodologies are novel in that they are among the first to exploit the signature transform in the fields listed above. There are two reasons for this. First, the signature transform can be difficult for practitioners to understand, as it is grounded in demanding pure mathematics related to the theory of controlled differential equations, rough path analysis and statistics. Second, the use of the signature method in data science began only recently (less than 10 years ago) and is still spreading.

Alignment to EPSRC's strategies and research areas
This project falls within the EPSRC Centre for Doctoral Training in Mathematics of Random Systems (EP/S023925/1), whose themes include the EPSRC research areas Statistics and applied probability and Mathematical analysis (https://epsrc.ukri.org/research/ourportfolio/researchareas/).

Any companies or collaborators involved
My supervisors are Thomas Cass and Dan Crisan. "Signature transform and optimal transport" and "Signature transform and sequence clustering" are joint projects with Thomas Cass. "Signature transform and NLP" is a joint project with Cris Salvi. "Signature transform and RL" is a joint project with Lingyi Yang and Cris Salvi.

Planned Impact

Probabilistic modelling permeates financial services, healthcare, technology and other service industries crucial to the UK's continuing social and economic prosperity. These industries are major users of stochastic algorithms for data analysis, simulation, systems design and optimisation. There is a major and growing shortage of experts in this area. The UK's success in addressing this shortage of cross-disciplinary research and industry expertise in computing, analytics and finance will directly affect the international competitiveness of UK companies and the quality of services delivered by government institutions.
By training highly skilled experts equipped to build, analyse and deploy probabilistic models, the CDT in Mathematics of Random Systems will contribute to
- sharpening the UK's research lead in this area and
- meeting the needs of industry across the technology, finance, government and healthcare sectors

MATHEMATICS, THEORETICAL PHYSICS and MATHEMATICAL BIOLOGY

The explosion of novel research areas in stochastic analysis requires the training of young researchers capable of facing the new scientific challenges and maintaining the UK's lead in this area. The partners are at the forefront of many recent developments and ideally positioned to successfully train the next generation of UK scientists for tackling these exciting challenges.
The theory of regularity structures, pioneered by Hairer (Imperial), has generated a ground-breaking approach to singular stochastic partial differential equations (SPDEs). This work, spearheaded by Hairer's group at Imperial, has opened the way to solving longstanding problems in the physics of random interface growth and in quantum field theory. The theory of rough paths, initiated by TJ Lyons (Oxford), is undergoing a renewal spurred by applications in data science and systems control, led by the Oxford group in conjunction with Cass (Imperial). Both groups have developed pathwise and infinite-dimensional methods in stochastic analysis, with applications to robust modelling in finance and control.
Applications of probabilistic modelling in population genetics, mathematical ecology and precision healthcare are active areas in which our groups have recognised expertise.

FINANCIAL SERVICES and GOVERNMENT

The large-scale computerisation of financial markets and retail finance and the advent of massive financial data sets are radically changing the landscape of financial services. This requires new profiles of experts with strong analytical and computing skills as well as familiarity with Big Data analysis and data-driven modelling, a combination not provided by current MSc and PhD programmes. Financial regulators (Bank of England, FCA, ECB) are investing in analytics and modelling to face this challenge. We will develop a novel training and research agenda adapted to these needs. It will draw on our teams' considerable expertise in quantitative modelling in finance and our extensive experience of partnerships with financial institutions and regulators.

DATA SCIENCE

Probabilistic algorithms, such as stochastic gradient descent and Monte Carlo Tree Search, underlie the impressive achievements of Deep Learning methods. Stochastic control provides the theoretical framework for understanding and designing Reinforcement Learning algorithms. A deeper understanding of these algorithms can pave the way to improved algorithms with more predictable and 'explainable' results, which is crucial for applications.
We will train experts who can combine a deeper understanding of algorithms with knowledge of the application at hand. These experts will go beyond pure data analysis to develop data-driven models and decision aid tools.
There is high demand for such expertise in the technology, healthcare and finance sectors, and great enthusiasm from our industry partners. Knowledge transfer will be enhanced through internships, co-funded studentships and paths to entrepreneurship.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S023925/1 | | | 01/04/2019 | 30/09/2027 |
2279905 | Studentship | EP/S023925/1 | 01/10/2019 | 30/01/2024 | Remy Messadene