Deep Learning Techniques for predicting High-Frequency returns using Order Book data

Lead Research Organisation: Imperial College London

Department Name: Mathematics

Abstract

A financial market is an ensemble of market agents willing to buy or sell a certain financial security, such as a stock, bond, or derivative. The natural infrastructure for a financial market is an exchange, a physical or virtual place that brings together buyers and sellers, facilitating the occurrence of transactions. Today most exchanges are electronic, allowing traders to access live order book information, i.e., the collection of all standing orders for a given security. Market participants have different technological infrastructures, receiving data and submitting orders at different latencies. Over the past few decades, high-frequency traders (HFTs) have engaged in a fierce race to zero latency, investing heavily to reduce their latency by just a few microseconds. What we aim to explore in our research is one of the possible reasons why such a race happened in the first place. Specifically, we aim to analyse the predictive value of order book data, i.e., to what extent can a trader with immediate access to the order book predict the market's future direction?
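To make the notion of order book information concrete, the following minimal sketch represents the top few levels of a limit order book and derives three quantities commonly computed from it: the mid-price, the bid-ask spread and a volume imbalance. The class name, the function names and all prices and sizes are hypothetical and chosen for illustration only; they are not taken from the project's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookSnapshot:
    """Top levels of a limit order book (all values below are made up)."""
    bid_prices: tuple  # descending, best bid first
    bid_sizes: tuple
    ask_prices: tuple  # ascending, best ask first
    ask_sizes: tuple


def mid_price(book):
    """Average of the best bid and best ask prices."""
    return (book.bid_prices[0] + book.ask_prices[0]) / 2


def spread(book):
    """Difference between the best ask and best bid prices."""
    return book.ask_prices[0] - book.bid_prices[0]


def volume_imbalance(book, depth=1):
    """Signed imbalance in [-1, 1] over the first `depth` levels.

    Positive values mean more resting volume on the bid side.
    """
    bid = sum(book.bid_sizes[:depth])
    ask = sum(book.ask_sizes[:depth])
    return (bid - ask) / (bid + ask)


# Hypothetical three-level snapshot.
snapshot = BookSnapshot(
    bid_prices=(100.00, 99.99, 99.98),
    bid_sizes=(300, 500, 200),
    ask_prices=(100.02, 100.03, 100.04),
    ask_sizes=(100, 400, 600),
)

print(mid_price(snapshot), spread(snapshot), volume_imbalance(snapshot))
```

Features of this kind are one possible input representation; the text does not specify which representations the project actually compares.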
Empirical studies have shown that price formation dynamics, i.e., next mid-price moves, are predictable. In our research, we investigate whether such predictability persists at longer horizons. To do so, we employ deep learning architectures, leveraging their ability to learn complex data dependencies. So far, we have conducted an extensive empirical experiment on over one year of Nasdaq data to answer the following questions:
1. Do high-frequency returns display predictability? If so, how far ahead can we predict?
2. Which order book representations perform the best?
3. Can we use a single model across multiple horizons?
4. Can we use a single model across multiple stocks?
To answer these questions, we used model confidence sets, a structured statistical procedure particularly well suited for the problem.
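As an illustration of what predicting at several horizons could mean in practice, the sketch below builds log-return targets from a mid-price series for a few horizons. It assumes, purely for illustration, that horizons are counted in order book updates; the function name and the price series are hypothetical, and the project's actual target definition may differ.

```python
import math


def horizon_log_returns(mid_prices, horizon):
    """Log-return of the mid-price `horizon` steps ahead of each time t.

    The last `horizon` observations have no target and are dropped.
    """
    return [
        math.log(mid_prices[t + horizon] / mid_prices[t])
        for t in range(len(mid_prices) - horizon)
    ]


# Made-up mid-price series, one value per order book update.
mids = [100.00, 100.01, 100.01, 100.02, 100.00, 99.99]

# One target series per horizon; a single multi-horizon model would
# predict all of these jointly, whereas separate models take one each.
targets = {h: horizon_log_returns(mids, h) for h in (1, 2, 3)}

for h, series in targets.items():
    print(h, len(series))
```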
There are some further questions we wish to explore. First, we would like to understand whether the structure of the order book can completely explain the predictability in returns or if recurring trading patterns play a relevant role. Moreover, we would like to understand whether such predictability is tradeable or if it might be useful for some market players, for example, helping market makers gauge the market's direction and adjust their quotes accordingly.
Graph supOU processes
While there are various discrete-time models for graph/network time series, our research project will focus on the continuous-time setting to allow for consistent modelling across time scales and account for irregular observations.
The project aims to extend previous work on Graph Ornstein-Uhlenbeck (GrOU) processes. There are various research avenues we wish to consider. First, we would like to allow for a more flexible autocorrelation structure, possibly displaying long memory. In this context, we could consider merging and advancing the existing theory of GrOU processes with that of multivariate CARMA processes. Alternatively, we could consider defining a Graph supOU process and developing suitable inference techniques. Second, we wish to explore graphs with more complex topologies. For example, we may want to allow multivariate observations on each node and/or consider networks with natural group structures.
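For intuition about an Ornstein-Uhlenbeck process on a graph, the sketch below simulates a mean-reverting vector process whose drift at each node depends on the node's own value and on the average value of its neighbours. Everything here is an assumption made for illustration: the drift matrix Q = theta_momentum * I + theta_network * A_bar (with A_bar the row-normalised adjacency matrix), the Gaussian driving noise and the Euler discretisation are simplifications, and the exact GrOU parameterisation and driving noise in the literature may differ.

```python
import random


def row_normalise(adj):
    """Divide each row of the adjacency matrix by its degree (at least 1)."""
    return [[a / max(1, sum(row)) for a in row] for row in adj]


def simulate_graph_ou(adj, theta_network, theta_momentum, y0,
                      dt, n_steps, sigma=1.0, seed=0):
    """Euler scheme for dY = -Q Y dt + sigma dW on a graph.

    Q = theta_momentum * I + theta_network * A_bar is an assumed,
    illustrative form of the drift matrix.
    """
    rng = random.Random(seed)
    d = len(adj)
    a_bar = row_normalise(adj)
    q = [
        [theta_network * a_bar[i][j] + (theta_momentum if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
    y = list(y0)
    path = [y]
    for _ in range(n_steps):
        drift = [sum(q[i][j] * y[j] for j in range(d)) for i in range(d)]
        y = [
            y[i] - drift[i] * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0)
            for i in range(d)
        ]
        path.append(y)
    return path


# Hypothetical path graph on three nodes: 0 - 1 - 2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]

path = simulate_graph_ou(adjacency, theta_network=0.5, theta_momentum=1.0,
                         y0=[1.0, 0.0, 0.0], dt=0.01, n_steps=1000)
print(path[-1])
```

Choosing theta_momentum larger than the absolute value of theta_network keeps the drift matrix diagonally dominant, so the simulated process reverts towards zero rather than exploding.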
In applications, it is often the case that the dimension of the graph's adjacency matrix is (much) larger than the number of time series observations. In this setting, an important question is whether we can estimate the adjacency matrix consistently. If so, one might want to detect sparsity in such networks. Possible real-world data sets with time-evolving graph structures include stocks' realized volatilities and exchange rate pairs.
This project falls within the following EPSRC research areas: Artificial Intelligence Technologies, Digital Signal Processing, and Statistics and Applied Probability.

Planned Impact

Probabilistic modelling permeates the financial services, healthcare, technology and other service industries crucial to the UK's continuing social and economic prosperity, which are major users of stochastic algorithms for data analysis, simulation, systems design and optimisation. There is a major and growing skills shortage of experts in this area, and the success of the UK in addressing this shortage in cross-disciplinary research and industry expertise in computing, analytics and finance will directly impact the international competitiveness of UK companies and the quality of services delivered by government institutions.
By training highly skilled experts equipped to build, analyse and deploy probabilistic models, the CDT in Mathematics of Random Systems will contribute to
- sharpening the UK's research lead in this area and
- meeting the needs of industry across the technology, finance, government and healthcare sectors.

MATHEMATICS, THEORETICAL PHYSICS and MATHEMATICAL BIOLOGY

The explosion of novel research areas in stochastic analysis requires the training of young researchers capable of facing the new scientific challenges and maintaining the UK's lead in this area. The partners are at the forefront of many recent developments and ideally positioned to successfully train the next generation of UK scientists for tackling these exciting challenges.
The theory of regularity structures, pioneered by Hairer (Imperial), has generated a ground-breaking approach to singular stochastic partial differential equations (SPDEs) and opened the way to solve longstanding problems in physics of random interface growth and quantum field theory, spearheaded by Hairer's group at Imperial. The theory of rough paths, initiated by TJ Lyons (Oxford), is undergoing a renewal spurred by applications in Data Science and systems control, led by the Oxford group in conjunction with Cass (Imperial). Pathwise methods and infinite dimensional methods in stochastic analysis with applications to robust modelling in finance and control have been developed by both groups.
Applications of probabilistic modelling in population genetics, mathematical ecology and precision healthcare are active areas in which our groups have recognized expertise.

FINANCIAL SERVICES and GOVERNMENT

The large-scale computerisation of financial markets and retail finance and the advent of massive financial data sets are radically changing the landscape of financial services, requiring new profiles of experts with strong analytical and computing skills as well as familiarity with Big Data analysis and data-driven modelling, not matched by current MSc and PhD programs. Financial regulators (Bank of England, FCA, ECB) are investing in analytics and modelling to face this challenge. We will develop a novel training and research agenda adapted to these needs by leveraging the considerable expertise of our teams in quantitative modelling in finance and our extensive experience in partnerships with the financial institutions and regulators.

DATA SCIENCE

Probabilistic algorithms, such as stochastic gradient descent and Monte Carlo Tree Search, underlie the impressive achievements of Deep Learning methods. Stochastic control provides the theoretical framework for understanding and designing Reinforcement Learning algorithms. Deeper understanding of these algorithms can pave the way to designing improved algorithms with higher predictability and 'explainable' results, crucial for applications.
We will train experts who can blend a deeper understanding of algorithms with knowledge of the application at hand to go beyond pure data analysis and develop data-driven models and decision aid tools.
There is a high demand for such expertise in technology, healthcare and finance sectors and great enthusiasm from our industry partners. Knowledge transfer will be enhanced through internships, co-funded studentships and paths to entrepreneurs

Student:

Lorenzo Lucchese

Period of Study:

Oct 21 - Sep 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2602120

Research Topic:

Unclassified

Organisations

Imperial College London (Lead Research Organisation)

People	ORCID iD
Almut Veraart (Primary Supervisor)	http://orcid.org/0000-0001-8582-3652
Lorenzo Lucchese (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S023925/1			01/04/2019	30/09/2027
2602120	Studentship	EP/S023925/1	01/10/2021	30/09/2025	Lorenzo Lucchese