An investigation into the use of machine learning techniques - specifically reinforcement learning (RL)

Lead Research Organisation: University of Liverpool
Department Name: Computer Science

Abstract

This PhD will investigate the use of machine learning techniques, specifically reinforcement learning (RL), to solve problems within the trading domain, such as market making, optimal liquidation and multi-venue statistical arbitrage, covering both financial and prediction markets. Historically, approaches to these problems have been grounded in mathematical optimisation, most notably stochastic optimal control and dynamic programming [1, 2]. These suffer from a fundamental limitation, however: to obtain closed-form solutions one must specify the dynamics of the price process (the model), which can only be assumed. As a result, the derived algorithms may be impractical or intractable. The aim is thus to develop algorithms that learn directly from historical data via simulation and limit order book reconstruction.

The problem of high-frequency market making in limit order book markets will be central to this study. In this context, an agent must learn to manage inventory and balance risk and reward from tick to tick, where individual actions occupy very short time-scales (milliseconds and less). Though few papers apply reinforcement learning to this practice [3], there have been notable successes on similar problems [5, 6] from which to draw inspiration. One envisions a reinforcement learning agent, responsive to changes in the market and to derived signals, that could replace human traders, enabling a single person to manage a large basket of securities simultaneously. The research will also have a significant component addressing techniques for simulating an order book given partial information, as in [4], a crucial component of training these agents. The results should thus have real practical applications, especially in the financial services industry.

There are many problems that must be addressed to solve these trading problems using reinforcement learning, for example:

1. Overcoming stochasticity: it would be naive to suggest that the markets are Markovian or fully observable; how, then, does one learn efficiently in such an environment?
2. State representation: alternative formulations of the learning problem, such as predictive state representations, may be better suited to trading problems.
3. Temporal abstraction: a trading agent should understand the impact of a decision over multiple time scales, in order to better capture the balance between short- and long-term payoffs.
4. Encoding risk and reward: there exists interesting work on safe reinforcement learning which could be adapted to the trading domain.
5. Function approximation: an important question is whether the dynamics of the markets can be captured sufficiently by linear methods (such as tile coding) versus non-linear approaches (like neural networks).
6. A priori knowledge: how does one direct the agent into areas of the policy space that are known beforehand to be profitable, say, using potential-based reward shaping (a minimal sketch follows this list)? Further, is it possible to transfer knowledge across trading problems?
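As an illustration of point 6, below is a minimal, self-contained sketch of potential-based reward shaping (Ng, Harada and Russell, 1999) applied to a generic tabular Q-learning update. The environment, the potential function and all parameter values are hypothetical placeholders, not a trading model.

```python
import numpy as np

# Toy setting: 10 states, 2 actions; all values here are placeholders.
n_states, n_actions = 10, 2
gamma, alpha = 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def potential(state: int) -> float:
    # Phi encodes a priori knowledge, e.g. "higher-indexed states are
    # closer to profitable regions of the policy space" (illustrative).
    return state / (n_states - 1)

def shaped_q_update(s: int, a: int, r: float, s_next: int) -> None:
    # Shaping term F(s, s') = gamma * Phi(s') - Phi(s). Adding F to the
    # reward provably leaves the optimal policy unchanged while biasing
    # exploration toward high-potential states.
    F = gamma * potential(s_next) - potential(s)
    td_target = (r + F) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```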
References
[1] Avellaneda, M., and Stoikov, S. High-frequency trading in a limit order book. Quantitative Finance 8, 3 (2008), 217-224.
[2] Cartea, Á., Jaimungal, S., and Penalva, J. Algorithmic and High-Frequency Trading. Cambridge University Press, 2015.
[3] Chan, N. T., and Shelton, C. An Electronic Market-Maker. Seventh International Conference of the Society for Computational Economics (2001), 43.
[4] Christensen, H. L., Turner, R. E., Hill, S. I., and Godsill, S. J. Rebuilding the limit order book: sequential Bayesian inference on hidden states. Quantitative Finance 13, 11 (2013), 1779-1799.
[5] Kim, A. J., Shelton, C. R., and Poggio, T. Modeling Stock Order Flows and Learning Market-Making from Data. SSRN Electronic Journal (2002), 8.
[6] Nevmyvaka, Y., Feng, Y., and Kearns, M. Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning (2006), 673-680.

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/N509693/1                                      01/10/2016   30/09/2021
1797905             Studentship    EP/N509693/1   01/10/2016   30/09/2020   Thomas Spooner
 
Description There are three main discoveries/developments associated with this award: a) the application of reinforcement learning (RL) to complex trading domains; b) the application of optimisation to an epidemiological control problem; and c) techniques for provably convergent generative modelling (via GANs).

The first is described in detail in my published work, "Market Making via Reinforcement Learning." In this project we built a realistic reconstruction of a limit order book market using historical data. Rather than making assumptions about the underlying price processes, we derived trading strategies directly from past events. Reinforcement learning, which had previously been applied only to toy market-making problems, was shown to be an effective tool for deriving profitable strategies. Further, we introduced reward functions that directly incentivise market-making behaviour and/or risk sensitivity. We have since applied these, and more advanced techniques, to a variety of problems studied in the finance/trading domain, showing that RL can replace existing techniques that require knowledge of stochastic calculus and analytic dynamic programming. In all cases we are able to reproduce the optimal solution; moreover, by using model-free techniques, we are also able to solve a host of problems that would otherwise be intractable.
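To make the reward design concrete, here is a minimal sketch of one risk-sensitive reward of the kind described: it credits cash earned from filled quotes in full, but asymmetrically dampens gains from holding inventory so that the agent is not rewarded for speculating. The function and all names (spread_pnl, eta, ...) are illustrative, not the exact formulation from the paper.

```python
def shaped_reward(spread_pnl: float, inventory: float,
                  mid_price_change: float, eta: float = 0.5) -> float:
    """Per-step reward for a market-making agent (illustrative only).

    spread_pnl       -- cash earned from filled quotes this step
    inventory        -- signed position held going into the step
    mid_price_change -- change in the mid price over the step
    eta              -- dampening factor in [0, 1]
    """
    inventory_pnl = inventory * mid_price_change
    # Asymmetric dampening: inventory losses are kept in full (a risk
    # signal), but inventory gains are shrunk, discouraging speculation.
    dampened = inventory_pnl - eta * max(0.0, inventory_pnl)
    return spread_pnl + dampened
```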

This line of work has been extended significantly in a recent extended abstract (and an in-review full paper) on methods for improving robustness to model ambiguity. This work introduces adversarial learning into the market-making problem, which in some cases leads to strategies that align with theoretical results and perform better in the face of epistemic uncertainty. Further work on risk-averse RL applied to this problem will soon be submitted to NeurIPS as well.
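Schematically, the adversarial setup can be viewed as a zero-sum game between the trader and the market model. The sketch below is entirely hypothetical (a one-parameter trader against a one-parameter adversary on a toy objective); it shows only the alternating gradient structure, not the paper's method.

```python
import numpy as np

def expected_pnl(skew: float, drift: float) -> float:
    # Toy stand-in for expected episode PnL: captured spread, plus the
    # interaction of quote skew with the adversary's drift, minus a
    # quadratic penalty for aggressive skewing. All terms hypothetical.
    return 0.05 + skew * drift - 0.5 * skew ** 2

skew, drift, lr = 0.5, 0.2, 0.1
for _ in range(1000):
    # Trader ascends its objective; the adversary descends it while
    # staying inside the ambiguity set |drift| <= 0.2.
    skew += lr * (drift - skew)                            # d/d(skew)
    drift = float(np.clip(drift - lr * skew, -0.2, 0.2))   # d/d(drift)

# The pair spirals into the saddle point (0, 0): the trader ends up
# hedged against the worst-case drift rather than one assumed model.
print(skew, drift)
```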

The second development looked at the control of Bluetongue, a viral disease of ruminants (such as cows and sheep), via restrictions on the movement of animals. We took the current policy enforced by the government and showed that optimisation techniques (the same ones we use for meta-optimisation of the trading problems described above) can identify alternative policies with large reductions in cost to farms. This work will be submitted in the next few weeks.
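To illustrate the approach, a hypothetical sketch: the control policy is reduced to a single parameter (a movement-restriction radius), the epidemic model to a toy stochastic cost function, and the optimiser to random search. None of these stand-ins reflect the actual Bluetongue model or policy space.

```python
import numpy as np

rng = np.random.default_rng(1)

def total_cost(radius_km: float, n_runs: int = 20) -> float:
    """Average (infection cost + restriction cost) over stochastic runs."""
    costs = []
    for _ in range(n_runs):
        # Larger restriction zones suppress spread but cost farms more;
        # both terms are illustrative placeholders.
        infections = 100.0 * np.exp(-radius_km / 10.0) * rng.lognormal(0, 0.1)
        restriction = 2.0 * radius_km
        costs.append(infections + restriction)
    return float(np.mean(costs))

# Simple random search over candidate radii, standing in for the
# meta-optimisation methods mentioned above.
candidates = rng.uniform(0.0, 50.0, size=200)
best = min(candidates, key=total_cost)
print(f"best restriction radius ~ {best:.1f} km, cost {total_cost(best):.1f}")
```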

The third example is focussed on generative adversarial networks (GANs), a modern generative modelling technique that suffers from an array of theoretical convergence issues. In this work we are developing a new training method that guarantees convergence to the correct, full solution. GANs are particularly relevant to the trading domain, as many algorithms may be derived by sampling from a generative model of a financial market, for which GANs represent the state of the art.
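To make the failure mode concrete, below is a sketch of the Dirac-GAN of Mescheder et al. (2018), the standard minimal example of GAN non-convergence. It illustrates the problem only; it is not our proposed training method.

```python
# Dirac-GAN: data is a point mass at 0; the generator is a point mass at
# theta; the discriminator is linear, D(x) = psi * x.
import numpy as np

def grad_f(t: float) -> float:
    # Derivative of f(t) = -log(1 + exp(-t)), the logistic GAN objective.
    return 1.0 / (1.0 + np.exp(t))

theta, psi, lr = 1.0, 1.0, 0.1
for _ in range(500):
    # Simultaneous gradient descent/ascent on V(theta, psi) = f(psi * theta).
    d_theta = psi * grad_f(psi * theta)    # generator gradient
    d_psi = theta * grad_f(psi * theta)    # discriminator gradient
    theta, psi = theta - lr * d_theta, psi + lr * d_psi

# (theta, psi) orbits the equilibrium (0, 0) rather than converging to it:
# a concrete instance of the convergence issues motivating this work.
print(theta, psi)
```
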
Exploitation Route The first two areas of this work address problems driven by economic incentives, whether reducing costs to farms or maximising the profit of trading algorithms. In both cases, the techniques developed and presented have direct applications. In the former, we explicitly recommend that government consider approaches such as the one presented to replace existing heuristic solutions. The latter has already been used by others and has attracted a significant amount of interest from third parties.

In all cases, there are significant directions to be taken from an academic perspective. Research on applications of reinforcement learning to the trading domain is rare, for various reasons; we believe a great number of problems in this area remain unsolved because they are too hard to tackle analytically. The work conducted under this award will help promote further studies in this area. The same can be said for the work in epidemiology: as an interdisciplinary project, it brings to light techniques and ideas that might otherwise have remained unknown to either party. Finally, the generative modelling project is primarily academic and represents one of many efforts by the computer science community to push the envelope on GANs.
Sectors Agriculture, Food and Drink; Environment; Financial Services, and Management Consultancy

 
Description The findings of our 2018 paper on market making are being actively applied in industry, and further work has received significant attention from practitioners. We hope that this will also be true for the work on Bluetongue.
Sector Financial Services, and Management Consultancy
 
Title rsrl 
Description rsrl provides generic constructs for reinforcement learning (RL) experiments in an extensible framework with efficient implementations of existing methods for rapid prototyping. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact This software has been featured in the book "Practical Machine Learning with Rust: Creating Intelligent Applications in Rust" and is actively downloaded. 
URL https://github.com/tspooner/rsrl