Out-of-sample Performance Estimation

Lead Research Organisation: Imperial College London

Department Name: Mathematics

Abstract

One of the key issues in quantitative investing is the assessment and selection of investment strategies. This is typically done by building an algorithm to construct a portfolio at a given point in time and running this algorithm over a historical period using only the data available at each point in time; this is known as walk-forward testing. The strategy is then assessed by computing what the return would have been had the investor held the generated portfolio, and by estimating summary statistics from this simulation, which are then used to assess the strategy prior to live trading. Once the strategy is live, true out-of-sample performance can be collected one day at a time.
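The walk-forward procedure described above can be sketched in a few lines. This is a minimal illustration, not the project's method: the function names (`walk_forward`, `momentum_weights`, `annualised_sharpe`), the toy momentum rule, the 60-period lookback and the simulated Gaussian returns are all assumptions made for the example.

```python
import numpy as np

def walk_forward(returns, weight_fn, lookback):
    """Simulate a strategy on a (T, N) array of asset returns.

    At each time t >= lookback the weights are built from
    returns[t - lookback:t] only (no look-ahead) and then applied
    to the realised returns at time t.
    """
    T = returns.shape[0]
    pnl = np.empty(T - lookback)
    for t in range(lookback, T):
        w = weight_fn(returns[t - lookback:t])  # past data only
        pnl[t - lookback] = w @ returns[t]      # realised portfolio return
    return pnl

def momentum_weights(window):
    # Hypothetical toy rule: long assets with a positive trailing mean,
    # short the others, with equal absolute weight per asset.
    signal = np.sign(window.mean(axis=0))
    return signal / len(signal)

def annualised_sharpe(pnl, periods_per_year=252):
    # Summary statistic computed from the simulated return series.
    return np.sqrt(periods_per_year) * pnl.mean() / pnl.std(ddof=1)

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(500, 5))  # simulated daily returns
pnl = walk_forward(returns, momentum_weights, lookback=60)
sharpe = annualised_sharpe(pnl)
```

Because the weights at time t depend only on earlier rows, changing the data from some date onwards leaves the simulated returns before that date unchanged, which is a simple check for look-ahead bias.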

However, various issues can arise in this procedure. It is possible to misestimate or introduce bias into the target portfolio in a number of ways, including multiple testing (in which multiple strategies are tested on the same dataset, leading to overfitting), epistemic uncertainty (e.g. incorrect model specification) and aleatoric uncertainty (e.g. variability in the observed outcome). These issues, among others, cause an investor to misestimate the 'out-of-sample' (or 'live') performance of an investment strategy, leading to misallocation of their available capital.
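The multiple testing effect can be illustrated with a small simulation. The sketch below is an assumption-laden toy example (the function name `selection_bias_demo`, the number of strategies, the sample sizes and the Gaussian noise model are all chosen for illustration): every strategy is pure noise with a true Sharpe Ratio of zero, yet the strategy selected as best in-sample shows a clearly positive in-sample Sharpe Ratio.

```python
import numpy as np

def selection_bias_demo(n_strategies=200, n_in=500, n_out=500, seed=0):
    """Pick the best of many noise strategies in-sample and report its
    in-sample and out-of-sample annualised Sharpe Ratios."""
    rng = np.random.default_rng(seed)
    # Every strategy is pure noise: the true Sharpe Ratio is zero by construction.
    r_in = rng.normal(0.0, 0.01, size=(n_in, n_strategies))
    r_out = rng.normal(0.0, 0.01, size=(n_out, n_strategies))

    def sharpe(r):
        return np.sqrt(252) * r.mean(axis=0) / r.std(axis=0, ddof=1)

    s_in, s_out = sharpe(r_in), sharpe(r_out)
    best = int(np.argmax(s_in))  # selection uses in-sample data only
    return float(s_in[best]), float(s_out[best])

best_in, best_out = selection_bias_demo()
print(f"in-sample Sharpe of selected strategy:     {best_in:.2f}")
print(f"out-of-sample Sharpe of selected strategy: {best_out:.2f}")
```

The in-sample figure is the maximum of 200 noisy estimates, so it is biased upwards, while the out-of-sample figure for the same strategy is an unbiased draw around zero.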

The aim of this project is to develop a set of methods to account for these issues and provide better estimates of out-of-sample performance. The most widely covered methods in the literature primarily focus on multiple testing, but in practice they are difficult if not impossible to apply because they require accurate bookkeeping of the tests conducted, which may either not have been recorded in the first place or not be available to the portfolio assessor. One possible solution to the multiple testing problem could come from deep learning. If it is possible to synthetically generate a dataset where the amount of overfitting is known and then train a network on this dataset, then the newly trained network can be applied to real data, where the amount of overfitting is difficult to estimate. This could be in the form of a GAN, whereby the generator creates the synthetic data and the discriminator can then be used on the real data.

There is a small body of literature (Kan & Zhou, 2007; Kourtis, 2016) which suggests statistically motivated bounds and 'haircuts' for the Sharpe Ratio, but it considers solely the static case rather than the dynamic case, in which portfolio weights are updated through time. To address the dynamic case, tools from robust optimisation and dynamic programming may prove beneficial. Typically, robust optimisation seeks to find an optimal 'strategy' under uncertainty; however, it is also possible to consider what the performance of a given strategy would be under uncertainty. For example, if the distribution of the returns is replaced by a 'worse' distribution (controlled by limiting the relative entropy between the two distributions), what will the performance be under this new distribution?
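One way to make the relative entropy question concrete is the standard dual representation of a worst-case expectation: the infimum of E_Q[X] over all distributions Q with KL(Q || P) <= eta equals the supremum over lambda > 0 of -lambda * eta - lambda * log E_P[exp(-X / lambda)]. The sketch below evaluates this dual on a grid for the mean return only. It is an illustration under stated assumptions rather than the project's formulation: the function name `worst_case_mean`, the entropy budget `eta = 0.05`, the grid of multipliers and the simulated Gaussian returns are all hypothetical choices.

```python
import numpy as np

def worst_case_mean(x, eta, n_grid=400):
    """Worst-case mean of x over distributions Q with KL(Q || P) <= eta,
    where P is the empirical distribution of x.

    Evaluates the dual  sup_{lam > 0} -lam*eta - lam*log E_P[exp(-x/lam)]
    on a logarithmic grid of multipliers lam (an assumed grid, scaled by
    the sample standard deviation).
    """
    x = np.asarray(x, dtype=float)
    lam = np.logspace(-2, 2, n_grid) * x.std()
    z = -x[None, :] / lam[:, None]
    # Numerically stable log of the empirical moment generating function.
    z_max = z.max(axis=1, keepdims=True)
    log_mgf = z_max[:, 0] + np.log(np.exp(z - z_max).mean(axis=1))
    dual = -lam * eta - lam * log_mgf
    return float(dual.max())

rng = np.random.default_rng(1)
x = rng.normal(0.001, 0.01, size=5_000)  # simulated daily strategy returns
eta = 0.05                               # assumed relative entropy budget
wc = worst_case_mean(x, eta)

# For Gaussian returns the dual has the closed form mu - sigma * sqrt(2 * eta),
# which serves as a sanity check on the numerical value.
gaussian_benchmark = x.mean() - x.std() * np.sqrt(2 * eta)
```

The gap between the sample mean and `wc` acts as a 'haircut' whose size is controlled by `eta`: a larger entropy budget admits distributions further from the observed one and lowers the worst-case performance.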

Finally, Random Matrix Theory provides a number of interesting tools, in particular the modern concept of 'deterministic equivalents'. This has already seen applications in the estimation of out-of-sample performance for Random Neural Networks (RNN), where a deterministic equivalent for the MSE of a 1-layer RNN was found, thus enabling the estimation of the out-of-sample MSE ex-ante. The estimation of out-of-sample performance for investment strategies is a natural candidate for this method also.

This project primarily falls within the EPSRC 'Operational research' research area; however, it also has connections to the 'Statistics and applied probability' research area, among others. The project is supported by Qube Research & Technologies (QRT), a quantitative hedge fund, with supervision from Marco Dion, Head of Research at QRT, and benefits from discussion with other researchers at QRT.

Planned Impact

Probabilistic modelling permeates the financial services, healthcare, technology and other service industries crucial to the UK's continuing social and economic prosperity, which are major users of stochastic algorithms for data analysis, simulation, systems design and optimisation. There is a major and growing skills shortage of experts in this area, and the success of the UK in addressing this shortage in cross-disciplinary research and industry expertise in computing, analytics and finance will directly impact the international competitiveness of UK companies and the quality of services delivered by government institutions.
By training highly skilled experts equipped to build, analyse and deploy probabilistic models, the CDT in Mathematics of Random Systems will contribute to:
- sharpening the UK's research lead in this area and
- meeting the needs of industry across the technology, finance, government and healthcare sectors.

MATHEMATICS, THEORETICAL PHYSICS and MATHEMATICAL BIOLOGY

The explosion of novel research areas in stochastic analysis requires the training of young researchers capable of facing the new scientific challenges and maintaining the UK's lead in this area. The partners are at the forefront of many recent developments and ideally positioned to successfully train the next generation of UK scientists for tackling these exciting challenges.
The theory of regularity structures, pioneered by Hairer (Imperial), has generated a ground-breaking approach to singular stochastic partial differential equations (SPDEs) and opened the way to solve longstanding problems in physics of random interface growth and quantum field theory, spearheaded by Hairer's group at Imperial. The theory of rough paths, initiated by TJ Lyons (Oxford), is undergoing a renewal spurred by applications in Data Science and systems control, led by the Oxford group in conjunction with Cass (Imperial). Pathwise methods and infinite dimensional methods in stochastic analysis with applications to robust modelling in finance and control have been developed by both groups.
Applications of probabilistic modelling in population genetics, mathematical ecology and precision healthcare are active areas in which our groups have recognised expertise.

FINANCIAL SERVICES and GOVERNMENT

The large-scale computerisation of financial markets and retail finance and the advent of massive financial data sets are radically changing the landscape of financial services, requiring new profiles of experts with strong analytical and computing skills as well as familiarity with Big Data analysis and data-driven modelling, which are not matched by current MSc and PhD programmes. Financial regulators (Bank of England, FCA, ECB) are investing in analytics and modelling to face this challenge. We will develop a novel training and research agenda adapted to these needs by leveraging the considerable expertise of our teams in quantitative modelling in finance and our extensive experience in partnerships with financial institutions and regulators.

DATA SCIENCE

Probabilistic algorithms, such as stochastic gradient descent and Monte Carlo Tree Search, underlie the impressive achievements of Deep Learning methods. Stochastic control provides the theoretical framework for understanding and designing Reinforcement Learning algorithms. Deeper understanding of these algorithms can pave the way to designing improved algorithms with higher predictability and 'explainable' results, crucial for applications.
We will train experts who can blend a deeper understanding of algorithms with knowledge of the application at hand to go beyond pure data analysis and develop data-driven models and decision aid tools.
There is a high demand for such expertise in technology, healthcare and finance sectors and great enthusiasm from our industry partners. Knowledge transfer will be enhanced through internships, co-funded studentships and paths to entrepreneurs

Student:

Joseph Mulligan

Period of Study:

Oct 21 - Dec 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2602131

Research Topic:

Unclassified

Organisations

Imperial College London (Lead Research Organisation)

People	ORCID iD
Antoine Jacquier (Primary Supervisor)	http://orcid.org/0000-0003-3986-3201
Joseph Mulligan (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S023925/1			01/04/2019	30/09/2027
2602131	Studentship	EP/S023925/1	01/10/2021	30/12/2025	Joseph Mulligan