Natural Language Processing for Financial Market Modelling and Forecasting

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

1. Augmenting topic-sentiment models for financial forecasting

In the recent past, language analysis in finance has been approached from different research directions. One important dimension of language, sentiment, as analysed by Antweiler and Frank (2004), Hu and Liu (2004), Bollen (2011), Si et al. (2013) and Levenberg et al. (2014), focuses on the mood and emotion conveyed in text data sources such as online stock message boards, social media posts or financial news. The other major linguistic dimension, the conveyed story or narrative, can for example be approximated by estimating probabilistic topic models such as Latent Dirichlet Allocation (LDA), introduced by Blei et al. (2003). However, focusing on only one of these language dimensions can leave relevant linguistic information unused. Recently, more holistic modelling approaches have attempted to model the full dimensionality of language for financial forecasting by combining sentiment and topic modelling (Nguyen and Shirai, 2015). While the authors measure an increased forecast performance of topic-sentiment models on financial market related indicators, I believe natural language processing for financial forecasting can be further adjusted to better match the actual time-series characteristics of financial and economic data. For instance, Latent Dirichlet Allocation assumes both that topics do not change over time and that topics are uncorrelated. These assumptions might turn out to be too strong when analysing textual time-series data in finance. I would be interested in extending such sentiment-topic models with features that allow for topic correlation (Blei and Lafferty, 2006a) or topic evolution (Blei and Lafferty, 2006b). Another potential limitation of the model in Nguyen and Shirai (2015) is its assumption of an exogenously determined number of topics. Teh et al. (2005) developed a hierarchical Dirichlet process, which endogenises this parameter. It would be interesting to test whether such specifications, alone or in combination, yield better financial time-series forecasting performance.

2. Application of topic-sentiment analysis to forecast monetary policy decisions

In financial and economic theory, fluctuations of markets are often explained by the occurrence of exogenous shocks to the economy or financial system. One class of such shocks, namely monetary policy shocks, represents central bank decisions about changing the target interest rate that cannot be explained by contemporaneous and forecasted values of the macroeconomic variables relevant for monetary policy decision making. I follow the methodology brought forward by Romer and Romer (2004) to estimate such monetary policy shocks. That is, I first regress monetary policy decisions on contemporaneous and forecasted macroeconomic data on inflation, real GDP growth and unemployment. The residuals of this regression represent movements in monetary policy that cannot be explained by the quantitative economic data underlying conventional monetary policy. I then assess whether narrative effects carry explanatory power to predict these monetary policy shocks (the regression residuals). I utilise topic-sentiment models (as described earlier in my proposal) to identify whether changes in the narrative of a) central bank internal reports and b) newspaper articles on political, business, financial and economic events carry predictive power to explain these monetary policy shocks. Focusing on the timespan 2000-2011, I programme machine learning procedures in Python to estimate probabilistic topic models and topic sentiment scores on a dataset of over 500,000 articles from leading US newspapers as well as over 200 central bank reports. The central bank internal reports are prepared for each regularly held FOMC meeting of the US Federal Reserve board members. In each of these FOMC meetings, the members decide on the target interest rate.
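
The residual-based shock estimate described above can be illustrated with a minimal sketch. It is not the project's code: the data are synthetic, and the variable names, coefficients and noise levels are assumptions chosen only for illustration. It regresses a simulated change in the target rate on an intercept, inflation, real GDP growth and unemployment by ordinary least squares, and treats the residuals as the "shock" series. Only the Python standard library is used; in practice a numerical library would normally replace the hand-written solver.

```python
import random


def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(X, y, beta):
    """Part of y that the fitted linear model does not explain."""
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]


# Synthetic, purely illustrative data (all numbers are assumptions).
random.seed(0)
n = 200
X = []
for _ in range(n):
    inflation = random.gauss(2.0, 1.0)
    gdp_growth = random.gauss(2.5, 1.5)
    unemployment = random.gauss(5.5, 1.0)
    X.append([1.0, inflation, gdp_growth, unemployment])  # 1.0 = intercept

true_beta = [0.1, 0.5, 0.25, -0.3]  # hypothetical policy-rule coefficients
true_shocks = [random.gauss(0.0, 0.25) for _ in range(n)]
rate_change = [
    sum(b * x for b, x in zip(true_beta, row)) + e
    for row, e in zip(X, true_shocks)
]

beta_hat = ols_fit(X, rate_change)
shocks = residuals(X, rate_change, beta_hat)  # estimated policy shocks
```

By construction, the estimated shocks are orthogonal to every regressor, which is what "cannot be explained by quantitative economic data" means in this linear setting. The second stage described above would then ask whether text-derived features (topic proportions, sentiment scores) help predict `shocks`.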

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
ES/P000649/1                                   01/10/2017  30/09/2027
2094258            Studentship   ES/P000649/1  01/10/2018  15/04/2022  Maximilian Ahrens