Novel approaches to comparing the predictive accuracy of nested models in data rich and heterogeneous predictor environments
Lead Research Organisation:
University of Southampton
Department Name: Sch of Economic, Social & Political Sci
Abstract
Comparing the out-of-sample predictive accuracy of competing statistical models is an essential component of data science and a key metric for choosing a suitable specification, whether the goal is to generate forecasts or to discriminate between competing hypotheses. Unlike the explanatory power of such models, which is commonly evaluated via in-sample goodness-of-fit measures and specification tests, predictive accuracy and predictive modelling are concerned with how well models cope with unseen data and produce accurate forecasts of some outcome of interest.
The purpose of this project is to develop a novel toolkit for comparing the relative accuracy of time series forecasts produced by two or more nested predictive regression models, with the end goal of detecting key drivers of predictability or of its absence. We consider an environment where one is confronted not only with a potentially large pool of predictors but also with predictors that display a mixture of dynamic characteristics, some (or all) being highly persistent and others noisier, as commonly occurs in economic and financial data. A macroeconomist interested in forecasts of GDP growth, for instance, faces hundreds of potentially useful predictors, ranging from noisy indicators with very little memory, such as financial returns, to more persistent series with much longer memory or trending behaviour, such as interest rates. Bundling such predictors together in a predictive accuracy contest, or ignoring the persistence properties of the data altogether, is likely to affect the reliability of inferences regardless of whether there are few or many such predictors. Despite the relevance and omnipresence of such scenarios in applied work, the predictive accuracy testing literature has devoted little attention to these considerations. The novel aspects of this research concern both the specific criteria introduced for implementing predictive accuracy comparisons, which will considerably simplify and generalise existing approaches, and the richer environment in which they can be applied.
Furthermore, in the course of empirical research or policy analysis, researchers often need to compare the forecasting ability of a simple model with that of a more complicated one, where the simple model is a special case of the more complicated model. Such model pairs are typically referred to as nested, while model pairs with no such relationship are referred to as non-nested. Nested models are one of the most commonly encountered settings in empirical research and help answer fundamental questions such as: does the inclusion of a set of additional predictors significantly improve the predictive power of a smaller model or of a non-predictability benchmark?
Irrespective of whether one operates in a big data environment with heterogeneous predictor types or in a more idealised environment with a few well-behaved, purely stationary predictors, conducting out-of-sample predictive accuracy comparisons between nested models raises many technical challenges that have not been resolved satisfactorily despite a voluminous literature on the subject (e.g. the fact that two nested models collapse into the same specification under the hypothesis of equal predictive accuracy typically results in ill-defined test statistics with degenerate variances).
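To make the nested-comparison problem concrete, the following minimal sketch simulates a no-predictability setting and compares an expanding-window mean benchmark against a larger model that adds one (irrelevant) lagged predictor, using the well-known Clark-West (2007) adjusted loss differential. This is purely illustrative of the existing approaches the project aims to improve upon, not the statistic proposed in the grant's papers; the data-generating process, window sizes, and variable names are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the target y has no genuine predictability,
# and the candidate predictor x is pure noise.
T = 400
x = rng.standard_normal(T)
y = rng.standard_normal(T)

# Expanding-window one-step-ahead forecasts.
R = 200                    # initial estimation window
e1, e2, adj = [], [], []   # small-model errors, large-model errors, CW adjustment
for t in range(R, T - 1):
    ys, xs = y[: t + 1], x[: t + 1]
    # Small (nested) model: prevailing mean of y.
    f1 = ys.mean()
    # Large model: OLS of y on a constant and lagged x.
    X = np.column_stack([np.ones(t), xs[:-1]])
    beta, *_ = np.linalg.lstsq(X, ys[1:], rcond=None)
    f2 = beta[0] + beta[1] * xs[-1]
    e1.append(y[t + 1] - f1)
    e2.append(y[t + 1] - f2)
    adj.append((f1 - f2) ** 2)

e1, e2, adj = map(np.asarray, (e1, e2, adj))
# Clark-West adjusted loss differential: the (f1 - f2)^2 term corrects
# the downward bias of the naive MSE difference that arises because,
# under the null, the larger model estimates redundant parameters.
d = e1**2 - e2**2 + adj
cw_stat = np.sqrt(len(d)) * d.mean() / d.std(ddof=1)
print(cw_stat)
```

The need for the `adj` correction term is a symptom of the degeneracy described above: under equal predictive accuracy the two models coincide, so the naive loss differential has a non-standard, degenerate limit.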
The overarching objective of this proposal is to introduce an entirely new technical framework that can accommodate predictive accuracy comparisons between models irrespective of whether they are nested. This framework will then be used to develop a toolkit for conducting predictive accuracy tests and predictor screening in data rich environments.
Publications
Gonzalo J (2021) Spurious relationships in high-dimensional systems with strong or mild persistence, in International Journal of Forecasting
Gonzalo J (2023) Out-of-sample predictability in predictive regressions with many predictor candidates, in International Journal of Forecasting
Gonzalo J (2023) Out-of-sample predictability in predictive regressions with many predictor candidates, in arXiv e-prints
Pitarakis J (2023) A novel approach to predictive accuracy testing in nested environments, in Econometric Theory
Title | Novel techniques for comparing the predictive accuracy of competing econometric models |
Description | A key component of the grant's research agenda is the development of new techniques for comparing the predictive accuracy of competing econometric models. A long-standing and unresolved problem in this literature concerns our ability to conduct formal tests of predictive accuracy when the models under consideration are nested. Suppose, for instance, that a particular model aims to explain an outcome of interest with a certain number of predictors. We wish to assess whether a larger model, containing the same predictors augmented by an additional set, improves or deteriorates the model's ability to generate out-of-sample predictions (i.e., using new data). How can we formally test such hypotheses? In the working paper titled "A novel approach to predictive accuracy testing in nested environments" I introduce a novel test statistic designed to implement such inferences. The test does not suffer from the drawbacks and limitations of existing methods. Its distribution is derived formally, and implementation guidelines are provided for practitioners. In the second working paper, titled "Out of sample predictability in predictive regressions with many predictor candidates", I extend the proposed techniques to a high-dimensional environment that can accommodate many predictors. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Matlab programmes implementing the proposed techniques have been made publicly available. |
URL | https://sites.google.com/view/jpitarakis/working-papers |
Title | Method of comparing the predictive accuracy of competing econometric models |
Description | A key component of the grant's research agenda is the development of new techniques for comparing the predictive accuracy of competing econometric models. A long-standing and unresolved problem in this literature concerns our ability to conduct formal tests of predictive accuracy when the models under consideration are nested. Suppose, for instance, that a particular model aims to explain an outcome of interest with a certain number of predictors. We wish to assess whether a larger model, containing the same predictors augmented by an additional set, improves or deteriorates the model's ability to generate out-of-sample predictions (i.e., using new data). How can we formally test such hypotheses? In the working paper titled "A novel approach to predictive accuracy testing in nested environments" I introduce a novel test statistic designed to implement such inferences. The test does not suffer from the drawbacks and limitations of existing methods. Its distribution is derived formally, and implementation guidelines are provided for practitioners. In the second working paper, titled "Out of sample predictability in predictive regressions with many predictor candidates", I extend the proposed techniques to a high-dimensional environment that can accommodate many predictors. |
Type Of Material | Data analysis technique |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Matlab code (software) for implementing the proposed techniques |
URL | https://sites.google.com/view/jpitarakis/working-papers |
Description | International collaboration with researchers from Universidad Carlos III de Madrid, Spain |
Organisation | Charles III University of Madrid |
Country | Spain |
Sector | Academic/University |
PI Contribution | Co-authorship of a research paper |
Collaborator Contribution | Co-authorship of a research paper |
Impact | "Out of sample predictability in Predictive Regressions with many predictor candidates" (with Jesus Gonzalo). DOI: https://doi.org/10.48550/arXiv.2302.02866 |
Start Year | 2022 |
Description | Dedicated grant website aggregating all forms of outputs and disseminations |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Launch of website associated with grant activities and outputs (in particular: software, working-papers/publications, listing of events where research has been presented or will be presented) |
Year(s) Of Engagement Activity | 2022 |
URL | https://sites.google.com/view/jpitarakis-esrc/home |