Model Selection for High-Dimensional Temporal Disaggregation in Official Statistics

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics

Abstract

Traditional methods for producing economic statistics, for instance GDP, rely on data gathered through surveys of a population. Whilst such methods are accurate and well calibrated, they are very expensive to run and take a long time to feed back information. As such, National Statistics Institutes such as the UK's Office for National Statistics (ONS) are looking to integrate so-called administrative data, and alternative data-streams such as web-scraped data, into their estimation of economic statistics. Using such data can potentially increase both the frequency and the accuracy with which economic statistics are produced. However, it is often unclear how these alternative data-sources (of which there can be many) relate to the traditional survey results, and how we can produce high-frequency series which are consistent with the survey data.

Although we could measure many different aspects of the population, only a few of these might actually be relevant to producing a particular statistic of interest. From a methodological viewpoint, this requires that we choose between several competing statistical models, a problem known as model selection. Traditional model selection methods assume that the number of data-points is much larger than the number of data-streams. However, when linking administrative and alternative data-sources, that assumption no longer holds and one has to consider the so-called high-dimensional statistical setting. This project proposes to adapt recent advances in high-dimensional methodology to the analysis and production of benchmarked economic statistics. The project aims both to examine the empirical behaviour of these methods via simulation and to work with practitioners at the ONS to implement and test these methods through the development of an easy-to-use software package.
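To make the idea of model selection concrete, the following is a minimal sketch of a generic L1-penalised (lasso) regression fitted by coordinate descent, a standard technique for selecting a few relevant data-streams from many. It is an illustration only and is not taken from the project or its software: the function names, the toy data and the penalty value are all assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 if it is within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1 by cyclic coordinate descent.

    X is a list of rows (one row per observation), y a list of responses.
    Hypothetical helper written for illustration, not a library API.
    """
    n = len(y)
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                fitted_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_others)
                z += X[i][j] ** 2
            if z == 0.0:
                beta[j] = 0.0  # an all-zero column carries no information
            else:
                beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Toy data (assumed): the response depends strongly on the first
# indicator and only very weakly on the second.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)
```

In this toy example the penalty shrinks the first coefficient and sets the second exactly to zero, which is how a sparse method discards an indicator. In practice the penalty would be tuned, for example by cross-validation or an information criterion.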

Publications

Mosley L (2022) Sparse Temporal Disaggregation in Journal of the Royal Statistical Society Series A: Statistics in Society

 
Description The project investigated, theoretically and empirically, how temporally disaggregated measures of the economy (i.e. at a higher frequency than natively measured) could be estimated using indicator series that are thought to follow similar movements. We proposed two methods for achieving this task: the first automatically selects between a set of feasible indicators, and the second extracts key factors from the indicators and then uses these factors to describe the behaviour of the output of interest. An example application studied throughout the award was to produce a set of monthly estimates for "trade-in-services" statistics for the UK. This is of significant interest, as the surveys traditionally used to construct these statistics are only conducted on a quarterly basis, which limits the temporal granularity available to users and policy makers. We have successfully applied the two methods to this application, with the estimated series being qualitatively similar to those produced by alternative methods within the ONS. The methods have been made widely available via open-source software packages.
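
As a minimal illustration of what consistency between a monthly series and a quarterly survey means, the sketch below uses simple pro-rata distribution: each quarterly total is shared across its three months in proportion to a single monthly indicator. This is a deliberately basic textbook approach, not either of the two methods proposed in the project, and the function names and figures are assumptions made for the example.

```python
def prorata_disaggregate(quarterly_totals, monthly_indicator):
    """Split each quarterly total across its three months in proportion to the indicator.

    Hypothetical helper for illustration; assumes flow data (months sum to the quarter).
    """
    if len(monthly_indicator) != 3 * len(quarterly_totals):
        raise ValueError("need exactly three indicator values per quarter")
    monthly = []
    for q, total in enumerate(quarterly_totals):
        block = monthly_indicator[3 * q: 3 * q + 3]
        block_sum = sum(block)
        if block_sum == 0:
            raise ValueError("indicator sums to zero within a quarter")
        monthly.extend(total * value / block_sum for value in block)
    return monthly


def aggregate_to_quarters(monthly):
    """Sum consecutive blocks of three months back to quarterly totals."""
    return [sum(monthly[i: i + 3]) for i in range(0, len(monthly), 3)]


# Made-up figures: two quarters of survey totals and six months of an indicator.
quarterly = [300.0, 360.0]
indicator = [10.0, 20.0, 30.0, 20.0, 20.0, 20.0]

monthly = prorata_disaggregate(quarterly, indicator)
print(monthly)                         # monthly estimates
print(aggregate_to_quarters(monthly))  # reproduces the quarterly totals
```

The monthly estimates follow the within-quarter shape of the indicator while summing back to the survey totals. Regression-based disaggregation methods impose the same aggregation constraint but estimate the relationship between the indicators and the target rather than fixing it in advance.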

A summary of outcomes and impacts is given below:

- The methodology and results have been presented to and discussed with the ONS and at an OECD Time-series working group in Paris. Further evaluation of the methods is being performed by statisticians working at the Italian National Institute of Statistics (ISTAT).
- The paper "Sparse Temporal Disaggregation" illustrates the behaviour of one of our methods in experimental settings, and gives an example applied to GDP disaggregation. It also provides some guidance on tuning the methods to implement them in practice.
- A further paper (yet to be published) will illustrate the second approach to modelling using extracted factors, and demonstrate how it can be used to disaggregate Trade-in-Services data.
- Two software packages, DisaggregateTS and sparseDFM, have been created to allow users to implement these methods easily from the R programming language. Accompanying documentation has also been produced, with case studies on inflation nowcasting.
- We initiated theoretical investigations into the mathematical properties of estimators related to our disaggregation techniques, which may also be of more general interest to statisticians and econometricians. For instance, the results could be important in understanding spatial statistics methodologies, or in other areas of time-series analysis where users wish to model very many series simultaneously.
Exploitation Route The research generated via this grant can be applied in many settings, from understanding energy consumption to environmental processes, as well as the initial applications in economic statistics. The software packages developed as part of this grant are available as open-source code, and can be easily extended if required. There are several directions for immediate future work, some of which will be followed up in subsequent (funded) research projects. The disaggregated series produced by these methods may themselves be useful to a range of stakeholders in policy and government, especially those who regularly handle, and make decisions based on, low-frequency time-series data.
Sectors Communities and Social Services/Policy, Energy, Financial Services, and Management Consultancy

 
Title DisaggregateTS: High-Dimensional Temporal Disaggregation 
Description An R package that provides a toolkit for temporal disaggregation and benchmarking. It applies methods developed in Dagum and Cholette (2006, ISBN:978-0-387-35439-2) and novel high-dimensional techniques proposed by Mosley, Gibberd and Eckley (2021).
Type Of Material Data analysis technique 
Year Produced 2022 
Provided To Others? Yes  
Impact n/a 
URL https://cran.r-project.org/web/packages/DisaggregateTS/index.html