Inference on Spatial-Temporal Econometric Models

Lead Research Organisation: London School of Economics and Political Science
Department Name: Economics

Abstract

Much economic data are recorded over both space and time. Frequently they exhibit correlation, as well as trending behaviour. A suitable econometric model enables analysis and estimation of such phenomena. In particular, correcting for spatial and temporal correlation can lead to improved estimation of parameters of economic interest and tests of economic hypotheses, with implications for economic policy. We make use of possible knowledge of spatial locations or of economic distances between locations, or estimates of such distances, or else of a large number of temporal observations, to estimate spatial correlation, as well as employing knowledge of regularity or irregularity of temporal observations to estimate time series correlation. A broad and flexible model class will be considered, allowing for a continuum of strong to weak correlation, as well as general forms of trending behaviour, and model components whose functional form is unknown, with a view to reducing risk of misspecification. Methods of estimation and testing that are relatively reliable, and precise, will be proposed, and theory developed that rigorously justifies decision-making rules. This theory will be supplemented by simulation studies and applications to empirical data, and software embodying the econometric procedures will be written and made freely available to practitioners.

Planned Impact

The results will impact upon the study of economic and financial issues that are informed by empirical data observed over space (meaning some relevant economic space, not necessarily geographic) and time. We cover a wide range of circumstances: the number of spatial observations, the number of temporal observations, or both may be large; various spatial configurations are possible; and while most of the stress is on regular time intervals, attention will also be given to irregularity, which can vary across the spatial units. The modelling will be quite general and flexible, and the stress will be on econometric methods that make efficient use of the data, and thereby aim to produce relatively informative findings, while also being relatively easy for practitioners to use. As a result of all these features, the potential impacts are very large. The research will be made available in research papers submitted for publication in leading international journals. Prior to publication, these papers will be posted on the Principal Investigator's website and released in the CEMMAP Working Paper Series, the STICERD Econometrics Discussion Paper Series or the Info-Metrics Institute Working Paper Series. The research will also be presented at seminars and at conferences (ESRC support is requested for attendance at only some of these). In addition, taking advantage of the relatively unified nature of the modelling adopted in the research, we will write software implementing the methods and make it freely available to practitioners, in order to increase impact by facilitating use of the research.

Publications

Robinson P (2012) Inference on power law spatial trends in Bernoulli
Robinson P (2012) Nonparametric trending regression with cross-sectional dependence in Journal of Econometrics
Robinson P (2012) Statistical inference on regression with spatial dependence in Journal of Econometrics
Robinson P (2014) Refined tests for spatial correlation in Econometric Theory
Gao J (2014) Inference on nonstationary time series with moving mean in Econometric Theory
Robinson P (2014) The estimation of misspecified long memory models in Journal of Econometrics
Lee J (2015) Panel nonparametric regression with fixed effects in Journal of Econometrics
Delgado M (2015) Non-nested testing of spatial correlation in Journal of Econometrics

 
Description

The research has developed methods and theory for drawing conclusions from spatial and spatial-temporal statistical data, principally economic data, observed over space and/or time. Here, 'space' can refer to geographical or economic space. In the former case, exact or approximate locations, generally irregularly spaced, may be known. Even when they are known, however, geographic distances between locations may be less relevant than 'economic distances', whose construction may depend to a greater or lesser extent on economic theory. 'Time' usually, but not always, refers to regularly spaced temporal observations. An important difference between time and space is that only time has a natural (one-dimensional) ordering. Spatial and temporal locations or distances convey important information; for example, one might often expect observations at nearby locations to be more strongly correlated than distant ones. Statistical models that embody information on locations and distances will generally lead to more reliable conclusions than ones that do not. However, some of the research is relevant to data for which no spatial information is available or relevant. This is the case with some cross-sectional data and some panel data (a panel being a sequence of cross-sectional data sets observed at a series of known time points). The class of panel data models overlaps with the class of spatial-temporal models.
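The spatial models discussed below typically encode locations or distances in a spatial weight matrix W. The following is a minimal sketch of one common construction, row-normalised inverse-distance weights, and is only one choice among many. It assumes Euclidean coordinates and distinct locations; economic distances could be substituted for the Euclidean ones. The helper name `inverse_distance_weights` is hypothetical.

```python
import math

def inverse_distance_weights(coords, power=1.0):
    """Row-normalised inverse-distance spatial weight matrix (hypothetical helper).

    coords: list of (x, y) locations, assumed distinct. Distances are Euclidean
    here, but any 'economic distance' could be substituted. Diagonal entries
    are zero, so a unit is never its own neighbour.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                w[i][j] = 1.0 / d ** power  # nearer units get larger weights
        row_sum = sum(w[i])
        w[i] = [v / row_sum for v in w[i]]  # each row now sums to one
    return w

locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
W = inverse_distance_weights(locations)
for row in W:
    print([round(v, 3) for v in row])
```

Row normalisation makes each element of Wy a weighted average of the other units' values, which is why it is a popular (though not obligatory) convention in SAR modelling.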

An important class of statistical models in spatial econometrics, and also in geography, is that of 'spatial autoregressive' (SAR) models. These employ measures of distance, and one or more unknown parameters, to describe correlation between observations, sometimes also expressing the influence of explanatory variables. Various methods of estimating the parameters in such models have been developed, and their statistical properties have been studied. Of particular interest is their approximate distributional behaviour, which is useful in testing hypotheses and in providing measures of variability. As is usually the case, exact statistical properties are intractable, even under very strong assumptions, and instead theory that becomes more accurate as sample size increases has been developed. However, this theory can be inaccurate when applied to data sets consisting of a small or moderate number of observations. To improve on methodology in such circumstances, the project has developed closer approximations to distributional behaviour, which turn out to entail more or less simple adjustments to existing methods. One direction has involved the use of least squares estimates of unknown parameters. Typically, in SAR models without explanatory variables these work well only when there is no spatial correlation. The research has nevertheless investigated their use in testing the hypothesis of no spatial correlation, an important initial issue in much data analysis. Surprisingly, they also perform better than expected in the presence of modest spatial correlation. Another class of tests of no spatial correlation, investigated in a somewhat more general setting, does not require parameter estimation, which can be desirable from a computational point of view. Instead, the tests are based on the behaviour of the likelihood (joint distribution) of the data. Again, refinements to existing methods, affording greater accuracy in smallish data sets, have been developed in the project.

On the other hand, the project has also developed improvements in tests based on maximum likelihood estimates. These estimates are only implicitly defined and entail greater computational effort than the other two approaches, but may be expected to perform better statistically. Here, an extension to a panel data form of spatial-temporal data was developed.
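For the pure SAR model y = λWy + ε (no explanatory variables), the following sketch computes two of the ingredients discussed above:

- the least squares estimate of λ, from regressing y on Wy;
- the standard, unrefined Lagrange multiplier (LM) statistic for H0: λ = 0, which needs only σ̂² = y'y/n rather than an estimate of λ.

It assumes mean-zero Gaussian disturbances and a weight matrix with zero diagonal. All function names and the circular weight matrix are hypothetical illustrations. The project's refined small-sample tests adjust statistics of this kind and are not reproduced here.

```python
def matvec(w, v):
    """Matrix-vector product W v for a list-of-lists matrix."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sar_ls_estimate(w, y):
    """Least squares estimate of lambda: regress y on Wy with no intercept."""
    wy = matvec(w, y)
    return dot(wy, y) / dot(wy, wy)

def sar_lm_statistic(w, y):
    """Unrefined LM statistic for H0: lambda = 0 in y = lambda * W y + e.

    Assumes W has a zero diagonal and e ~ N(0, sigma^2 I). Only sigma^2 is
    estimated (by y'y / n); lambda itself is never estimated. The statistic is
    approximately chi-squared with 1 degree of freedom under H0.
    """
    n = len(y)
    sigma2 = dot(y, y) / n
    score = dot(y, matvec(w, y)) / sigma2
    # tr(W'W + W^2) = sum over i, j of w_ij^2 + w_ij * w_ji
    trace_term = sum(w[i][j] ** 2 + w[i][j] * w[j][i]
                     for i in range(n) for j in range(n))
    return score ** 2 / trace_term

def circle_weights(n):
    """Hypothetical example W (n >= 3): two neighbours on a circle, weight 1/2 each."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][(i - 1) % n] = 0.5
        w[i][(i + 1) % n] = 0.5
    return w

W = circle_weights(6)
y = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbours always have opposite signs
print(sar_ls_estimate(W, y), sar_lm_statistic(W, y))  # -1.0 6.0
```

Here the LM statistic of 6.0 exceeds 3.84, the 5% critical value of the chi-squared(1) distribution, so the large-sample test would reject no spatial correlation. With only six observations, however, this is exactly the kind of case in which small-sample refinements matter.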

With smallish sample sizes it is generally only feasible to employ models with a small number of unknown parameters. On the other hand, since there is often some uncertainty in model specification, due to limited relevant economic theory, there is a tendency to employ richer models, involving many parameters, for larger data sets. Statistical theory nevertheless normally treats the number of parameters as fixed as sample size increases. It may better reflect reality to regard the number of parameters as increasing with sample size, albeit at a slower rate. This approach was taken in the project in the context of 'higher-order' SAR models, involving several dependence parameters and also explanatory variables, whose number may also be regarded as increasing slowly with sample size. Theory for least squares and instrumental variables estimates was developed. In ongoing work, theory is being developed for maximum likelihood estimates, also allowing for possible nonlinearity in explanatory variables.

Important properties of statistical methods are robustness and efficiency (both in a large-sample sense). A method lacks robustness if its validity requires very precise assumptions. Efficiency, on the other hand, entails precise estimation, and is relative to a particular distributional form. 'Adaptive' estimation lets the data speak for themselves by affording efficient estimation across a wide class of distributional behaviour of the data. Maximum likelihood estimates under an assumed distribution will be efficient if the assumption is correct but inefficient if it is misspecified. By contrast, the adaptive estimates developed in the project for SAR models without explanatory variables will be efficient much more generally.

Adaptive estimation was also pursued in a panel data or multivariate time series setting, employing ideas from 'independent components analysis' (ICA). This entails techniques for analysing multivariate data, representing them as a linear transformation of random variables that are independent but have distributions that are unknown and not necessarily the same. Previous ICA work on time series has represented the observed series in terms of independent components. By contrast the project research represented the unobserved multivariate disturbances or innovations in spatial-temporal models in terms of independent components. This enables adaptive estimation without the serious 'curse of dimensionality' problem that would be posed by an unrestricted distribution for the multivariate disturbances, and also has computational advantages.

In the models described above, spatial and/or temporal dependence is 'weak' in a mathematical sense. This mirrors much existing research in spatial econometrics and statistics, and also in time series analysis. A good deal of methodology and theory has, however, been developed for time series exhibiting 'strong' dependence, or 'long memory', but much less for spatial data, which pose extra challenges. The project has studied strong dependence, and also 'negative' dependence, in the setting of spatial and spatial-temporal data recorded at regular intervals, while realistically allowing the sampling region to be of irregular shape. For the sample mean, distribution theory, with the number of observations increasing, was established in the three different cases of weak, strong and negative dependence. These results are capable of extension to more general model settings.
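The contrast between weak, strong and negative dependence can be seen in the simplest one-dimensional analogue: a time series with fractional, ARFIMA(0, d, 0), autocorrelations. In this family:

- d = 0 gives weak (here, no) dependence;
- 0 < d < 1/2 gives strong dependence;
- −1/2 < d < 0 gives negative dependence.

The sketch below is an illustration only, not the project's spatial lattice setting, and its function names are hypothetical. It computes the exact variance of the sample mean from the autocorrelations. Scaled by n, this variance stays constant when d = 0, grows roughly like n^(2d) when d > 0, and shrinks when d < 0. The usual √n normalisation is therefore appropriate only in the first case.

```python
def fractional_autocorrelations(d, max_lag):
    """Autocorrelations rho(0..max_lag) of ARFIMA(0, d, 0) noise, |d| < 1/2.

    Uses the recursion rho(k) = rho(k-1) * (k - 1 + d) / (k - d).
    """
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho

def var_sample_mean(rho, n):
    """Variance of the mean of n unit-variance observations (needs n - 1 <= max_lag).

    Var = (1/n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)].
    """
    s = sum((1 - k / n) * rho[k] for k in range(1, n))
    return (1 + 2 * s) / n

for d in (-0.3, 0.0, 0.3):
    rho = fractional_autocorrelations(d, 10_000)
    scaled = [n * var_sample_mean(rho, n) for n in (100, 1_000, 10_000)]
    print(d, [round(v, 3) for v in scaled])
```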

Strong dependence was also a theme of the project's research on panel data. Nonstationarity over time is an important theme of the economic panel data literature. It has predominantly been studied in an autoregressive setting. There, the model implies either stability combined with weak dependence, or unit root nonstationarity as a form of strong dependence. The latter leads to nonstandard large-sample behaviour, in which classical issues of efficiency cannot be discussed. As a different approach, the research employed fractional models, which can cover both strongly dependent stable behaviour and nonstationary behaviour (including a unit root). These models desirably lead to standard large-sample distributional behaviour and efficiency of estimation, as well as efficient tests. Different approaches were used to deal with the incidental parameters problem arising from individual effects, with bias correction of two of these leading to improvements. Because it was desirable to focus on the particular challenge of dealing with individual effects in a fractional setting, the model treated was otherwise relatively simple, but it is capable of extension in several useful directions.
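Fractional models of this kind are built on the fractional difference operator (1 − L)^d, where L is the lag operator. Setting d = 0 leaves a series unchanged, d = 1 is ordinary first differencing (the unit-root case), and non-integer d interpolates between and beyond these. The following is a minimal sketch of the truncated version, in which observations before the start of the sample are treated as zero; the function names are hypothetical.

```python
def frac_diff_weights(d, n):
    """First n coefficients pi_k of the expansion (1 - L)^d = sum_k pi_k L^k.

    Recursion: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k.
    """
    pi = [1.0]
    for k in range(1, n):
        pi.append(pi[-1] * (k - 1 - d) / k)
    return pi

def frac_diff(x, d):
    """Apply (1 - L)^d to x, treating values before the sample as zero."""
    pi = frac_diff_weights(d, len(x))
    return [sum(pi[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]

x = [2.0, 3.0, 5.0, 4.0]
print(frac_diff(x, 1.0))  # [2.0, 1.0, 2.0, -1.0]: first differences after the truncated first value
print(frac_diff(x, 0.0))  # [2.0, 3.0, 5.0, 4.0]: unchanged
print([round(v, 4) for v in frac_diff_weights(0.4, 4)])
```

Applying `frac_diff` to a constant series, such as an individual effect, removes it (after the first observation) only when d = 1. For non-integer d a remnant remains, which is one simple way to see why individual effects need special treatment in fractional panel models.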

Many possible models for spatial and spatial-temporal data have been developed. The SAR acronym can apply to a wide class of models involving spatial distances, and the practitioner is faced at the outset with the problem of choosing one of these for her data. Likewise, the spatial statistics literature has proposed a number of models that can be used when spatial locations are known, some of which extend time series models, and again a choice must be made. Indeed, because knowledge of locations implies knowledge of distances, SAR models are also available here. The project has developed 'non-nested' tests to choose between rival spatial or spatial-temporal models. These tests cover the many models just referred to, and also panel and multivariate data settings. To achieve this, they were theoretically justified under very general conditions, which need to be checked in specific cases. It would appear that the tests are relevant to strongly dependent, as well as weakly dependent, data.

Some of the research has covered models for cross-sectional or panel data where, even if observations are recorded over space, there is no usable knowledge of spatial locations or distances. If cross-sectional dependence is nevertheless suspected, it must typically be accounted for. Factor models, representing the dependence by means of a relatively small number of unobserved components, have proved a popular approach, but require assumptions that can be unwarranted. Instead, the project has pursued a more robust approach, with nonparametric modelling of cross-sectional dependence, in two settings.

One of these centres on nonparametric and semiparametric regression models. Parametric models, such as linear regression, can be liable to misspecification, leading to invalid conclusions. A nonparametric approach avoids specifying a functional form, and thus provides greater flexibility. Nonparametric modelling requires a large data set in order to yield reasonably precise conclusions. Very large cross-sectional data sets (for example on households and firms) and long financial time series are nowadays available. For data sets of more moderate size, however, semiparametric models, combining both a parametric and a nonparametric component, can be more realistic, and have also proved very popular in econometrics. In particular, the parameters in a semiparametric model can often be estimated with accuracy comparable to that achievable in a correctly specified parametric model. The project has developed methods and theory for nonparametric and semiparametric regression estimates in the presence of quite general, nonparametric, cross-sectional dependence and heterogeneity in both explanatory variables and disturbances. In practice, the unknown cross-sectional dependence makes it difficult to apply the methods in hypothesis testing and to obtain estimates of the variability of parameter estimates. For a particular setting, this obstacle was surmounted.
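As a point of reference for the nonparametric regression estimates discussed above, the simplest kernel estimator (Nadaraya-Watson) estimates the regression function at a point x0 as a locally weighted average of the responses. The sketch below assumes a Gaussian kernel and a user-chosen bandwidth h, and the function name is hypothetical. The estimators studied in the project may differ. In this simple form, cross-sectional dependence does not change the formula itself but does change its sampling variability, which is what makes standard errors hard to obtain.

```python
import math

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys, with Gaussian weights centred at x0."""
    weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]      # a linear regression function, for checking
print(round(nadaraya_watson(2.0, xs, ys, h=1.0), 10))  # 4.0, by symmetry of the design
```

At the edges of the design (for example x0 = 0) the same data give a biased estimate, a known weakness of this local-constant form.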

The other problem involving cross-sectional dependence concerned nonparametric regression for panel data with individual effects. The main goal here was to exploit possible cross-sectional dependence and heterogeneity in disturbances to achieve nonparametric regression estimates that are more precise than ones that ignore these features. In practice this requires estimating the cross-sectional disturbance covariance matrix, and doing so in a fully nonparametric way requires the time series to be long relative to the number of cross-sectional observations. Some other results with useful implications for practical implementation were also developed.
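The requirement that the time series be long relative to the number of cross-sectional units has a simple algebraic side. Consider an unrestricted sample covariance matrix of N units built from T time periods, without re-centring. It has rank at most T, so it cannot be inverted, as a GLS-type weighting would require, when T < N. The following is a sketch with hypothetical helper names, using exact rational elimination on small matrices.

```python
from fractions import Fraction

def sample_covariance(residuals):
    """S = (1/T) * sum_t e_t e_t' for T residual vectors e_t of length N (no re-centring)."""
    T, N = len(residuals), len(residuals[0])
    return [[sum(Fraction(e[i]) * Fraction(e[j]) for e in residuals) / T
             for j in range(N)] for i in range(N)]

def rank(matrix):
    """Matrix rank by Gaussian elimination in exact rational arithmetic."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# N = 4 cross-sectional units observed over only T = 2 periods
e = [[1, 0, 2, 1], [0, 1, 1, 3]]
S = sample_covariance(e)
print(rank(S))  # 2, so the 4 x 4 matrix S is singular
```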

Some of the research focussed on purely time series data, with a view to future spatial-temporal extensions. Three topics were pursued here, each with a concern for strong dependence. In semiparametric long memory models only long-run behaviour is modelled, but even this is subject to possible misspecification; the project investigated the implications of this for an important class of estimates. In models for multivariate nonstationary data, an important econometric topic has been cointegration. This entails the existence of a linear combination of the series that is stable, or at least less nonstationary than the original data. The project has developed new methods for investigating the extent and nature of cointegration in a quite general setting. Finally, models for nonstationary data involving both a nonparametric (rather than, say, linear) time trend and strong dependence were investigated. Estimates of both the time trend and the parameters characterising the dependence were proposed and theoretically justified.

Altogether, the research has pursued a number of more or less related themes concerning the drawing of conclusions from a variety of kinds of data. It has developed useful new methodology, justified by rigorous theory, that surmounts various challenges.

Exploitation Route: Practitioners can apply, modify and extend the methodology in various directions.

Other researchers can modify and extend the theoretical results in various directions.

Sectors: Agriculture, Food and Drink; Creative Economy; Environment; Other

URL: http://personal.lse.ac.uk/robinso1/
 
Description: Dr. Muhammad Shafiullah, Senior Economist, Policy Research Institute of Bangladesh, has requested the software code that was used in one of the papers produced under the project.
First Year of Impact: 2016
Sector: Government, Democracy and Justice
Impact Types: Economic

 
Title: Spatial-temporal econometrics
Description: Econometric techniques for analysing economic data
Type of Technology: New/Improved Technique/Technology
Year Produced: 2015
Impact: Some of it has been requested.