Methodological Advancements on the use of Administrative Data in Official Statistics

Lead Research Organisation: University of Manchester
Department Name: Social Sciences

Abstract

National Statistical Institutes (NSIs) are directing resources into advancing the use of administrative data in official statistics systems. This is a top priority for the UK Office for National Statistics (ONS), which is transforming its statistical systems to make greater use of administrative data in future censuses and population statistics. Administrative data are defined as secondary data sources since they are produced by other agencies as a result of an event or a transaction relating to the administrative procedures of organisations, public administrations and government agencies. Nevertheless, they have the potential to become important data sources for the production of official statistics by significantly reducing the cost and burden of response and improving the efficiency of such systems. Embedding administrative data in statistical systems is not without costs, and it is vital to understand where potential errors may arise. The Total Administrative Data Error Framework sets out all possible sources of error when using administrative data as statistical data, depending on whether they come from a single data source or are integrated with other data sources such as survey data. For a single administrative data source, one of the main sources of error is coverage, that is, how well the source represents the target population of interest. This is particularly relevant when administrative data are delivered over time, such as tax data for maintaining the Business Register. In sub-project 1 of this research project, we develop quality indicators that allow the statistical agency to assess whether the administrative data are representative of the target population and which sub-groups may be missing or over-covered. This is essential for producing unbiased estimates from administrative data.
Another priority at statistical agencies is to produce a statistical register for population characteristic estimates, such as employment statistics, from multiple sources of administrative and survey data. Once a spine has been built from administrative data, survey data can be integrated with it using record linkage and statistical matching approaches on a set of common matching variables. This will be the topic of sub-project 2, which will be split into several topics of research. The first topic is whether adding statistical predictions and correlation structures improves the linkage and data integration. The second topic is to research a mass imputation framework for imputing missing target variables in the statistical register, where the missing data may be due to multiple underlying mechanisms. The third topic will aim to improve the mass imputation framework to mitigate possible measurement errors, for example by adding benchmarks and other constraints into the approaches. On completion of a statistical register, estimates of key target variables for local areas can easily be obtained by aggregation. However, it is essential to also measure the precision of these estimates through mean square errors, and this will be the fourth topic of the sub-project. Finally, this new way of producing official statistics will be compared to the more common method of incorporating administrative data through survey weights and model-based estimation approaches. In other words, we will evaluate whether it is better 'to weight' or 'to impute' for population characteristic estimates - a key question under investigation by survey statisticians in the last decade.

Publications

Description: We have developed software and a user manual for producing quality indicators that assess the representativeness of administrative data. We plan to produce an academic journal paper from this work in the near future. We are also working on the second stage of the research grant, on integrating administrative data with survey data to produce multi-source estimates.
Exploitation Route: The outcomes will be used by the Office for National Statistics and other National Statistical Institutes to assess the quality of administrative data using probability-based survey data.
Sectors: Government, Democracy and Justice

URL: https://github.com/sook-tusk/qualadmin
 
Description: The user manual and R package are currently being used by the Office for National Statistics to assess the quality of administrative data.
First Year Of Impact: 2023
Sector: Government, Democracy and Justice
Impact Types: Policy & public services

 
Title: Quality Indicators for Administrative Data User Manual For R Package on GitHub
Description: R Package and User Manual for Quality Indicators for Administrative Data on GitHub. Authors: Myong Sook Kim, Natalie Shlomo
Type Of Technology: Software
Year Produced: 2023
Open Source License? Yes
Impact: The Office for National Statistics is using the code to assess their administrative data, and we are working closely with them as partners on the grant.
URL: https://github.com/sook-tusk/qualadmin