Methodological Advancements on the use of Administrative Data in Official Statistics

Lead Research Organisation: University of Manchester
Department Name: Social Sciences

Abstract

National Statistical Institutes (NSIs) are directing resources into advancing the use of administrative data in official statistics systems. This is a top priority for the UK Office for National Statistics (ONS), which is transforming its statistical systems to make more use of administrative data for future censuses and population statistics. Administrative data are defined as secondary data sources since they are produced by other agencies as a result of an event or a transaction relating to administrative procedures of organisations, public administrations and government agencies. Nevertheless, they have the potential to become important data sources for the production of official statistics by significantly reducing the cost and burden of response and improving the efficiency of such systems. Embedding administrative data in statistical systems is not without costs and it is vital to understand where potential errors may arise. The Total Administrative Data Error Framework sets out all possible sources of error when using administrative data as statistical data, depending on whether a single data source is used or it is integrated with other data sources such as survey data. For a single administrative data source, one of the main sources of error is coverage and representation of the target population of interest. This is particularly relevant when administrative data are delivered over time, such as tax data for maintaining the Business Register. For sub-project 1 of this research project, we develop quality indicators that allow the statistical agency to assess whether the administrative data are representative of the target population and which sub-groups may be missing or over-covered. This is essential for producing unbiased estimates from administrative data.
Another priority at statistical agencies is to produce a statistical register for population characteristic estimates, such as employment statistics, from multiple sources of administrative and survey data. Using administrative data to build a spine, survey data can be integrated using record linkage and statistical matching approaches on a set of common matching variables. This will be the topic for sub-project 2, which will be split into several topics of research. The first topic is whether adding statistical predictions and correlation structures improves the linkage and data integration. The second topic is to research a mass imputation framework for imputing missing target variables in the statistical register where the missing data may be due to multiple underlying mechanisms. The third topic will aim to improve the mass imputation framework to mitigate possible measurement errors, for example by adding benchmarks and other constraints into the approaches. On completion of a statistical register, estimates for key target variables at local areas can easily be aggregated. However, it is essential to also measure the precision of these estimates through mean square errors and this will be the fourth topic of the sub-project. Finally, this new way of producing official statistics will be compared to the more common method of incorporating administrative data through survey weights and model-based estimation approaches. In other words, we evaluate whether it is better 'to weight' or 'to impute' for population characteristic estimates - a key question under investigation by survey statisticians in the last decade.
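The 'to weight' or 'to impute' question can be made concrete with a toy comparison. The sketch below is an illustration only and is not the project's method: it assumes a hypothetical setting with known population counts per cell, post-stratification weights N_h / n_h, and the simplest possible mass imputation (every unobserved unit receives its cell's observed mean). The function names and data layout are invented for the example.

```python
def poststratified_total(sample, pop_counts):
    """Weighted estimate of a population total.

    sample: dict mapping cell label -> list of observed y values
    pop_counts: dict mapping cell label -> known population count N_h
    """
    total = 0.0
    for cell, ys in sample.items():
        weight = pop_counts[cell] / len(ys)  # post-stratification weight N_h / n_h
        total += weight * sum(ys)
    return total


def mass_imputed_total(sample, pop_counts):
    """Estimate the same total by imputing the cell mean for unobserved units."""
    total = 0.0
    for cell, ys in sample.items():
        cell_mean = sum(ys) / len(ys)
        n_unobserved = pop_counts[cell] - len(ys)
        # Observed values are kept; each unobserved unit gets the cell mean.
        total += sum(ys) + n_unobserved * cell_mean
    return total


# Hypothetical employment indicator (1 = employed) in two cells.
sample = {"a": [1, 0, 1, 1], "b": [0, 0, 1]}
pop_counts = {"a": 40, "b": 30}

weighted = poststratified_total(sample, pop_counts)
imputed = mass_imputed_total(sample, pop_counts)
```

Under these assumptions the two estimates coincide, because both reduce to the sum of N_h times the cell mean. The approaches diverge once the imputation model uses more covariates than the weighting cells, or when constraints and measurement-error adjustments are added, which is where the comparison becomes substantive.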

Publications

 
Description We have developed software and a user manual to produce quality indicators to assess the representativeness of administrative data. The original R-code was updated in October 2023 with improvements to the automation of graphics and the summary statistics spreadsheet. We provided a further update in December 2023, in which we developed additional R-code to calculate quality indicators from a frequency table rather than from the microdata. This makes it possible to compute quality indicators for very large administrative datasets held at the ONS, which can have over 60 million records.
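To show why a frequency table is sufficient, the following is a minimal sketch of the classical R-indicator, R = 1 - 2S, where S is the standard deviation of the propensities to be included in the data source. It is written in Python for illustration and does not reproduce the qualadmin R package. The assumptions are: each table cell holds a population count and an administrative-data count, the cell propensity is estimated as their ratio, and the standard deviation is population-weighted. The function name and input layout are hypothetical.

```python
from math import sqrt


def r_indicator_from_table(cells):
    """Classical R-indicator from a cross-classified frequency table.

    cells: list of (population_count, admin_count) pairs, one per cell.
    Assumption: the cell propensity is estimated as admin_count / population_count.
    """
    total_pop = sum(pop for pop, _ in cells)
    # Skip empty cells; they carry no population weight.
    props = [(pop, admin / pop) for pop, admin in cells if pop > 0]
    mean_p = sum(pop * p for pop, p in props) / total_pop
    # Population-weighted variance of the cell propensities.
    var_p = sum(pop * (p - mean_p) ** 2 for pop, p in props) / total_pop
    return 1 - 2 * sqrt(var_p)


# Hypothetical two-cell tables.
uniform_coverage = [(100, 80), (200, 160)]  # same propensity in every cell
uneven_coverage = [(100, 90), (100, 50)]    # one cell is under-covered

r_uniform = r_indicator_from_table(uniform_coverage)
r_uneven = r_indicator_from_table(uneven_coverage)
```

Because only cell counts enter the calculation, the cost depends on the number of cells rather than the number of records. Values closer to 1 indicate more even coverage across cells. Over-covered cells, where the administrative count exceeds the population count, would give ratios above 1 and need separate handling that this sketch does not attempt.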

We submitted an academic journal paper from this work to the Journal of Official Statistics in September 2023. We are continuing to work on a second stage of the research grant on integrating administrative data with survey data to produce multi-source estimates.
Exploitation Route The outcomes are used by the Office for National Statistics and other National Statistical Institutes to assess the quality of administrative data through probability-based survey data (or census data).

The ONS tested and produced a report for an administrative dataset for one selected local authority - the admin-based housing by ethnicity dataset (ABHED). The ONS used Census 2021 for England and Wales as the comparative, population auxiliary data. The application focused on four variables found within both datasets: sex, age group, accommodation type and ethnicity. The full report can be found here:
Office for National Statistics (ONS), released 8 December 2023, ONS website, methodology, Quality indicators for representativeness in administrative data: R-indicators and distance metrics. https://www.ons.gov.uk/methodology/methodologicalpublications/generalmethodology/onsworkingpaperseries/qualityindicatorsforrepresentativenessinadministrativedatarindicatorsanddistancemetrics
Sectors Government, Democracy and Justice

URL https://github.com/sook-tusk/qualadmin
 
Description The research on grant ES/V005456/1, Methodological Advancements on the use of Administrative Data in Official Statistics, relating to quality indicators for administrative data sources, has the potential to lead to considerable impact beyond the academic community. Following the uploading and revising of the R-code and manual titled 'Quality Indicators for Administrative Data User Manual for R Package on GitHub' (https://github.com/sook-tusk/qualadmin) in January 2023, the Office for National Statistics (ONS) began testing the code to compute quality indicators for real administrative data sources. This has led to regular discussions and improvements in the R package to meet the needs of the ONS. The code was revised in October 2023 with improvements to the automation of graphics and the summary statistics spreadsheet. We provided a further update in December 2023, in which we developed additional R-code to calculate quality indicators from a frequency table rather than from the microdata. This makes it possible to compute quality indicators for very large administrative datasets held at the ONS, which can have over 60 million records. The ONS tested and produced a report for an administrative dataset for one selected local authority - the admin-based housing by ethnicity dataset (ABHED). The ONS used Census 2021 for England and Wales as the comparative, population auxiliary data. The application focused on four variables found within both datasets: sex, age group, accommodation type and ethnicity. The full report can be found here: Office for National Statistics (ONS), released 8 December 2023, ONS website, methodology, Quality indicators for representativeness in administrative data: R-indicators and distance metrics. https://www.ons.gov.uk/methodology/methodologicalpublications/generalmethodology/onsworkingpaperseries/qualityindicatorsforrepresentativenessinadministrativedatarindicatorsanddistancemetrics
The end of the report includes the acknowledgement: 'We would like to thank Professor Natalie Shlomo, Dr Sook Kim and the University of Manchester for developing and providing us with the method and code to apply R-indicators and distance metrics to administrative data at the ONS. We would also like to thank them for their input and support throughout this research. We are also grateful to our colleagues within the ONS for their support and topic knowledge. This research was funded in partnership with the ONS by the Methodological advancements on the use of administrative data in official statistics grant from the Economic and Social Research Council (ESRC), ES/V005456/1, United Kingdom.'
First Year Of Impact 2023
Sector Government, Democracy and Justice
Impact Types Policy & public services

 
Title Quality Indicators for Administrative Data User Manual For R Package on GitHub 
Description R Package and User Manual for Quality Indicators for Administrative Data on GitHub. Authors: Myong Sook Kim, Natalie Shlomo 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact The Office for National Statistics is using the code to assess their administrative data and we are working closely with them as partners on the grant. 
URL https://github.com/sook-tusk/qualadmin