Census Innovation at CeLSIUS

Lead Research Organisation: University College London
Department Name: Epidemiology and Public Health

Abstract

This project aims to develop two aspects of support for Census data users:
1) Addressing the gap in provision for accessing and using the most restricted-access UK Census data (other than the longitudinal studies) by providing enhanced user information, guidance, training and user support.

2) Creating a fake (impossible) England and Wales longitudinal dataset so that users can explore the data, select variables and draft code for their analyses.

Gap in provision

A gap currently exists in support for using secure UK Census data other than the longitudinal studies. The longitudinal studies are supported by three organisations, which together form UKCenLS: CeLSIUS in England and Wales, NILS-RSU in Northern Ireland and SLS-DSU in Scotland. Less restricted-access versions of the data are well supported by the UK Data Service, but for the most restricted-access datasets there is no support beyond limited user information, data storage and research clearance. CeLSIUS aims to extend its support to two groups of data: secure origin-destination or 'flow' data (migration, commuting, student migration and second-residence sets) from 2011 and 2021/2, and de-identified individual and household microdata from 1961 to 2021.

This project seeks to assess the feasibility and affordability of these potential new support services and to provide the required user information, guidance and training, in particular for the 2021/2 data, for which none yet exists.

This service will support researchers who wish to use UK Census 2021/2 migration, commuting and individual data for social science-led research, including comparisons of change over time.

CeLSIUS has considerable experience in supporting users of the other restricted-access census data and will ensure that no data that could identify an individual become public.

Creation of a fake longitudinal data set

It is currently hard for users to know which variables are in the Office for National Statistics Longitudinal Study (ONS LS), and this is getting harder with each Census. There are approximately 600 variables per Census, and some variables have hundreds of categories.

The current options are to a) ask CeLSIUS and/or ONS, b) use the data dictionary, or c) look at the Census forms. It is hard for users to choose the best variables when there are so many options. As a result, users may select the wrong variables for their project and must then reapply for any additional variables, which causes delays and places an administrative burden on ONS and CeLSIUS. In addition, some users never see the dataset because we carry out the analysis on their behalf, and no user ever sees the full dataset, so users do not always know what is best to select.

We aim to create a complete, openly available, individual-level fake longitudinal dataset. It will contain imaginary people from 1971 to 2021 with impossible characteristics covering all options in the Censuses. This open dataset will reduce the burden on users and on the ONS LS user support team, and will also be a useful training tool.
