Linkage of national longitudinal cohort studies and administrative data: A mutually beneficial arrangement (Link-AMBA)

Lead Research Organisation: University College London
Department Name: Social Science


Over recent decades, administrative data created by government and public bodies (for example, schools and the NHS) have become increasingly available for research purposes. The UK also has a successful history of long-running cohort studies designed to be representative of the national population. Increasingly, cohort participants' data are being linked (with consent) to their administrative records, which opens up additional research opportunities.

The main objective of this project is to explore, develop and disseminate methods which utilise such linked national longitudinal cohort and administrative data to help better address important questions for society, health, and related policies. In doing this, we will address three specific questions:

i) How can linked administrative data aid the handling of missing cohort data? Cohorts suffer from missing data, particularly due to attrition of cohort members over time. Unless these missing data are appropriately handled in analyses, results may be biased. We will incorporate information from linked administrative data that is predictive of missingness in the cohort data into appropriate statistical methods to reduce, or ideally eliminate, this bias.
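As a purely illustrative sketch of the idea in (i), the Python below applies inverse probability weighting, one of several possible approaches (multiple imputation is another) and not necessarily the method the project will adopt. The data are fictional, the group labels "A" and "B" stand in for a hypothetical administrative variable observed for every cohort member, and the sketch assumes that outcomes are missing at random within each administrative group.

```python
from collections import defaultdict

def naive_mean(records):
    """Mean of the observed outcomes only (complete-case analysis)."""
    observed = [y for _, y in records if y is not None]
    return sum(observed) / len(observed)

def ipw_mean(records):
    """Weighted mean in which each respondent is weighted by the inverse
    of the response rate in their (hypothetical) administrative group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_total, n_observed]
    for group, outcome in records:
        counts[group][0] += 1
        if outcome is not None:
            counts[group][1] += 1
    numerator = 0.0
    denominator = 0.0
    for group, outcome in records:
        if outcome is None:
            continue  # non-respondents contribute only via the weights
        n_total, n_observed = counts[group]
        weight = n_total / n_observed  # inverse of estimated response probability
        numerator += weight * outcome
        denominator += weight
    return numerator / denominator

# Fictional data: (administrative group, cohort outcome or None if missing).
# Group "B" has both higher outcomes and much heavier attrition.
records = [("A", 1.0)] * 4 + [("B", 3.0)] + [("B", None)] * 3

complete_case_estimate = naive_mean(records)
weighted_estimate = ipw_mean(records)
```

In this toy example the complete-case mean under-represents group "B", whereas the weighted mean gives the one responding member of that group the weight of the four members it represents.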

ii) How can linked cohort data improve our understanding of the quality of administrative data? Some information can be reasonably expected to be correct in cohort study data. This fact can be utilised to assess how well the same information is captured in linked administrative data, where there may be more doubt over accuracy. We will use linked cohort data to assess the quality of data from administrative sources, then consider the potential impact of any misclassification on typical analyses. This work is important for the many users of these administrative data sources, as it will provide evidence on the validity of the information they frequently use.
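One common way to quantify the kind of agreement described in (ii) is to treat the cohort value as the reference and compute sensitivity, specificity and positive predictive value for the administrative value. The sketch below is a minimal illustration under that assumption; the counts are fictional, the function name is hypothetical, and it assumes a binary variable with at least one case in each relevant cell.

```python
def validation_metrics(pairs):
    """pairs: iterable of (cohort_value, admin_value) booleans for linked
    individuals, with the cohort value treated as the reference standard."""
    pairs = list(pairs)
    tp = sum(1 for cohort, admin in pairs if cohort and admin)
    fn = sum(1 for cohort, admin in pairs if cohort and not admin)
    fp = sum(1 for cohort, admin in pairs if not cohort and admin)
    tn = sum(1 for cohort, admin in pairs if not cohort and not admin)
    return {
        "sensitivity": tp / (tp + fn),  # true cases recorded in admin data
        "specificity": tn / (tn + fp),  # non-cases correctly absent
        "ppv": tp / (tp + fp),          # admin-recorded cases that are true
    }

# Fictional linked sample of 20 individuals.
linked_pairs = (
    [(True, True)] * 8      # recorded in both sources
    + [(True, False)] * 2   # missed by the administrative source
    + [(False, True)] * 1   # recorded only in the administrative source
    + [(False, False)] * 9  # absent from both sources
)

metrics = validation_metrics(linked_pairs)
```

Estimates of this kind could then feed into an assessment of how misclassification might affect analyses that rely on the administrative variable alone.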

iii) How can linked cohort data help address residual confounding in analyses of administrative data? Statistical analyses generally try to account for confounding (alternative explanations for an association of interest). There are many instances where analyses of administrative data cannot sufficiently account for such confounding due to lack of information about certain relevant variables. The consequence is often bias due to residual confounding. Cohort data, however, will often contain rich information on potential confounders. We will use linked cohort and administrative data and compare results from analyses that use only variables in the administrative data with those additionally using variables in the cohort data. We will then explore alternative approaches whereby these findings can be incorporated into analyses of the whole administrative data sample.
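The comparison described in (iii) can be illustrated with a small, entirely fictional example. The Python sketch below contrasts a crude risk difference, computed as if only administrative variables were available, with a risk difference standardised over a confounder assumed to be recorded only in the cohort data. The counts, function names and the choice of direct standardisation are all illustrative assumptions rather than the project's method.

```python
# Fictional counts: (confounder_level, exposed) -> (n_people, n_outcomes).
# The confounder is assumed to be available only in the cohort data.
cells = {
    (0, True): (10, 1),
    (0, False): (40, 4),
    (1, True): (40, 20),
    (1, False): (10, 5),
}

def crude_risk_difference(cells):
    """Risk difference ignoring the confounder altogether."""
    n_exp = sum(n for (_, exposed), (n, _) in cells.items() if exposed)
    y_exp = sum(y for (_, exposed), (_, y) in cells.items() if exposed)
    n_unexp = sum(n for (_, exposed), (n, _) in cells.items() if not exposed)
    y_unexp = sum(y for (_, exposed), (_, y) in cells.items() if not exposed)
    return y_exp / n_exp - y_unexp / n_unexp

def standardised_risk_difference(cells):
    """Stratum-specific risk differences averaged with weights equal to
    each confounder stratum's share of the whole sample."""
    levels = sorted({level for level, _ in cells})
    total = sum(n for n, _ in cells.values())
    result = 0.0
    for level in levels:
        n_exp, y_exp = cells[(level, True)]
        n_unexp, y_unexp = cells[(level, False)]
        weight = (n_exp + n_unexp) / total
        result += weight * (y_exp / n_exp - y_unexp / n_unexp)
    return result

crude = crude_risk_difference(cells)
adjusted = standardised_risk_difference(cells)
```

Here the crude analysis suggests an association that disappears once the cohort-only confounder is taken into account, which is the pattern residual confounding would produce.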

This work will be undertaken using data from national longitudinal cohort studies based at the UCL Centre for Longitudinal Studies (CLS) linked to several administrative data sources covering health, education, and higher education.

An important part of the project is the initiation of a collaboration between researchers at CLS and colleagues from UCL Great Ormond Street Institute of Child Health (ICH). The resultant multidisciplinary research team will have combined expertise across a variety of areas including social science, survey methodology, applied and methodological statistics, and data linkage methodology, as well as extensive experience of using the cohort studies and administrative datasets being utilised in the project.

As well as disseminating our findings via peer-reviewed academic journals and conference presentations, we will develop guidance and training for users of the national longitudinal cohort studies to help them better utilise the available administrative data linkages.

We will also work closely with the ESRC to share learning from the project and participate in wider activities to raise awareness of the value of social science research methods.

