Linkage of national longitudinal cohort studies and administrative data: A mutually beneficial arrangement (Link-AMBA)

Lead Research Organisation: University College London
Department Name: Social Science

Abstract

Over recent decades, administrative data created by government and public bodies (for example, schools and the NHS) have become increasingly available for research purposes. The UK also has a successful history of long-running cohort studies which are designed to be representative of the national population. Increasingly, linkages are being made (with consent) between participants' cohort data and their data from administrative records, opening up additional research opportunities.

The main objective of this project is to explore, develop and disseminate methods which utilise such linked national longitudinal cohort and administrative data to help better address important questions for society, health, and related policies. In doing this, we will address three specific questions:

i) How can linked administrative data aid the handling of missing cohort data? Cohorts suffer from missing data, particularly due to attrition of cohort members over time. Unless these missing data are appropriately handled in analyses, results may be affected by bias. We will incorporate information from linked administrative data that is predictive of missingness in the cohort data into appropriate statistical methods to help reduce, or ideally eliminate, this bias.

ii) How can linked cohort data improve our understanding of the quality of administrative data? Some information can be reasonably expected to be correct in cohort study data. This fact can be utilised to assess how well the same information is captured in linked administrative data, where we may have more doubts over accuracy. We will use linked cohort data to assess the quality of data from administrative sources, then consider the potential impact of any misclassification on typical analyses. This work is important for the many users of these administrative data sources as it will provide evidence on the validity of the information they frequently use.

iii) How can linked cohort data help address residual confounding in analyses of administrative data? Statistical analyses generally try to account for confounding (alternative explanations for an association of interest). There are many instances where analyses of administrative data cannot sufficiently account for such confounding due to lack of information about certain relevant variables. The consequence is often bias due to residual confounding. Cohort data, however, will often contain rich information on potential confounders. We will use linked cohort and administrative data and compare results from analyses that use only variables in the administrative data with those additionally using variables in the cohort data. We will then explore alternative approaches whereby these findings can be incorporated into analyses of the whole administrative data sample.
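
The comparison described in question iii) can be illustrated with a minimal sketch. It assumes a binary exposure and outcome recorded in the administrative data and a single binary confounder available only in the cohort data. All counts, field names and function names below are hypothetical and invented for illustration; they are not project data, and the stratified estimator shown is one simple option rather than the method the project will necessarily use.

```python
# Hypothetical linked sample, invented for illustration only.
# "stratum" stands for a confounder that is recorded in the cohort data
# but not in the administrative data.
LINKED_SAMPLE = [
    {"stratum": "high", "exposed": True,  "events": 40, "n": 80},
    {"stratum": "high", "exposed": False, "events": 10, "n": 20},
    {"stratum": "low",  "exposed": True,  "events": 2,  "n": 20},
    {"stratum": "low",  "exposed": False, "events": 8,  "n": 80},
]


def crude_risk_difference(rows):
    """Risk difference ignoring the confounder (administrative variables only)."""
    exp_events = sum(r["events"] for r in rows if r["exposed"])
    exp_n = sum(r["n"] for r in rows if r["exposed"])
    unexp_events = sum(r["events"] for r in rows if not r["exposed"])
    unexp_n = sum(r["n"] for r in rows if not r["exposed"])
    return exp_events / exp_n - unexp_events / unexp_n


def stratified_risk_difference(rows):
    """Risk difference averaged over confounder strata, weighted by stratum size."""
    total = sum(r["n"] for r in rows)
    adjusted = 0.0
    for stratum in sorted({r["stratum"] for r in rows}):
        in_stratum = [r for r in rows if r["stratum"] == stratum]
        weight = sum(r["n"] for r in in_stratum) / total
        adjusted += weight * crude_risk_difference(in_stratum)
    return adjusted


if __name__ == "__main__":
    crude = crude_risk_difference(LINKED_SAMPLE)
    adjusted = stratified_risk_difference(LINKED_SAMPLE)
    print("crude risk difference:", round(crude, 3))
    print("confounder-adjusted risk difference:", round(adjusted, 3))
    print("apparent bias from residual confounding:", round(crude - adjusted, 3))
```

In these made-up counts the exposure has no association with the outcome within either stratum, so the whole of the crude association reflects confounding. A gap between the two estimates in a linked sample is the kind of finding that could then inform analyses of the whole administrative data sample.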

This work will be undertaken using data from national longitudinal cohort studies based at the UCL Centre for Longitudinal Studies (CLS) linked to several administrative data sources covering health, education, and higher education.

An important part of the project is the initiation of a collaboration between researchers at CLS and colleagues from UCL Great Ormond Street Institute of Child Health (ICH). The resultant multidisciplinary research team will have combined expertise across a variety of areas including social science, survey methodology, applied and methodological statistics, and data linkage methodology, as well as extensive experience of using the cohort studies and administrative datasets being utilised in the project.

As well as disseminating our findings via peer-reviewed academic journals and conference presentations, we will develop guidance and training for users of the national longitudinal cohort studies to help them better utilise the available administrative data linkages.

We will also work closely with the ESRC to share learning from the project and participate in wider activities to raise awareness of the value of social science research methods.
 
Description The two main outputs so far relate to analyses of linked National Child Development Study (NCDS) and Hospital Episode Statistics (HES) data.

In the first analyses, we examined the linkage quality and population representativeness of the NCDS-HES linkage through use of both internal (NCDS-HES) and external (population-level HES) data. Our findings suggest that the linkage quality of the NCDS-HES data is high and that the linked sample maintains an excellent level of population representativeness. These analyses will both improve the quality and transparency of research using this linked data resource and encourage providers and users of other linked data resources to undertake and publish similarly thorough evaluations. This work has already been published as part of the Centre for Longitudinal Studies (CLS) Working Paper Series, and a slightly different output (more general guidance for assessing the quality and population representativeness of linked cohort and administrative data, with results from the NCDS-HES linkage as an exemplar) is currently under review at the International Journal of Population Data Science.

In separate analyses, we have used a data-driven approach to identify HES variables predictive of NCDS non-response and investigated whether inclusion of such variables within principled methods of missing data handling can improve cohort representativeness. We conclude that inclusion of such HES variables in analyses of NCDS data does not improve the handling of non-response over and above similar analyses using NCDS variables which are predictive of non-response. This is an important finding as it means that, for future analyses of NCDS data, there seems to be limited added value in utilising linked HES data in the handling of non-response. This work has already been published as part of the CLS Working Paper Series and is currently under review at BMC Medical Research Methodology.
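
The general idea of using a variable that predicts non-response within a principled missing data method can be sketched with inverse probability weighting in its simplest, weighting-class form. This is an illustration under stated assumptions, not the analysis reported above: the records, the binary auxiliary flag `aux` (standing in for any variable known for all cohort members, whether from linked records or from earlier cohort sweeps) and all function names are hypothetical.

```python
from collections import defaultdict


def build_example():
    """Made-up cohort: members with aux=1 are less likely to respond."""
    records = []
    for i in range(100):  # aux=1: half respond; respondents' outcome is 4.0
        responded = i < 50
        records.append({"aux": 1, "responded": responded,
                        "y": 4.0 if responded else None})
    for _ in range(100):  # aux=0: everyone responds; outcome is 2.0
        records.append({"aux": 0, "responded": True, "y": 2.0})
    return records


def response_rates(records):
    """Observed response rate within each level of the auxiliary variable."""
    counts = defaultdict(lambda: [0, 0])  # aux level -> [respondents, total]
    for r in records:
        counts[r["aux"]][1] += 1
        if r["responded"]:
            counts[r["aux"]][0] += 1
    return {level: resp / total for level, (resp, total) in counts.items()}


def complete_case_mean(records):
    """Mean outcome among respondents only, ignoring non-response."""
    ys = [r["y"] for r in records if r["responded"]]
    return sum(ys) / len(ys)


def ipw_mean(records):
    """Respondent mean weighted by the inverse of the estimated response rate."""
    rates = response_rates(records)
    numerator = denominator = 0.0
    for r in records:
        if r["responded"]:
            weight = 1.0 / rates[r["aux"]]
            numerator += weight * r["y"]
            denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    cohort = build_example()
    print("complete-case mean:", round(complete_case_mean(cohort), 3))
    print("weighted mean:", round(ipw_mean(cohort), 3))
```

In this toy example the weighted mean differs from the complete-case mean because response depends on `aux` and `aux` is related to the outcome. Weighting only helps to the extent that the chosen variable carries information not already captured by other available predictors, which is consistent with the finding above that linked HES variables added little beyond NCDS variables.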

Both analyses were presented at: i) an invited seminar at St George's, University of London; ii) the Health Studies User Conference 2022; iii) the International Population Data Linkage Network Conference 2022; and iv) the Royal Statistical Society International Conference 2022, providing a broad range of audiences for this work.

A further analysis has focused on how linked cohort data can be used to help handle residual confounding in analyses of administrative data, using linked Millennium Cohort Study (MCS) and National Pupil Database (NPD) data alongside population NPD data as the exemplar. This work is nearing completion and will similarly be published as part of the CLS Working Paper Series and submitted for journal publication.
Exploitation Route The findings so far will be of interest to: i) anyone using linked NCDS-HES data (given greater quality assurance); ii) anyone using unlinked NCDS data who is interested in handling missing data (given the conclusion that use of linked HES data in this context likely has limited added value); iii) anyone interested in quality assurance of other linked cohort-administrative data sources (since our work provides a template); iv) anyone interested in missing data handling in cohort studies (given our methodological approach); and v) anyone conducting analyses using administrative data with concerns over residual confounding (since our work provides a template). Substantive research conducted as described above will be of improved quality and able to draw conclusions more reliably, with downstream implications for the policy and practice into which these research findings feed. Since NCDS research findings are often used in this context, there is the potential for meaningful impact.
Sectors Communities and Social Services/Policy; Healthcare; Government, Democracy and Justice

 
Description UCL Great Ormond Street Institute of Child Health 
Organisation University College London
Department Great Ormond Street Institute of Child Health
Country United Kingdom 
Sector Public 
PI Contribution This grant has facilitated a new collaboration between researchers working at the UCL Centre for Longitudinal Studies and the UCL Great Ormond Street Institute of Child Health. We (research team based at the UCL Centre for Longitudinal Studies) have provided the opportunity for collaborators at the UCL Great Ormond Street Institute of Child Health to conduct research using our world-renowned national longitudinal population studies linked to administrative data sources.
Collaborator Contribution Collaborators at the UCL Great Ormond Street Institute of Child Health have provided expertise in the analysis of linked health data. They have contributed to the conception and design of each study within the project, aided with interpretation, and edited and revised the papers. The collaboration has developed beyond this initial research grant, with collaborators now taking specialist leadership roles in a large grant based at the UCL Centre for Longitudinal Studies.
Impact All outputs from the grant have been in collaboration with researchers at the UCL Great Ormond Street Institute of Child Health.
Start Year 2021
 
Description Health Studies User Conference 2022 presentation by Nasir Rajah 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Health Studies User Conference 2022 presentation on "Using linked Hospital Episode Statistics data to aid the handling of missing cohort data" by Nasir Rajah, including Q&A session.
Year(s) Of Engagement Activity 2022
URL https://ukdataservice.ac.uk/events/health-studies-user-conference-2022/
 
Description International Population Data Linkage Network Conference 2022 presentation by Nasir Rajah 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact International Population Data Linkage Network Conference 2022 presentation on "Using linked Hospital Episode Statistics data to aid the handling of missing cohort data" by Nasir Rajah, including Q&A session.
Year(s) Of Engagement Activity 2022
URL https://ijpds.org/article/view/1997
 
Description International Population Data Linkage Network Conference 2022 presentation by Richard Silverwood 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact International Population Data Linkage Network Conference 2022 presentation on "Examining the quality and sample representativeness of linked 1958 National Child Development Study and Hospital Episode Statistics data" by Richard Silverwood, including Q&A session.
Year(s) Of Engagement Activity 2022
URL https://ijpds.org/article/view/1990
 
Description Royal Statistical Society International Conference 2022 presentation by Richard Silverwood (1) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Royal Statistical Society International Conference 2022 presentation on "Examining the quality and sample representativeness of linked 1958 National Child Development Study and Hospital Episode Statistics data" by Richard Silverwood, including Q&A session.
Year(s) Of Engagement Activity 2022
URL https://rss.org.uk/training-events/conference2022/
 
Description Royal Statistical Society International Conference 2022 presentation by Richard Silverwood (2) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Royal Statistical Society International Conference 2022 presentation on "Using linked Hospital Episode Statistics data to aid the handling of missing cohort data" by Richard Silverwood, including Q&A session.
Year(s) Of Engagement Activity 2022
URL https://rss.org.uk/training-events/conference2022/
 
Description St George's, University of London Population Health Research Institute Seminar by Richard Silverwood 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Invited seminar at St George's, University of London Population Health Research Institute on "Using linked Hospital Episode Statistics data to aid the handling of missing cohort data" by Richard Silverwood, including Q&A session.
Year(s) Of Engagement Activity 2022
 
Description Using linked administrative data: Hospital Episode Statistics linked with the CLS cohorts 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact UCL Centre for Longitudinal Studies training event on "Using linked administrative data: Hospital Episode Statistics linked with the CLS cohorts" led by Richard Silverwood.
Year(s) Of Engagement Activity 2022