Using linked health and administrative data to reduce bias due to missing data and measurement error in observational research

Lead Research Organisation: University of Bristol
Department Name: Faculty of Medicine and Dentistry

Abstract

The Avon Longitudinal Study of Parents and Children (ALSPAC), also known as Children of the 90s, is a health research study. Around 14,000 pregnant women joined the study in 1990-1991 and their children, born between April 1991 and December 1992, have been followed up ever since. Information about these children (and the mothers) has been collected using postal questionnaires and through clinics held at the University of Bristol.

The main aim of ALSPAC is to identify factors which influence people's physical and mental health and development so that steps can be taken to prevent illness and improve the health and well-being of the population as a whole. To do this, scientists use the data collected in ALSPAC to estimate a "measure of effect", which quantifies the likely strength of the association between a particular factor and the outcome they are investigating. For example, in 2003 researchers found that the use of skin preparations containing peanut oil was associated with an almost seven-fold increase in the risk of developing peanut allergy.

In observational studies like ALSPAC, particularly when data are collected over a very long period of time, it is unusual to have complete information on all the individuals in the study. Some people drop out of the study for various reasons; others do not complete every questionnaire or attend every clinic; in addition, some people may not answer the whole of a questionnaire or may not want certain measurements taken at a clinic. All of these scenarios result in missing data. When information is more likely to be missing for some people than others (for example, heavy smokers may be less likely to complete questions on smoking), the measure of effect may be distorted (biased).

Questionnaire-based studies like ALSPAC are also prone to errors because people are asked about events that they may not completely remember. In addition, some topics on questionnaires may be sensitive for some people and they might not be completely honest - about how much they smoke, for example. Both of these issues result in something called misclassification, whereby some people may be wrongly classified as having (or not having) a particular condition - such as asthma, for example - or wrongly classified as being a light smoker when in fact they are a heavy smoker. This can also lead to biased measures of effect.
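
The distortion caused by selective missingness (heavy smokers being less likely to answer smoking questions) can be illustrated with a small simulation. The sketch below is illustrative only: the function name, the 30% prevalence of heavy smoking and the response probabilities are all assumptions chosen for the example, not ALSPAC figures, and it estimates a simple prevalence rather than a measure of effect to keep the example short.

```python
import random


def simulate_response_bias(n=100_000, seed=1):
    """Compare the true prevalence of heavy smoking with the prevalence
    seen among people who answered the smoking question.

    All numbers are assumptions for illustration, not ALSPAC data.
    """
    rng = random.Random(seed)
    everyone = []    # smoking status for the whole simulated cohort
    responders = []  # smoking status for those who answered the question
    for _ in range(n):
        heavy = rng.random() < 0.30        # assumed true prevalence: 30%
        p_respond = 0.5 if heavy else 0.9  # assumed: heavy smokers respond less
        everyone.append(heavy)
        if rng.random() < p_respond:
            responders.append(heavy)
    true_prevalence = sum(everyone) / len(everyone)
    observed_prevalence = sum(responders) / len(responders)
    return true_prevalence, observed_prevalence


if __name__ == "__main__":
    true_prev, observed_prev = simulate_response_bias()
    print(f"true prevalence:     {true_prev:.3f}")
    print(f"responders only:     {observed_prev:.3f}")
```

Under these assumed response rates the responders-only figure understates heavy smoking, because heavy smokers are under-represented among those with data.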

One way of addressing these problems in studies like ALSPAC is to use comparable information from health or administrative (government) records. ALSPAC has already obtained education data from the Department for Education (DfE). In addition, the Project to Enhance ALSPAC through Record Linkage (PEARL) has been set up to obtain data on ALSPAC participants from the following records: health, benefits and earnings, criminal convictions and cautions, plus further and higher education. PEARL is currently investigating how to use the data obtained from these sources to enhance the existing ALSPAC data as well as looking at the feasibility of using such data to provide future information on health and other outcomes.

In this project I will build on the work of PEARL by examining particular measures - smoking, IQ, and teenage depression - in depth, investigating missing data and misclassification in these measures and devising ways in which administrative and health data can be used to overcome these issues, both in ALSPAC and in similar studies. In particular, I will look at whether linked health and education data can be used to understand whether particular people are more likely to have missing information on smoking, IQ or depression. I will also investigate whether the linked data can be used to "fill in" missing information in the ALSPAC data. In addition, by comparing self-reported smoking and depression to equivalent information in the GP records I will assess how accurate the self-reported data are likely to be and what influence any inaccuracy may have on results based on these measures.
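
A comparison of self-reported data with GP records is commonly summarised using sensitivity, specificity and Cohen's kappa. The sketch below shows one way this could be done; it is not the project's actual method. It assumes, purely for illustration, that the GP record is treated as the reference standard (GP records may themselves be incomplete), and the function name and toy data are hypothetical.

```python
def agreement(self_report, gp_record):
    """Cross-tabulate two binary indicators and summarise their agreement.

    Assumption: gp_record is treated as the reference standard.
    """
    tp = fp = fn = tn = 0
    for reported, recorded in zip(self_report, gp_record):
        if reported and recorded:
            tp += 1   # both sources say "yes"
        elif reported and not recorded:
            fp += 1   # self-report only
        elif recorded:
            fn += 1   # GP record only
        else:
            tn += 1   # both sources say "no"
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


if __name__ == "__main__":
    # Made-up indicators for ten people (1 = smoker), not real data.
    self_report = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
    gp_record = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print(agreement(self_report, gp_record))
```

Kappa is included alongside sensitivity and specificity because it does not require either source to be treated as error-free.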

Technical Summary

Aim
To examine how linked health and administrative data can be used to avoid bias in cohort studies, using the Avon Longitudinal Study of Parents and Children (ALSPAC) as an exemplar.

Objectives
1. To develop methods for using linked health and administrative data to examine patterns of missing data and model missingness mechanisms in ALSPAC.
2. To incorporate linked health and administrative data in multiple imputation models.
3. To compare data in ALSPAC to equivalent outcomes recorded in linked electronic primary care records to investigate measurement error.
4. To develop methods to use both linked data and self-reported data to minimise the impact of measurement error on analyses.
5. To devise new algorithms, and modify existing ones, for defining depression using electronic GP data and to use this information to estimate the prevalence of depression among ALSPAC teenagers.

Methodology
ALSPAC is a prospective cohort study. Around 14,000 pregnant women were recruited into the study during 1990-1991. Follow-up is ongoing; data have been primarily collected via questionnaires and clinics held at the University of Bristol. Educational data have also been obtained via linkage to the National Pupil Database, and the Project to Enhance ALSPAC through Record Linkage (PEARL) has linked, or is currently linking, to other datasets, including electronic patient (GP) records. GP data will be analysed in a safe setting and relevant statistical methods, including simulations and multiple imputation, will be used as appropriate.
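
As a rough illustration of how a linked variable might enter a multiple imputation model, the sketch below imputes a partially missing outcome from a fully observed linked proxy and pools the results using Rubin's rules. It is a simplified sketch under stated assumptions, not the project's method: the data are simulated, the imputation model is a single linear regression, parameter uncertainty is approximated by bootstrapping the complete cases, and all function names and numeric values are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (len(xs) - 2))


def impute_mean(y, z, m=20, seed=0):
    """Multiply impute missing y (None) from proxy z, then pool the
    mean of y across the m imputed datasets with Rubin's rules."""
    rng = random.Random(seed)
    complete = [(zi, yi) for zi, yi in zip(z, y) if yi is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Refit on a bootstrap sample so imputations reflect parameter uncertainty.
        boot = [rng.choice(complete) for _ in complete]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        filled = [yi if yi is not None else a + b * zi + rng.gauss(0, sd)
                  for zi, yi in zip(z, y)]
        estimates.append(statistics.fmean(filled))
        variances.append(statistics.variance(filled) / len(filled))
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    return pooled, math.sqrt(total)


def simulate_cohort(n=20_000, seed=42):
    """Simulated data: the outcome y has true mean 100 and is more often
    missing when the linked proxy z is low (missing at random given z)."""
    rng = random.Random(seed)
    z, y = [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        yi = 100 + 10 * zi + rng.gauss(0, 10)
        p_missing = 0.7 if zi < 0 else 0.2
        z.append(zi)
        y.append(None if rng.random() < p_missing else yi)
    return z, y


if __name__ == "__main__":
    z, y = simulate_cohort()
    cc_mean = statistics.fmean([v for v in y if v is not None])
    mi_mean, mi_se = impute_mean(y, z)
    print(f"complete-case mean: {cc_mean:.2f}")
    print(f"imputed mean:       {mi_mean:.2f} (SE {mi_se:.2f})")
```

In this simulated setting the complete-case mean is pulled upwards because people with low proxy values are more often missing, whereas the imputed estimate uses the proxy to recover them. The benefit depends on the proxy being related to both the outcome and the chance of it being missing.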

Scientific/medical opportunities
To draw valid conclusions from observational research, selection and measurement bias need to be quantified and their impact minimised. The proposed research will address this by using linked health and education data to examine misclassification and missingness mechanisms in ALSPAC (a large observational study) and develop ways in which linked data can be used to reduce bias.

Planned Impact

The aim of this fellowship is to understand how linked health and administrative data can be used to understand and reduce bias in observational studies. Missing data are inevitable in observational studies and methods are currently being developed to support inferences made in these studies. The work proposed here will explore biases introduced by missing data and measurement error in observational studies and investigate ways in which these different biases can be overcome using linked data. It will also develop methods for combining self-reported and linked data. This work will benefit others using observational data and, specifically, those working on longitudinal studies that have already collected, or plan to collect, data via linkage.

As this research is methodological, it is unlikely to have a direct effect on population health. However, the exemplar questions being addressed will contribute to our understanding of the relationship between early life exposures (breastfeeding, prenatal exposure to smoking) and cognitive and behavioural outcomes in adolescence. More importantly, ALSPAC is - and will continue to be - an important resource for carrying out research that will impact on our understanding of many areas of human health and development. The findings of the proposed work will influence how future analyses are carried out and, specifically, will help to ensure that the available data - both linked and self-reported - are used in such a way as to minimise the potential for bias. This will be particularly the case for the variables investigated as part of this work but is likely to apply to other outcomes. Thus, it is anticipated that the work proposed in this application will influence research and thus impact on the NHS and the wider public in the longer term.

Full details are given in the attached "pathways to impact".

Publications


 
Description Development of miDOC: an expert system and methodology for multiple imputation
Amount £321,633 (GBP)
Funding ID MR/V020641/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 09/2021 
End 03/2024
 
Description Home Office / Administrative Data Research UK feasibility study
Amount £79,124 (GBP)
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 03/2020 
End 09/2020
 
Description Longitudinal Administrative Data Spine scoping project grant for the SPF UK Population Lab Wave I
Amount £236,901 (GBP)
Funding ID ES/S016732/1 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 12/2018 
End 03/2019
 
Description Mental health and incontinence
Amount £525,115 (GBP)
Funding ID MR/V033581/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 02/2022 
End 11/2024
 
Description Understanding non-response in young people in Understanding Society
Amount £43,370 (GBP)
Organisation University of Essex 
Sector Academic/University
Country United Kingdom
Start 06/2023 
End 05/2024
 
Description Framework for treatment and reporting of missing data 
Organisation Murdoch Children's Research Institute
Country Australia 
Sector Academic/University 
PI Contribution Co-authored publication
Collaborator Contribution Co-authored publication
Impact Publication in the Journal of Clinical Epidemiology: Framework for the treatment and reporting of missing data in observational studies: The Treatment And Reporting of Missing data in Observational Studies framework
Start Year 2019
 
Description Multiple imputation using linked proxy: simulation study 
Organisation London School of Hygiene and Tropical Medicine (LSHTM)
Department Department of Medical Statistics
Country United Kingdom 
Sector Academic/University 
PI Contribution We were the main investigators, carried out the statistical analysis and led the writing up.
Collaborator Contribution They contributed to writing up the work for publication.
Impact Published paper: Cornish RP, Macleod J, Carpenter JR, Tilling K. Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study. Emerging Themes in Epidemiology 2017; 14:14. doi:10.1186/s12982-017-0068-0
Start Year 2015