Developing and disseminating robust methods for handling missing data in epidemiological studies

Lead Research Organisation: University of Bristol
Department Name: Social Medicine

Abstract

Longitudinal studies (studies in which individuals are followed over periods of many months or years) are of great importance in understanding how aspects of people's lifestyle or environment influence their health and wellbeing. When many individuals are followed over extended periods it is inevitable that measurements on particular variables are sometimes missing, for example because a measuring device broke down, or a subject did not answer certain questions or did not attend an examination. Individuals may also drop out of the study altogether. Missing values raise difficult issues in the analysis of data from longitudinal studies, and failing to address these appropriately can lead to results that are both biased (they differ from the results that would be observed if the missing values could have been included) and inefficient (there is more uncertainty about the results than there would be if the missing values could have been included). New statistical methods that do address these issues have been proposed, and have the potential to decrease bias and increase efficiency in analyses of longitudinal studies. However, these methods can be highly complex and difficult to apply, and their incorrect use may actually increase bias in certain circumstances. We will develop solutions to the remaining problems with applying one of these methods (multiple imputation), including developing strategies for deciding whether missing values are likely to cause bias in analyses, and checks for whether the multiple imputation models are appropriate. We will also develop new methods which still work even when aspects of the chosen statistical models are incorrect. We will incorporate our new methods into existing software, to maximise their future use, as well as publishing the results in scientific journals.
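Multiple imputation fills in each missing value several times, analyses each completed dataset separately, and then pools the results so that the extra uncertainty caused by the missing values is reflected in the standard errors. The pooling step is usually done with Rubin's rules. The Python sketch below is a minimal illustration of those rules, not code from this project; the function name `rubins_rules` is hypothetical, and it assumes one scalar estimate and its squared standard error per imputed dataset.

```python
import math
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool one scalar estimate per imputed dataset using Rubin's rules.

    estimates : point estimates from each of the m imputed datasets
    variances : squared standard errors of those estimates
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)            # pooled point estimate: average over imputations
    w_bar = mean(variances)            # within-imputation variance: average squared SE
    b = variance(estimates)            # between-imputation variance (denominator m - 1)
    t = w_bar + (1 + 1 / m) * b        # total variance adds the missing-data uncertainty
    return q_bar, t, math.sqrt(t)
```

For example, `rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools to an estimate of 2.0 with total variance 0.5 + (4/3)(1.0), about 1.83. The between-imputation term is the part that an analysis of a single filled-in dataset would leave out.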

Technical Summary

Missing data is a problem common to almost every clinical and epidemiological study, especially when large cohorts are followed over long periods of time. Traditionally, missing data have been dealt with by complete-case analysis, which includes in the analysis only those participants with complete data. Medical and epidemiological researchers are increasingly aware that such analyses fail to allow appropriately for missing data and can lead to both bias and inefficiency. Practical tools for analysing datasets with missing data are now available, and those based on multiple imputation (MI) are increasingly recommended. However, MI is typically used too uncritically, without making its assumptions explicit, and may replace the potential bias of complete-case analyses with different biases arising from inappropriate assumptions or mis-specified imputation models. The proposed research will tackle the practical barriers to the most effective use of MI by developing preliminary analyses and diagnostic tools. We will adapt existing software to improve model diagnostics that can alert the user to problems in imputation procedures. We will use simulated and real data to investigate the size and direction of the bias caused by ignoring the structure of the data (e.g. longitudinal, clustered) in the imputation model, and develop and compare different ways in which these structures can be incorporated in the imputation model. We will also develop more robust methods for handling missing data, including both doubly robust weighted analyses and a second generation of MI based on the doubly robust principle. We will develop methodological approaches to sensitivity analyses for situations where data are missing not at random (MNAR), including parameters for use in doubly robust weighted analyses. In particular, we will focus on the situation where different MNAR mechanisms are suspected to operate in different parts of a complex dataset.
These methods will be applied to simulated data and to longitudinal studies (using data from the ALSPAC birth cohort study, the National Child Development Study (NCDS) and the Millennium Cohort Study (MCS)).
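A doubly robust weighted analysis combines a model for the probability that a value is observed with a model for the outcome itself, and remains consistent if either of the two models is correctly specified. The Python sketch below shows one standard estimator of this kind, the augmented inverse-probability-weighted (AIPW) estimator of a mean when the outcome is missing at random given a covariate. It is an illustration on simulated data under stated assumptions (a linear outcome, logistic missingness, and the hypothetical helper names `fitted_response_probs` and `aipw_mean`), not the estimators developed in this project.

```python
import numpy as np


def fitted_response_probs(X, r, n_iter=25):
    """Logistic regression of the response indicator r on X by Newton-Raphson.

    Returns fitted probabilities P(R = 1 | X), i.e. the missingness model.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)                          # score
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta += np.linalg.solve(info, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))


def aipw_mean(y, r, X_miss, X_out):
    """Augmented inverse-probability-weighted (doubly robust) estimate of E[Y].

    y      : outcome, np.nan where missing
    r      : 1.0 if y is observed, 0.0 if missing
    X_miss : design matrix for the missingness model
    X_out  : design matrix for the outcome model (linear, fitted on complete cases)
    """
    pi = fitted_response_probs(X_miss, r)
    obs = r == 1
    coef, *_ = np.linalg.lstsq(X_out[obs], y[obs], rcond=None)
    m = X_out @ coef                      # predicted E[Y | X] for every participant
    resid = np.where(obs, y - m, 0.0)     # residuals exist only where y is observed
    return np.mean(m + r * resid / pi)    # mean of m(X) + R (Y - m(X)) / pi(X)


# Simulated data: y is missing at random given x, and larger x is more often observed.
rng = np.random.default_rng(2024)
n = 20_000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)        # true E[Y] = 1.0
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
r = (rng.uniform(size=n) < p_obs).astype(float)
y = np.where(r == 1, y_full, np.nan)

ones = np.ones((n, 1))
X = np.column_stack([ones, x])

complete_case = np.nanmean(y)                       # biased upwards
both_right = aipw_mean(y, r, X, X)
outcome_wrong = aipw_mean(y, r, X, ones)            # intercept-only outcome model
missingness_wrong = aipw_mean(y, r, ones, X)        # constant response probability
```

In this simulation the complete-case mean is biased upwards because larger values of `x` are more likely to be observed, whereas the AIPW estimate should stay close to the true mean of 1.0 even when one of the two models (but not both) is replaced by a misspecified intercept-only model.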

Publications

 
Title MLwiN macros 
Description Updated version of multi-level multiple imputation macros for MLwiN software, available for free download at www.missingdata.org.uk. 
Type Of Material Improvements to research infrastructure 
Year Produced 2010 
Provided To Others? Yes  
Impact These macros are freely available for use by other researchers 
 
Title Missing data module for LEMMA 
Description Production of missing data module for the LEMMA project at the Centre for Multilevel Modelling, University of Bristol. These materials are available for free download from www.missingdata.org.uk from the registered users area. 
Type Of Material Improvements to research infrastructure 
Year Produced 2012 
Provided To Others? Yes  
Impact N/A 
 
Title Stata MNAR CC 
Description The augcca Stata program implements a new approach for handling missing covariates (doi: 10.1093/biostatistics/kxu023). The approach can be used to fit linear regression models when one of the covariates is partially observed and missing not at random (MNAR), provided that its missingness is independent of the outcome in the analysis model. The program is freely available by typing "net from http://missingdata.lshtm.ac.uk/stata" into Stata's command window and selecting "augcca". 
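The setting targeted by augcca rests on a known property: complete-case analysis of a regression model is unbiased when missingness in a covariate does not depend on the outcome given the covariates, even if it depends on the covariate's own unobserved values. The Python simulation below only illustrates that property; it is not the augcca method, and the variables, coefficients, missingness models and the hypothetical helpers `ols` and `complete_case_fit` are invented for illustration.

```python
import numpy as np


def ols(design, outcome):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


def complete_case_fit(x, z, y, x_observed):
    """Regress y on (1, x, z) using only the rows where x is observed."""
    design = np.column_stack([np.ones(x_observed.sum()), x[x_observed], z[x_observed]])
    return ols(design, y[x_observed])


rng = np.random.default_rng(7)
n = 50_000
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.5 + 1.0 * x - 0.7 * z + rng.normal(size=n)   # true coefficients (0.5, 1.0, -0.7)

# Scenario A: x is MNAR (its own value drives missingness), but missingness ignores y.
p_missing = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x + 0.5 * z)))
observed_a = rng.uniform(size=n) >= p_missing
coef_a = complete_case_fit(x, z, y, observed_a)

# Scenario B: x is missing whenever the outcome is large, so missingness depends on y.
observed_b = y < 1.0
coef_b = complete_case_fit(x, z, y, observed_b)
```

In scenario A the complete-case coefficients should lie close to the true values (0.5, 1.0, -0.7); in scenario B, where rows are dropped whenever y is at least 1, the slope on x is pulled towards zero.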
Type Of Material Improvements to research infrastructure 
Year Produced 2014 
Provided To Others? Yes  
Impact None as yet 
 
Title Stata command 
Description Development of a tutorial for our Stata command realcomImpute, which transfers data between Stata and the REALCOM Impute package, available for free download at www.missingdata.org.uk. 
Type Of Material Improvements to research infrastructure 
Year Produced 2012 
Provided To Others? Yes  
Impact This tutorial and command are freely available for use by researchers. 
 
Title Stata software for congenial MI 
Description New Stata software for congenial multiple imputation of covariates, freely available from www.missingdata.org.uk. 
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact N/A 
 
Description MiDiA 
Organisation London School of Hygiene and Tropical Medicine (LSHTM)
Department Faculty of Epidemiology and Population Health
Country United Kingdom 
Sector Academic/University 
PI Contribution This collaboration began as part of a previous MRC grant on Missing Data Methods (PI Prof J Sterne) and continues to develop.
Collaborator Contribution We collaborate on this grant and have published joint paper(s). Joint discussions have led to methodological developments and to the development of new ideas.
Impact Statisticians only. Co-authored papers published.
Start Year 2007
 
Description MiDiA 
Organisation Medical Research Council (MRC)
Department MRC Clinical Trials Unit
Country United Kingdom 
Sector Public 
PI Contribution This collaboration began as part of a previous MRC grant on Missing Data Methods (PI Prof J Sterne) and continues to develop.
Collaborator Contribution We collaborate on this grant and have published joint paper(s). Joint discussions have led to methodological developments and to the development of new ideas.
Impact Statisticians only. Co-authored papers published.
Start Year 2007
 
Description MiDiA 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution This collaboration began as part of a previous MRC grant on Missing Data Methods (PI Prof J Sterne) and continues to develop.
Collaborator Contribution We collaborate on this grant and have published joint paper(s). Joint discussions have led to methodological developments and to the development of new ideas.
Impact Statisticians only. Co-authored papers published.
Start Year 2007
 
Description MiDiA 
Organisation University of Cambridge
Department MRC Biostatistics Unit
Country United Kingdom 
Sector Public 
PI Contribution This collaboration began as part of a previous MRC grant on Missing Data Methods (PI Prof J Sterne) and continues to develop.
Collaborator Contribution We collaborate on this grant and have published joint paper(s). Joint discussions have led to methodological developments and to the development of new ideas.
Impact Statisticians only. Co-authored papers published.
Start Year 2007