Dealing with missing data in longitudinal studies

Lead Research Organisation: University of Bristol
Department Name: Social Medicine

Abstract

Longitudinal studies – studies in which individuals are followed over periods of many months or years – are of great importance in understanding how aspects of people’s lifestyle or environment influence their risk of disease. When individuals are followed over extended periods it is inevitable that measurements on particular variables are sometimes missing, for example because a measuring device broke down, or a subject did not answer certain questions or did not attend an examination. Missing values make analyses of data from longitudinal studies more complicated, because they can lead to results that are both biased (they differ from the results that would be observed if the missing values were taken into account) and inefficient (they are less precise than they would be if missing values were taken into account). New statistical methods that address these issues have been proposed, and have the potential to decrease bias and increase efficiency in analyses of longitudinal studies. However, these methods can be complex and there is currently little practical experience of their use. We are investigating practical issues in applying these methods, in particular a method called multiple imputation, in order to demonstrate the circumstances in which they are useful, develop strategies for choosing models and confirming that they are appropriate, and compare different approaches and software. As well as publishing the results of the study in scientific journals, we will develop guidelines for people who use the methods in the future.

Technical Summary

Analyses of data from longitudinal studies are often complicated by the presence of missing values, caused by participant dropout or non-response. Failure to allow for this can lead to both biased and inefficient statistical analyses. Analyses ignoring problems caused by missing data are common. Statistical research has generated better ways to deal with these problems, but the methods are technically challenging. The proposed research will focus on the application of multiple imputation (MI), the most flexible available method, in longitudinal studies. We will demonstrate the potential of MI to reduce bias and increase precision in analyses of data from the ALSPAC birth cohort study and the ART-CC and ART-LINC HIV cohort collaborations. We will also clarify the circumstances in which analyses allowing for missing data are likely to have advantages over simpler methods. We will develop a framework for simulations that allow evaluation of characteristics of imputation procedures, and use both simulations and analyses of longitudinal data to examine how to deal with the model complexity that characterises application of these procedures. We will adapt existing software to improve model diagnostics that may alert the user to problems in imputation procedures and to facilitate sensitivity analyses that examine robustness of results to data that are missing not at random (MNAR). We will work with members of the CONSORT and STROBE groups, and with journal editors, to provide guidelines on reporting analyses that deal with missing data.
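
For readers unfamiliar with MI, the following is a minimal sketch of its final step only: combining the results obtained from several imputed datasets using the standard combining rules (Rubin's rules). It is an illustration, not the software or procedure used in this project. The function name pool_rubin and the input values are hypothetical, and the imputation step itself (filling in the missing values several times) is assumed to have been done already.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets (hypothetical helper).

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching results from at least two imputed datasets")
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up numbers for illustration only, not results from any study.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled, total = pool_rubin(estimates, variances)
print(pooled, total)
```

The between-imputation term is what carries the extra uncertainty due to the missing values: the more the estimates disagree across imputed datasets, the larger the total variance.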
