Methods for handling missing data and covariate measurement error in individual participant data meta-analysis

Lead Research Organisation: London School of Hygiene & Tropical Medicine
Department Name: Epidemiology and Population Health

Abstract

In recent decades there has been a concerted drive towards ensuring medicine is evidence based, meaning that decisions about patient care and public health are made in light of the best available current evidence. Central to establishing what constitutes the best available evidence for a particular clinical or public health question is the process of evidence synthesis. For clinical questions which can be numerically quantified, the primary tool for synthesizing evidence is meta-analysis, which involves taking the results from previous studies and combining them to give a single summary estimate of the quantity of interest.
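As an illustration of how study results can be combined into a single summary estimate, the following is a minimal sketch of inverse-variance weighted (fixed-effect) pooling. The abstract does not say which pooling method is intended, so the choice of a fixed-effect model is an assumption, and the function name and study values are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Return the inverse-variance weighted pooled estimate and its standard error.

    Assumes a fixed-effect model: every study estimates the same underlying quantity.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and of equal length")
    # Each study is weighted by the reciprocal of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three studies.
study_estimates = [0.10, 0.30, 0.20]
study_std_errors = [0.10, 0.20, 0.10]

pooled_estimate, pooled_std_error = pool_fixed_effect(study_estimates, study_std_errors)
print(f"pooled estimate = {pooled_estimate:.4f}, SE = {pooled_std_error:.4f}")
```

More precise studies (smaller standard errors) receive more weight, so the pooled standard error is smaller than that of any single study.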

The gold standard approach to meta-analysis involves collating the individual participant data (IPD) from all of the previously conducted relevant studies and analysing the resulting combined dataset. Pooling the individual level data confers a number of advantages compared to the traditional meta-analysis approach which involves combining the overall results of studies (as opposed to analysing their original, individual level data). These advantages include the ability to make statistical adjustments for a consistent set of variables, exploration of whether treatment effects vary between different groups of patients, and the ability to investigate the shape of relationships between variables.

However, there are a number of issues which threaten the potential of IPD meta-analysis. Principal among these are missing data and measurement error. Missing data occur for two reasons in IPD meta-analyses. The first is when some studies did not collect data on one or more variables of interest, so that the values of these variables are missing for all participants in those studies. The second is when, for a variety of reasons, some participants have missing values even though the study intended to collect the variable. Missing data cause results to be less precise and possibly biased. Measurement error occurs when variables of interest can only be measured imprecisely. If ignored, measurement error also causes bias in results.
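The bias caused by ignoring measurement error can be seen in a small simulation. The sketch below assumes the simplest setting, which the abstract does not specify: a single covariate, a linear outcome model, and classical (independent, additive) error. In that setting the naive slope is attenuated towards zero by the reliability ratio var(X) / (var(X) + var(error)). All variable names and parameter values are illustrative.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x for a single covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 20000
true_slope = 1.0

# True covariate X, outcome Y, and an error-prone measurement W = X + U.
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
outcome = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_x]
observed_x = [x + rng.gauss(0.0, 1.0) for x in true_x]

slope_using_true_x = ols_slope(true_x, outcome)
slope_using_observed_x = ols_slope(observed_x, outcome)

# With var(X) = 1 and var(U) = 1 the reliability ratio is 0.5.
reliability = 1.0 / (1.0 + 1.0)

print(f"slope using true covariate:      {slope_using_true_x:.3f}")
print(f"slope using mismeasured covariate: {slope_using_observed_x:.3f}")
print(f"expected attenuated slope:        {true_slope * reliability:.3f}")
```

With more than one covariate, non-linear effects or interactions, the direction and size of the bias are harder to predict, which is part of the motivation for the methods proposed here.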

The proposed research seeks to develop new statistical methods to deal with these two issues. These methods will enable researchers to obtain more precise and less biased estimates from IPD meta-analyses, thereby giving more accurate answers to important clinical and public health questions. The new methods will be published in scientific journals and implemented in statistical software packages so that researchers can use them. This will help medical practitioners and public health experts to base their decisions and policies on the best available evidence, thus improving health outcomes for patients and the population more generally.

The work will be conducted by the Fellowship applicant.

Technical Summary

The overall aim of the proposed research is to develop and apply methods based on multiple imputation (MI) to tackle the issues of missing data and covariate measurement error in IPD meta-analysis. This will be achieved through pursuing the following objectives:
1) I will critically evaluate existing multiple imputation (MI) approaches which can be used for imputing systematically missing data. Parameter identifiability problems will be tackled through use of ridge priors. I will explore the extension of a full conditional specification approach to imputation of systematically missing data to accommodate missingness in categorical variables.
2) I will develop an MI approach for (sporadically and systematically) missing data which accommodates non-linear covariate effects and interactions in the substantive model, by extending the approach I have recently proposed for the setting of single studies.
3) I will develop an MI approach for covariate measurement error which accommodates non-linear covariate effects and interactions in the substantive model, by extending the approach I have recently proposed for imputing partially observed covariates in the context of single studies.
4) I will develop an MI approach for covariate measurement error which handles studies without repeat measurements. This approach will explicitly model between-study heterogeneity in the measurement error distribution and distribution of underlying covariates.
5) Existing theory on congeniality between imputation and substantive models will be used to establish the order in which meta-analysis and pooling across imputations should be performed when a multi-level imputation model is used.
6) I will investigate the feasibility of doubly-robust estimators for missing data in IPD meta-analysis, and their use for investigating sensitivity to mis-specification of the imputation model.
7) I will actively disseminate the methods developed through their implementation in statistical packages.
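
The objectives above all depend on combining estimates across multiply imputed datasets. As a minimal sketch, the standard combination rules (Rubin's rules) for a scalar estimate can be written as follows. The function name and the example values are hypothetical, and the sketch does not address the question in objective 5 of whether this pooling should come before or after the meta-analysis step.

```python
def rubins_rules(estimates, variances):
    """Combine one scalar estimate across m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error of the estimate from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the per-dataset variances.
    within = sum(variances) / m
    # Between-imputation variance: sample variance of the point estimates.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    # Total variance inflates the between component by (1 + 1/m).
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical estimates and variances from three imputed datasets.
imputed_estimates = [0.5, 0.6, 0.7]
imputed_variances = [0.04, 0.04, 0.04]

mi_estimate, mi_variance = rubins_rules(imputed_estimates, imputed_variances)
print(f"MI estimate = {mi_estimate:.3f}, total variance = {mi_variance:.5f}")
```

In an IPD meta-analysis this function could be applied within each study before the study results are meta-analysed, or applied once to meta-analysis results obtained separately from each imputed dataset. Which order is appropriate is the question objective 5 sets out to answer.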

Planned Impact

The proposed research will enable more accurate estimates to be obtained in IPD meta-analyses in a number of contexts, and this has the potential to improve clinical and public health outcomes. For example, improved estimation of prognostic models would enable patients at high risk of disease to be better identified, potentially leading to earlier intervention and improved health outcomes. A better understanding of how treatment effects vary between individuals would enable treatments to be better targeted, so that the most appropriate interventions are used for each individual and fewer drugs are prescribed to patients who would derive no benefit. Lastly, the methods will be suitable for analyses which further our understanding of disease aetiology, which in turn should translate into improved health outcomes for patients and healthy populations generally.

Publications

Bartlett J (2015) Multiple Imputation of Covariates by Substantive-model Compatible Fully Conditional Specification in The Stata Journal: Promoting communications on statistics and Stata

Bartlett JW (2014) Methodology for multiple imputation for missing data in electronic health record data in International Biometric Conference 2014

Bartlett JW (2014) Improving upon the efficiency of complete case analysis when covariates are MNAR in Biostatistics (Oxford, England)

Bartlett JW (2016) Missing covariates in competing risks analysis in Biostatistics (Oxford, England)

 
Title Stata program for predictive value weighting to allow for covariate misclassification 
Description pvw is a Stata program which implements the predictive value weighting approach for adjustment for misclassification in a binary covariate in a logistic regression model, as proposed by Lyles and Lin (2010). 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The program is being used in teaching materials for Masters students at the London School of Hygiene & Tropical Medicine, giving an easy to use method for allowing for the effects of misclassification in a binary covariate in logistic regression models. 
URL https://ideas.repec.org/c/boc/bocode/s457825.html
 
Title smcfcs package for R 
Description The software is a package for R which implements the Substantive Model Compatible Fully Conditional Specification approach to multiple imputation of covariates. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The package has been used in teaching courses on methods for handling missing data. 
URL https://cran.r-project.org/web/packages/smcfcs/index.html
 
Title smcfcs package for Stata 
Description The software implements the Substantive Model Compatible Fully Conditional Specification (SMC-FCS) approach to multiple imputation of missing covariates in Stata. The software can be installed freely into Stata and used to impute missing covariates using the SMC-FCS approach. An accompanying article in the Stata Journal describes the software package and its use. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The package is used as part of the LSHTM short course "Statistical Analysis of Missing Data with Multiple Imputation and Inverse Probability Weighting". I believe it is also planned to be used in a similar course on missing data run by the MRC Biostatistics Unit in Cambridge. 
URL https://ideas.repec.org/c/boc/bocode/s457968.html