Statistical methodology for meta-analysis of epidemiological studies using individual participant data.

Lead Research Organisation: University of Cambridge
Department Name: Institute of Public Health

Abstract

An increasing number of factors are being proposed as important predictors and/or causes of chronic diseases, particularly with the advent of technologies that enable rapid measurement of large numbers of blood proteins and genetic factors. To enable a more comprehensive and powerful evaluation of the relevance of such factors, it is often necessary to pool data from different studies. If such studies can reliably demonstrate that a particular factor is relevant to a condition (such as heart disease), then this could have important implications for the prediction and prevention of disease (exemplified by measurement and modification of blood cholesterol values). We plan to advance the development of statistical methods for use in such data pooling approaches by working on detailed information previously collated on up to 40,000 cases of heart attack among 1 million participants from about 100 studies. The main aim is to develop methods that will enable more reliable conclusions to be drawn about (i) whether a straight line (or some more complicated relationship) best describes the relationship between a factor and the risk of disease, and (ii) whether associations of particular factors with disease risk are likely to reflect cause-and-effect relationships. The methods developed will apply to many different situations and diseases, and will become increasingly important as the trend continues towards data sharing and pooling in large, collaborative multi-centre studies.

Technical Summary

Individual participant data from multiple observational studies are increasingly combined to evaluate the relevance of risk markers to disease, as in the 1-million-participant, 95-cohort Emerging Risk Factors Collaboration (ERFC), which is coordinated by our group. Optimal biostatistical methods are needed to help maximise the value of such databases. Our proposal addresses unresolved questions in relation to: (i) characterisation of the shape of relationships between quantitative exposures (such as biomarkers) and disease outcomes, and (ii) control of the impact of possible confounding factors on exposure-disease relationships.

Reliable characterisation of the shape of exposure-disease relationships can have important scientific and public health implications, as exemplified by the log-linear relationship of blood pressure with major cardiovascular outcomes. Such assessments can, however, be misleading in the presence of exposure measurement error (which typically dilutes the strength of associations) and diversity across studies in exposure distributions (which complicates selection of an appropriate scale in which to combine studies). Our preliminary work has suggested, moreover, that standard measurement error correction methods can mis-estimate non-linear dose-response relationships. Through implementation in the ERFC database and through simulation, we will develop approaches that overcome these limitations by systematic investigation using parametric models for exposure and measurement error alongside flexible fractional polynomials or spline models for the exposure-disease relationship. We will investigate selection of appropriate scales by estimating, for each study, the unstandardised exposure-disease association and the standard deviation of the usual exposure, performing random-effects meta-analyses on the unstandardised and standardised scales, and comparing measures of heterogeneity.
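
The scale comparison described above can be illustrated with a minimal sketch. It assumes a DerSimonian-Laird random-effects estimator and the I-squared statistic as the measure of heterogeneity; the summary does not specify either, so both are illustrative choices. The function name `random_effects_meta` and all per-study numbers are hypothetical and are not taken from the ERFC.

```python
import math


def random_effects_meta(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-study estimates.

    Returns the pooled estimate, its standard error, the between-study
    variance (tau2) and the I-squared heterogeneity statistic in percent.
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # method-of-moments estimate, truncated at 0
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se_pooled, "tau2": tau2, "i2": i2}


# Hypothetical per-study inputs: log hazard ratio per unit of exposure,
# its standard error, and the SD of usual exposure in that study.
beta_per_unit = [0.30, 0.15, 0.075]
se_per_unit = [0.05, 0.025, 0.0125]
sd_usual = [1.0, 2.0, 4.0]

# Rescale each study's association to "per 1 SD of usual exposure".
beta_per_sd = [b * s for b, s in zip(beta_per_unit, sd_usual)]
se_per_sd = [e * s for e, s in zip(se_per_unit, sd_usual)]

unstandardised = random_effects_meta(beta_per_unit, se_per_unit)
standardised = random_effects_meta(beta_per_sd, se_per_sd)

print(f"I2 on unstandardised scale: {unstandardised['i2']:.1f}%")
print(f"I2 on standardised scale:   {standardised['i2']:.1f}%")
```

In this constructed example the studies agree on the per-SD scale but not on the per-unit scale, so heterogeneity is lower after standardisation. With real data the comparison could favour either scale, which is why both analyses are run.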

Error in the measurement of potential confounding factors (or measurement of only a subset of known confounders in some studies) can lead to misleading or artefactual associations, and, hence, erroneous inferences about disease causation. Previous meta-analyses have generally not corrected for such biases. To help correct for confounder measurement error, we will extend methods we developed to correct for exposure measurement error, by using cohort-specific correction matrices inferred from multivariate random-effects meta-analysis. To address missing data on confounders, we propose to estimate the unadjusted association in each study and the adjusted association in studies with recorded confounders, and then to combine these associations using bivariate random-effects meta-analysis.
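
One common way to write down a bivariate random-effects model of this kind is sketched below. The notation is assumed for illustration and does not come from the summary: for study $i$, $\hat\beta_i^{U}$ and $\hat\beta_i^{A}$ are the estimated unadjusted and adjusted associations, $S_i$ is their within-study covariance matrix, and $\Sigma$ is the between-study covariance matrix.

```latex
% Within-study and between-study levels for studies that record the confounders
\begin{align*}
\begin{pmatrix}\hat\beta_i^{U}\\ \hat\beta_i^{A}\end{pmatrix}
\,\Big|\,
\begin{pmatrix}\beta_i^{U}\\ \beta_i^{A}\end{pmatrix}
&\sim N\!\left(\begin{pmatrix}\beta_i^{U}\\ \beta_i^{A}\end{pmatrix},\; S_i\right), \\
\begin{pmatrix}\beta_i^{U}\\ \beta_i^{A}\end{pmatrix}
&\sim N\!\left(\begin{pmatrix}\beta^{U}\\ \beta^{A}\end{pmatrix},\; \Sigma\right),
\qquad
\Sigma = \begin{pmatrix}\tau_U^{2} & \rho\,\tau_U\tau_A\\ \rho\,\tau_U\tau_A & \tau_A^{2}\end{pmatrix}.
\end{align*}
% Studies without the confounders contribute only the unadjusted estimate,
% through the corresponding marginal distribution:
\[
\hat\beta_i^{U} \sim N\!\left(\beta^{U},\; s_{i,U}^{2} + \tau_U^{2}\right),
\]
% where s_{i,U}^2 is the within-study variance of the unadjusted estimate.
```

Under this formulation, studies lacking confounder data still inform the pooled adjusted association $\beta^{A}$ indirectly, through the estimated correlation between unadjusted and adjusted associations across studies.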

The products of this methodological work will have rapid application, initially to the ERFC and then to other existing data pooling initiatives. Our findings should become increasingly useful as the trend continues towards data pooling in large, collaborative multi-centre analyses.
