Analysis of Longitudinal Data with Varying Numbers of Measurements

Lead Research Organisation: University College London
Department Name: Unlisted

Abstract

Patients with chronic conditions have tests taken periodically to assess their health, as part of their routine clinical care. For example, HIV patients have tests of their immune function. These tests may be used to identify, for example, when a treatment could be started or changed. The data from many patients together can be analysed to see in general when treatments are applied, and how the disease changes over time in response to therapy.

The number of measurements obtained often varies between patients, sometimes because patients miss clinic appointments. Alternatively this might be inherent, such as when each measurement concerns a clinical episode (e.g. an infection or hospital stay) which patients may experience repeatedly during study follow-up. The distinction is important and should influence how the data are analysed.

If patients with more measurements have a different disease profile over time to those with fewer measurements, then this will likely cause a bias when the data are analysed, so that the research findings may be misleading. Many people have suggested modifications of the methods used to analyse the data so as to avoid such bias caused by death or some patients "dropping out" (i.e. attending consistently for a period but then stopping their care). However, little research has been done into how to deal with data where patients are measured sporadically. Furthermore, there is little research into how to deal with data where the number of measurements varies inherently. One of the key approaches suggested may not use the data efficiently, i.e. the results of the analysis will not be as precise as they could be.

We aim to adapt the existing methods of data analysis to longitudinal data and, where appropriate, improve their efficiency.

We will apply the various possible approaches to data from the largest cohort of HIV patients in the UK to assess their impact. We will also develop a set of practical recommendations to help researchers decide how to analyse their data.

Technical Summary

Observational longitudinal studies in which repeated measurements are taken for patients are crucial to inform our understanding of how and why chronic disease processes change over time. For example in studies of HIV disease repeated measurements of such markers as the HIV viral load are obtained over follow-up time to assess disease status and response to therapy.

The number of measurements obtained often varies between patients, because the frequency of clinic appointments varies, and these may sometimes be missed. Alternatively this variation might be inherent, for example when each measurement concerns a clinical episode which patients may experience repeatedly during follow-up. The distinction can lead to different methods of analysis. In the former case our interest is likely in the associations reflected by all measurements (obtained or missed), whereas in the latter our interest is only in the measurements obtained.

If patients with more measurements have a different disease profile over time to those with fewer, then this may cause bias if standard regression techniques (e.g. random-effects models, generalised estimating equations (GEE)) are applied. There are established methods to deal with this problem when it arises through patients dropping out of the study. However, little research has addressed how to deal with both intermittently missing data and irregular measurement times. Furthermore, methods to deal with inherently varying numbers of measurements (termed informative cluster size, Williamson) are limited, and have not been applied specifically to longitudinal data.
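The bias described above can be illustrated with a small simulation. The sketch below is not taken from the project; it assumes a hypothetical cohort in which a patient-level severity term drives both the marker values and the number of measurements, and it compares a pooled mean over all measurements with a mean that weights each measurement by the inverse of its patient's cluster size (one simple approach discussed in the informative cluster size literature). The function names, the Poisson visit model and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_cohort(n_patients=2000, seed=0):
    """Simulate a hypothetical cohort with informative cluster size."""
    rng = np.random.default_rng(seed)
    # Patient-level severity (random effect), mean 0 in the population.
    severity = rng.normal(0.0, 1.0, size=n_patients)
    # Assumption: patients with higher severity are measured more often.
    n_obs = 1 + rng.poisson(np.exp(0.5 + 0.7 * severity))
    # One row per measurement, labelled by patient index.
    patient = np.repeat(np.arange(n_patients), n_obs)
    # Marker value = patient severity + measurement noise.
    marker = severity[patient] + rng.normal(0.0, 1.0, size=patient.size)
    return patient, marker, n_obs


def pooled_mean(marker):
    """Mean over all measurements; frequently measured patients dominate."""
    return float(marker.mean())


def cluster_weighted_mean(patient, marker, n_obs):
    """Weight each measurement by 1 / (cluster size of its patient)."""
    weights = 1.0 / n_obs[patient]
    return float(np.sum(weights * marker) / np.sum(weights))


patient, marker, n_obs = simulate_cohort()
print("pooled mean:          ", round(pooled_mean(marker), 3))
print("cluster-weighted mean:", round(cluster_weighted_mean(patient, marker, n_obs), 3))
```

Under these assumptions the mean for a typical patient is zero. The pooled mean is pulled upwards because sicker patients contribute more measurements, whereas the cluster-weighted mean, which equals the average of the per-patient means, stays close to zero.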

We aim to adapt the existing methods to longitudinal data. When applying GEE to data with informative cluster size we aim to gain efficiency (more precise estimation) over current methods through using a realistic working correlation. To apply random effects models to intermittently missing data we aim to jointly model the underlying disease process and the rate at which measurements are taken under assumptions about the missing data.
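One common way to write down a joint model of this kind is a shared random-effects formulation, sketched below purely for illustration; it is not necessarily the model the project adopts. All symbols are assumptions introduced here: $Y_{ij}$ is the $j$-th measurement on patient $i$, $\mathbf{x}_{ij}$ a covariate vector, $b_i$ a patient-level random effect, and $\lambda_i(t)$ the rate at which patient $i$ is measured at time $t$.

```latex
\begin{align}
Y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\lambda_i(t) &= \lambda_0(t)\,\exp(\gamma\, b_i).
\end{align}
```

In this sketch the parameter $\gamma$ links the measurement rate to the disease process: when $\gamma = 0$ the number of measurements carries no information about $b_i$, and when $\gamma \neq 0$ the number of measurements is informative about the disease profile.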

The same problems arise when the number of measurements for a particular time-varying covariate (e.g. indicating a change in therapy) is linked to the disease profile in that period, but this has not been acknowledged in the literature. We aim to extend methods to this further scenario.

We will apply the various possible strategies to simulated data and to real data from two HIV studies, to assess their impact in reducing bias and increasing efficiency. We will also develop a set of practical recommendations for researchers to guide the choice of appropriate population and suitable method for their analysis.
 
Description: MRC BSU
Organisation: University of Cambridge
Department: MRC Biostatistics Unit
Country: United Kingdom
Sector: Academic/University
PI Contribution: Joint collaborative methodological work on the analysis of clustered and longitudinal data
Collaborator Contribution: Joint collaborative work
Impact: Three published papers; plans for further work applied to randomised controlled trials.
Start Year: 2007