Development and application of statistical methods for addressing the heterogeneity of data collection intervals common in longitudinal datasets

Lead Research Organisation: University of Leeds
Department Name: Sch of Geography

Abstract

There is great interest in many sciences in gathering data over time and analysing patterns of change. Examples include the study of growth curves, psychological changes, and changes in the prevalence of diseases in an area. Often, researchers examine the relationship of these patterns of data (longitudinal exposures) to later events (outcomes), which requires data analysis techniques that describe patterns of the longitudinal exposure in individuals (e.g. growth curves in individual people). A number of techniques can be used to do this. Multilevel models (MLMs) start by describing the average pattern of the longitudinal exposure, but also give information on how much individual patterns differ from it; the results from these models are easy to understand, but they cannot describe very complicated patterns. Latent growth curve models (LGCMs), on the other hand, estimate individual patterns of a longitudinal exposure by estimating additional unobserved (latent) variables that describe them, rather than an average trajectory. To do this, LGCMs represent time in an unusual way - by adding a 'factor loading' relating the latent variables describing the curves to each measurement of the longitudinal exposure. This factor loading is often set to the time at which the measurement was taken, but it can instead be estimated by the model, which allows very complex curves to be represented much more easily in LGCMs than in MLMs. LGCMs can also be extended to growth mixture models (GMMs), which identify underlying subgroups in the data based on the types of patterns of the longitudinal exposure. However, LGCMs require every individual to be measured at exactly the same time points - a property called 'interval homogeneity'. This is rarely the case in practice, especially when using observational data (e.g. children's growth curves recorded as measurements in their medical records), so these highly flexible modelling techniques cannot be widely used.
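As a rough illustration of why the two approaches treat time differently, the simplest (linear) forms of the two models can be written side by side. The notation here is an assumption chosen for this sketch and is not taken from the project: $y$ is the longitudinal exposure, $i$ indexes individuals, $j$ indexes an individual's own measurement occasions, and $k$ indexes occasions shared by everyone.

```latex
% Multilevel model: time t_{ij} enters as data and may differ between individuals
y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, t_{ij} + e_{ij}

% Latent growth curve model: time enters through factor loadings \lambda_k,
% which are shared by all individuals (k = 1, \dots, K)
y_{ik} = \eta_{0i} + \lambda_k\, \eta_{1i} + \varepsilon_{ik}
```

In the first equation each individual brings their own measurement times $t_{ij}$, so unequal intervals pose no difficulty. In the second, the loadings $\lambda_k$ (fixed to the measurement times or estimated) carry no individual index, which is where the requirement for interval homogeneity comes from.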
Another method that can be used to describe longitudinal exposures is functional data analysis (FDA). This describes individual patterns of the longitudinal exposure by fitting smooth curves within smaller time segments. These segments are bounded by 'knots', the number and position of which are chosen by the researcher. This is also a flexible method, but it can be inaccurate when there are wide gaps between measurements of the longitudinal exposure.
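To make the role of knots concrete, the sketch below builds a cubic 'truncated power' spline basis, one common way of joining smooth curve segments at knots. It is an illustrative assumption rather than the basis used in this project; the function names, the ages, and the knot positions are all hypothetical, and in practice the coefficients would be estimated from the data (for example by penalised least squares) rather than supplied by hand.

```python
def truncated_power_basis(t, knots, degree=3):
    """Return one row of a spline design matrix at time t (degree >= 1)."""
    # Global polynomial part: 1, t, t^2, ..., t^degree
    row = [float(t) ** p for p in range(degree + 1)]
    # One extra term per knot; each is zero until t passes that knot,
    # which lets the curve change shape from one segment to the next
    row += [max(t - k, 0.0) ** degree for k in knots]
    return row


def evaluate_curve(t, knots, coefs, degree=3):
    """Evaluate a spline curve at time t for a given coefficient vector."""
    row = truncated_power_basis(t, knots, degree)
    return sum(c * b for c, b in zip(coefs, row))


# Hypothetical example: ages in years, with knots placed at 1 and 5 years
knots = [1.0, 5.0]
ages = [0.5, 2.0, 7.5]
design = [truncated_power_basis(a, knots) for a in ages]
```

Each row of `design` has four polynomial terms plus one term per knot. A measurement taken before the first knot contributes nothing to the knot terms, which hints at why sparse measurements within a segment leave that part of the curve poorly determined.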

This project aims to examine the utility of applying FDA to a longitudinal exposure that lacks interval homogeneity, using the individual patterns it describes to interpolate measurements for each individual at common time points and so create interval homogeneity. This would allow latent growth curve modelling to be used to analyse the patterns of the longitudinal exposure and relate them to a later outcome. These aims will be addressed using real and simulated data, and the following questions will also be addressed: a) How can the optimum points for interpolation of measurements be found? and b) How should the optimum 'basis function' (i.e. the type of curve used to fit segments of the longitudinal exposure in FDA) be chosen? The results will also be compared to those from LGCMs (which assume interval homogeneity) and from MLMs.
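The interpolation step can be sketched as follows. This is a deliberately simplified stand-in: it uses straight-line interpolation between neighbouring measurements instead of a fitted FDA curve, and the function name, the individuals, and the grid of time points are hypothetical. Grid points outside an individual's observed range are left missing rather than extrapolated, which is one possible choice and not necessarily the one the project would make.

```python
from bisect import bisect_right


def interpolate_to_grid(times, values, grid):
    """Linearly interpolate one individual's measurements onto shared time points.

    Returns None for grid points outside the individual's observed range.
    """
    pairs = sorted(zip(times, values))
    ts = [t for t, _ in pairs]
    vs = [v for _, v in pairs]
    out = []
    for g in grid:
        if g < ts[0] or g > ts[-1]:
            out.append(None)  # no extrapolation beyond observed data
            continue
        j = bisect_right(ts, g)  # index of first measurement time after g
        if j == len(ts):
            out.append(vs[-1])  # g coincides with the last measurement
            continue
        t0, t1 = ts[j - 1], ts[j]
        w = (g - t0) / (t1 - t0)  # position of g between its two neighbours
        out.append(vs[j - 1] + w * (vs[j] - vs[j - 1]))
    return out


# Hypothetical records: (measurement times, measured values) per individual
records = {
    "child_a": ([0.0, 1.0, 3.0], [3.5, 10.0, 14.0]),
    "child_b": ([0.5, 2.5], [6.0, 12.0]),
}
grid = [1.0, 2.0, 3.0]  # shared time points giving interval homogeneity
homogeneous = {
    name: interpolate_to_grid(t, v, grid) for name, (t, v) in records.items()
}
```

After this step every individual has a value (or an explicit missing marker) at the same time points, which is the layout a latent growth curve model expects. Where the grid points should sit is exactly question a) above.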

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
ES/P000746/1 | | | 01/10/2017 | 30/09/2027 |
1943044 | Studentship | ES/P000746/1 | 01/10/2017 | 31/07/2021 | Sarah Gadd