Development and application of statistical methods for addressing the heterogeneity of data collection intervals common in longitudinal datasets

Lead Research Organisation: University of Leeds

Department Name: School of Geography

Abstract

There is great interest across many sciences in gathering data over time and analysing patterns of change. Examples include the study of growth curves, psychological change, and changes in the prevalence of diseases in an area. Researchers often examine how these patterns (longitudinal exposures) relate to later events (outcomes), which requires data analysis techniques that describe the pattern of the longitudinal exposure in each individual (e.g. the growth curve of each person).

Several techniques can be used to do this. Multilevel models (MLMs) start by describing the average pattern of the longitudinal exposure and then quantify how much individual patterns differ from it. Their results are easy to interpret, but they struggle to describe very complicated patterns. Latent growth curve models (LGCMs), by contrast, model individual patterns of a longitudinal exposure directly, through additional latent variables ('growth factors') that describe the shape of each curve. To do this, LGCMs represent time in an unusual way: a 'factor loading' links the growth factors to each measurement of the longitudinal exposure. This factor loading is often fixed to the time at which the measurement was taken, but it can also be estimated by the model, which allows LGCMs to represent very complex curves much more easily than MLMs. LGCMs can also be extended to growth mixture models (GMMs), which identify underlying subgroups in the data based on the types of pattern of the longitudinal exposure.

However, because each factor loading belongs to a measurement occasion rather than to an individual, LGCMs require all individuals to be measured at exactly the same time points, a property called 'interval homogeneity'. This is rarely the case in practice, especially in observational data (e.g. children's growth curves recorded as measurements in their medical records), so these most flexible modelling techniques cannot be widely used.
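
To make the contrast concrete, the two approaches can be written in a common textbook form for a linear trajectory. This is a sketch of standard notation, not necessarily the specification used in this project; y_ij is the j-th measurement of individual i, t_ij is the time at which it was taken, and the remaining symbols are illustrative.

```latex
\begin{aligned}
\text{MLM:}\quad  y_{ij} &= (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,t_{ij} + \varepsilon_{ij} \\
\text{LGCM:}\quad y_{ij} &= \eta_{0i} + \lambda_j\,\eta_{1i} + \varepsilon_{ij},
  \qquad \eta_{0i} = \alpha_0 + \zeta_{0i}, \quad \eta_{1i} = \alpha_1 + \zeta_{1i}
\end{aligned}
```

In the MLM, time t_ij carries both an individual index i and an occasion index j, so each person may be measured at different times. In the LGCM, the loading λ_j carries only the occasion index j: it is either fixed to a shared time (λ_j = t_j) or freely estimated (for example with λ_1 = 0 and λ_J = 1 fixed for identification, a so-called latent basis model). A single λ_j per occasion only makes sense if every individual's j-th measurement was taken at the same time, which is why interval homogeneity is required.
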
Another method that can be used to describe longitudinal exposures is functional data analysis (FDA). FDA describes each individual's pattern of the longitudinal exposure by fitting smooth curves to shorter time segments that are joined together. These segments are bounded by 'knots', whose number and position are chosen by the researcher. FDA is also a flexible method, but it can be inaccurate when there are wide gaps between measurements of the longitudinal exposure.
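
A minimal sketch of the FDA curve-fitting step for a single individual, assuming Python with NumPy. It fits a cubic regression spline by least squares using a truncated power basis; this is an illustrative stand-in for the B-spline bases more commonly used in FDA software (for the same knots both span the same space of cubic splines, but B-splines are numerically better behaved). The function names, knot positions, measurement times and values are all hypothetical.

```python
import numpy as np

def cubic_spline_basis(t, knots):
    """Truncated power basis for cubic splines: 1, t, t^2, t^3 and (t - k)^3_+ per interior knot k."""
    t = np.asarray(t, dtype=float)
    columns = [np.ones_like(t), t, t**2, t**3]
    columns += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(columns)

def fit_spline(times, values, knots):
    """Least-squares coefficients of one individual's smooth curve."""
    X = cubic_spline_basis(times, knots)
    coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
    return coef

def evaluate_spline(coef, knots, new_times):
    """Read the fitted curve off at arbitrary time points."""
    return cubic_spline_basis(new_times, knots) @ coef

def truth(t):
    """Hypothetical underlying curve (a cubic, so it lies inside the spline space)."""
    return 1 + 0.5 * t - 0.1 * t**2 + 0.01 * t**3

# Hypothetical irregularly timed measurements for one individual.
knots = [2.5, 5.0]
times = np.array([0.0, 0.7, 1.5, 2.2, 3.9, 4.1, 5.6, 6.0, 7.3, 8.0])
values = truth(times)

coef = fit_spline(times, values, knots)
grid = np.linspace(0.0, 8.0, 9)
print(np.allclose(evaluate_spline(coef, knots, grid), truth(grid)))  # True
```

Because the hypothetical data come from a cubic, which lies in the spline space, the fitted curve reproduces it at new time points up to rounding error. With real, noisy data the choice of knots and of basis (here, cubic pieces) governs how closely the curve follows the measurements, and between widely spaced measurements the curve is determined mainly by the smoothness constraints rather than by data.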

This project aims to examine the utility of applying FDA to a longitudinal exposure measured without interval homogeneity, using the individual curves that FDA estimates to interpolate each individual's measurements at common time points. This creates interval homogeneity and thereby allows latent growth curve modelling to be used to analyse patterns of the longitudinal exposure and relate them to a later outcome. These aims will be addressed using real and simulated data, and two further questions will be examined: a) how can the optimum points for interpolating measurements be found? and b) how should the optimum 'basis function' (i.e. the type of curve used to fit segments of the longitudinal exposure in FDA) be chosen? The results will also be compared with those from LGCMs (which assume interval homogeneity) and from MLMs.
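
The proposed pipeline (fit each individual's curve, then read it off at common time points) can be sketched as follows, again assuming Python with NumPy and an illustrative truncated power spline basis. The knots, the interpolation grid, the simulated trajectories and the random seed are all hypothetical; choosing the grid well is exactly question a) above, so the points used here are a placeholder, not a recommendation.

```python
import numpy as np

KNOTS = (2.5, 5.0)                          # hypothetical interior knots
GRID = np.array([1.0, 2.5, 4.0, 5.5, 7.0])  # hypothetical common interpolation points

def spline_basis(t, knots=KNOTS):
    """Truncated power cubic spline basis evaluated at times t."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def interpolate_to_grid(times, values, grid=GRID):
    """Fit one individual's curve by least squares, then evaluate it at the common grid."""
    coef, *_ = np.linalg.lstsq(spline_basis(times), np.asarray(values, dtype=float), rcond=None)
    return spline_basis(grid) @ coef

def true_curve(t, b0, b1):
    """Hypothetical data-generating trajectory: individual intercept and slope plus a shared wiggle."""
    return b0 + b1 * t + 0.3 * np.sin(t)

# Simulate 50 individuals with 9 visits each; visit times are jittered per person,
# so individuals are measured at different times (no interval homogeneity).
rng = np.random.default_rng(2017)
rows, truth = [], []
for _ in range(50):
    b0, b1 = rng.normal(10.0, 1.0), rng.normal(0.5, 0.1)
    times = np.clip(np.arange(9) + rng.uniform(-0.4, 0.4, size=9), 0.0, 8.0)
    values = true_curve(times, b0, b1) + rng.normal(0.0, 0.1, size=9)
    rows.append(interpolate_to_grid(times, values))
    truth.append(true_curve(GRID, b0, b1))

# Wide layout: one row per individual, one column per common time point.
wide = np.vstack(rows)
print(wide.shape)  # (50, 5)
```

The resulting wide array has one column per common time point, which is the data layout an LGCM expects; fitting the LGCM itself (for example in dedicated structural equation modelling software) is not shown.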

Student:

Sarah Gadd

Period of Study:

Oct 2017 - Feb 2021

Funder:

ESRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1943044

Research Topic:

Unclassified

Organisations

University of Leeds (Lead Research Organisation)

People	ORCID iD
Mark Gilthorpe (Primary Supervisor)
Peter Tennant (Primary Supervisor)
Ruth Blakeley (Primary Supervisor)	http://orcid.org/0000-0001-8794-962X
Alison Heppenstall (Primary Supervisor)
Sarah Gadd (Student)

Publications


Arnold KF (2019). Adjustment for time-invariant and time-varying confounders in 'unexplained residuals' models for longitudinal data within a causal framework and associated challenges. Statistical Methods in Medical Research.

Gadd SC (2019). Analysing trajectories of a longitudinal exposure: A causal perspective on common methods in lifecourse research. PLOS ONE.

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
ES/P000746/1			01/10/2017	30/09/2027
1943044	Studentship	ES/P000746/1	01/10/2017	28/02/2021	Sarah Gadd