Statistical modelling of longitudinal data from cohort studies to better understand phenotypes of asthma.
Lead Research Organisation:
University of Bristol
Department Name: Community-Based Medicine
Abstract
Many human diseases, such as asthma, do not have a single cause but result from complex interactions between genetic and environmental influences. Asthma is therefore not a single entity but consists of several different forms (phenotypes), each of which may have different risk factors. There is now huge potential to identify genetic causes of disease through large-scale genotyping in big populations drawn from several different sources. However, the opportunity to understand the complexities of diseases such as asthma may be missed unless efforts are made to ensure that disease phenotypes can be classified comparably across cohorts with different study designs. The aim of this project is to develop models of asthma phenotypes by applying sophisticated statistical modelling to the rich longitudinal data available in the ALSPAC cohort, and then to apply these models to other cohorts with similar data. Investigating the association of these phenotypes with genetic variants in known biological pathways will indicate whether the phenotypes are pathologically distinct and, if so, will provide clues to targets for disease modification or prevention. The methods developed will be applicable to other complex diseases.
Technical Summary
Aims & objectives: To gain a better understanding of the phenotypes of asthma through statistical modelling of multidimensional, longitudinal data, and to apply these models across cohorts in different settings and with varying data sets. To examine associations between asthma phenotypes and genetic variants identified through genome-wide association studies of asthma, in order to understand the biological pathways leading to different phenotypes.
Design: The project will build on work that I have developed in the Avon Longitudinal Study of Parents and Children (ALSPAC), using latent class analysis to derive wheezing phenotypes from reports of wheezing between birth and age 7.
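To make the starting point concrete, the following is a minimal sketch of a latent class model for binary wheeze reports, fitted by expectation-maximisation. It is written in Python with NumPy purely for illustration; the project itself uses Mplus, and the helper name `fit_lca`, the random starting values and the convergence rule are assumptions rather than the settings used in ALSPAC. Reports are coded 1 (wheeze), 0 (no wheeze) or NaN (missing), and missing reports are assumed to be missing at random.

```python
import numpy as np

def fit_lca(y, n_classes, n_iter=1000, tol=1e-8, seed=0):
    """Hypothetical helper (not an Mplus routine): latent class model for
    binary reports fitted by expectation-maximisation (EM).

    y: (n_children, n_timepoints) array, 1 = wheeze, 0 = none, NaN = missing.
    Returns class prevalences pi (K,), wheeze probabilities theta (K, T),
    posterior class probabilities (N, K) and the log-likelihood.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(y)
    y0 = np.where(observed, y, 0.0)
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.2, 0.8, size=(n_classes, y.shape[1]))  # random start
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log P(reports_i, class k), skipping missing reports
        log_p = y0[:, None, :] * np.log(theta) + (1 - y0[:, None, :]) * np.log1p(-theta)
        log_p = np.where(observed[:, None, :], log_p, 0.0).sum(axis=2) + np.log(pi)
        ll_i = np.logaddexp.reduce(log_p, axis=1)
        post = np.exp(log_p - ll_i[:, None])
        ll = float(ll_i.sum())
        if ll - prev_ll < tol:  # stop when the log-likelihood stops improving
            break
        prev_ll = ll
        # M-step: class prevalences and per-time-point wheeze probabilities
        pi = post.mean(axis=0)
        theta = (post.T @ y0) / (post.T @ observed.astype(float))
        theta = np.clip(theta, 1e-6, 1 - 1e-6)
    return pi, theta, post, ll
```

In practice, latent class models are usually refitted from many random starts to avoid local maxima, and the number of classes is chosen by comparing fit across models; a single start is used here for brevity.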
Methodology: Mplus will be used to fit a multilevel model within the latent class analysis framework, in order to extend the current single-dimension approach based on wheezing symptoms to a multidimensional approach that also includes other respiratory symptoms and objective measures of allergy and lung function. Pre-selection of the most relevant variables using principal components analysis (PCA) is likely to be required to limit the number of parameters in the model. Cubic splines will be used to derive parametric curves representing the asthma phenotypes, to account for differences in the number and timing of measurements in each dataset.
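The role of the cubic splines can be illustrated with a small sketch: phenotype-specific wheeze probabilities estimated at one cohort's assessment ages are smoothed into a continuous curve of age, which can then be evaluated at a second cohort's ages. The truncated-power basis, the single knot, the ages and the probabilities below are all hypothetical choices for illustration, not values from ALSPAC or from the project.

```python
import numpy as np

def cubic_spline_basis(age, knots):
    """Truncated-power cubic spline basis: 1, x, x^2, x^3 and (x - k)_+^3 per knot."""
    age = np.asarray(age, dtype=float)
    cols = [np.ones_like(age), age, age**2, age**3]
    cols += [np.clip(age - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_phenotype_curve(age, prob, knots):
    """Least-squares cubic regression spline through per-time-point probabilities."""
    X = cubic_spline_basis(age, knots)
    coef, *_ = np.linalg.lstsq(X, np.asarray(prob, dtype=float), rcond=None)
    return coef

def evaluate_curve(coef, age, knots):
    """Evaluate a fitted curve at another cohort's (possibly different) ages."""
    return cubic_spline_basis(age, knots) @ coef

# Hypothetical assessment schedules (ages in years) for two cohorts
ages_a = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5])
ages_b = np.array([1.0, 2.0, 4.0, 6.0])
knots = [3.5]
# Hypothetical wheeze probabilities for one phenotype at cohort A's ages
prob_a = np.array([0.10, 0.35, 0.45, 0.40, 0.30, 0.22, 0.18])
coef = fit_phenotype_curve(ages_a, prob_a, knots)
print(np.round(evaluate_curve(coef, ages_b, knots), 3))
```

Fitting on the probability scale, as here, can give values outside [0, 1] when extrapolating beyond the observed ages; fitting on the logit scale is a common alternative.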
The resulting models will then be tested in cross-cohort comparisons using cohorts with similar longitudinal data and outcome variables. The accuracy and generalisability of the models will be validated across the independent cohorts using deviance differences.
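The deviance is -2 times the log-likelihood, so a deviance difference compares how well two models explain the same data. The sketch below assumes one possible design: latent class parameters estimated in a derivation cohort are applied unchanged to a validation cohort and compared against a one-class model (no phenotype structure) fitted to that cohort. The helper names, the baseline model and the simulated parameters are assumptions for illustration; the project may instead compare transferred models with models refitted locally.

```python
import numpy as np

def lca_loglik(y, pi, theta):
    """Log-likelihood of binary reports y (NaN = missing) under a latent class
    model with class prevalences pi (K,) and wheeze probabilities theta (K, T)."""
    pi = np.asarray(pi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    observed = ~np.isnan(y)
    y0 = np.where(observed, y, 0.0)
    # Per-child, per-class log-likelihood, ignoring missing reports
    log_p = y0[:, None, :] * np.log(theta) + (1 - y0[:, None, :]) * np.log1p(-theta)
    log_p = np.where(observed[:, None, :], log_p, 0.0).sum(axis=2) + np.log(pi)
    # Marginalise over classes (log-sum-exp), then sum over children
    return float(np.logaddexp.reduce(log_p, axis=1).sum())

def deviance(y, pi, theta):
    """Deviance = -2 x log-likelihood."""
    return -2.0 * lca_loglik(y, pi, theta)

def deviance_gain(y_new, pi, theta):
    """Hypothetical check: deviance of a one-class model fitted to the new
    cohort minus deviance of the transferred K-class model. Positive values
    mean the transferred phenotypes describe the new cohort better."""
    theta_1 = np.clip(np.nanmean(y_new, axis=0), 1e-6, 1 - 1e-6)[None, :]
    return deviance(y_new, [1.0], theta_1) - deviance(y_new, pi, theta)

# Simulated validation cohort (not real data) and parameters assumed to come
# from a derivation cohort: 70% "never wheeze", 30% "persistent wheeze".
pi_a = np.array([0.7, 0.3])
theta_a = np.array([[0.05] * 6, [0.9] * 6])
rng = np.random.default_rng(2)
persistent = rng.random(1500) < 0.3
y_b = (rng.random((1500, 6)) < np.where(persistent[:, None], 0.9, 0.05)).astype(float)
print(round(deviance_gain(y_b, pi_a, theta_a), 1))
```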
Scientific and Medical Opportunities: The results of this study will establish whether modelling of phenotypes in complex diseases, such as asthma, is feasible across cohorts with different data acquisition methods. The ability to harmonise detailed phenotypes across consortia of large longitudinal cohorts with complementary genetic data will increase the potential to understand how genetic and environmental influences interact to cause disease expression. Current approaches are limited to crude disease definitions. If the potential of genome-wide data is to be fully realised, large studies with comparable outcomes will be needed to enable investigation of less prevalent phenotypes. The methods developed here may be transferable to other complex, polygenic diseases.
Identifying the diverse biological pathways associated with variation in disease phenotype is a key step in identifying targets for disease modification or prevention. Some of these opportunities may be overlooked if phenotypic heterogeneity is not accounted for in genetic association and gene-environment interaction studies.