Robust statistical method development for multivariate integrated modelling in genomic epidemiology

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Obesity is a growing public health problem as it is associated with increased risk of cardiovascular disease, type-2 diabetes and premature death. Adult weight and risk for obesity are highly heritable traits, but currently we have limited knowledge of the specific genetic risk factors and the detailed biological mechanisms behind obesity. Understanding the complex mechanisms underlying obesity, as well as how genetic and environmental risk factors interact, is necessary for development of efficient prevention and treatment strategies.

The main purpose of the project is to develop new computational technologies in the form of advanced statistical methods enabling integration of biological data describing many different aspects of a biological system, so that associations between, for example, genes, proteins and metabolites can be characterised. This information will improve our understanding of complex diseases caused by contributions from multiple factors; one example of such a disease is obesity. The statistical methodologies that are being developed will, however, be useful in many other areas of biological and medical research as well. More specifically, novel robust Bayesian statistical methodologies will be developed that are better suited to modelling biological data, which are often noisy and therefore difficult to model with many conventional statistical methodologies.

Technical Summary

Genetical genomics has emerged as a powerful integrative approach for characterizing disease variants and inferring the causal mechanisms underlying the pathophysiology of various disorders. Integrative genomic modelling of molecular phenotype data sets thus holds promise for a better understanding of the mechanisms underlying complex disease.

As a canonical example, throughout my fellowship I will consider the study of obesity. Obesity is a growing public health problem as it is associated with increased risk of cardiovascular disease, type-2 diabetes and premature death. Adult weight is a highly heritable trait, but limited progress has been made towards finding specific underlying genetic risk variants and molecular mechanisms. Understanding the complex mechanisms associated with obesity is, however, essential for development of efficient prevention and treatment strategies.

The work I propose to undertake follows on from recent research activities at the Wellcome Trust Centre for Human Genetics and the Department of Statistics in Oxford, as part of the EU-funded Molecular Phenotyping to Accelerate Genomic Epidemiology (MolPAGE) consortium. MolPAGE is focused on the investigation of obesity, type-2 diabetes and associated risks for cardiovascular disease on a medium to epidemiological scale.

I propose to develop enabling technology in the form of statistical methodologies for large-scale integrative and functional genomics modelling in genomic epidemiology. This will be in the context of obesity and related metabolic disorders, utilizing the rich data resource of MolPAGE, while the methods developed will be widely applicable to statistical modelling of other biological systems and complex diseases.

The main objective of the fellowship is to enable the identification of key molecular mechanisms and pathways associated with obesity through development of integrated statistical modelling. The specific aims are to develop and apply robust multivariate Bayesian inference methodologies, designed to be suitable for integrative modelling of data from multiple molecular phenotype platforms, enabling genomic-scale modelling with limited manual involvement.

Initially I will focus on robust univariate modelling, using Bayesian non-parametric models and Bayesian mixture models for robust eQTL analysis. Subsequent multivariate method development will focus on robust graphical models, including Bayesian Networks (BNs) and Structural Equation Models (SEMs). BNs provide a framework for modelling and inferring (in)dependencies between variables, while SEMs provide a complementary framework for causal and confirmatory modelling. In both of these cases, robust methods that are insensitive to noise, which is inherent to biological systems and the associated analytical platforms, are essential to minimize spurious results while maximizing the reliability of inferred multivariate models.
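
To illustrate why noise-insensitive univariate models matter for eQTL analysis, the following is a minimal sketch on simulated data, not the methodology of this project. It regresses a single expression trait on one genotype coded as 0/1/2 and compares ordinary least squares with a linear model that has Student-t errors. As a simplification, the Student-t model is fitted by a maximum-likelihood EM algorithm rather than by the Bayesian non-parametric or mixture approaches named above. All function names, the degrees of freedom (nu = 4), the true effect size (0.5) and the contamination scheme are assumptions chosen purely for illustration.

```python
import random


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def student_t_fit(x, y, nu=4.0, n_iter=50):
    """EM fit of y = a + b*x with Student-t errors (nu held fixed).

    Returns (a, b, w), where w holds the final per-sample weights;
    samples with large residuals receive small weights.
    """
    n = len(x)
    a, b = weighted_line_fit(x, y, [1.0] * n)  # start from the OLS fit
    sigma2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n
    w = [1.0] * n
    for _ in range(n_iter):
        # E-step: expected precision weight of each sample
        w = [(nu + 1.0) / (nu + (yi - a - b * xi) ** 2 / sigma2)
             for xi, yi in zip(x, y)]
        # M-step: weighted regression, then update the scale
        a, b = weighted_line_fit(x, y, w)
        sigma2 = sum(wi * (yi - a - b * xi) ** 2
                     for wi, xi, yi in zip(w, x, y)) / n
    return a, b, w


def simulate(n=300, slope=0.5, seed=1):
    """Hypothetical data: genotype in {0, 1, 2}, expression with outliers."""
    rng = random.Random(seed)
    x = [float(rng.choice([0, 1, 2])) for _ in range(n)]
    y = [1.0 + slope * xi + rng.gauss(0.0, 0.3) for xi in x]
    # Contaminate the first 15 samples carrying genotype 2 with a large shift
    hit = 0
    for i in range(n):
        if x[i] == 2.0 and hit < 15:
            y[i] += 6.0
            hit += 1
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    _, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
    _, b_rob, _ = student_t_fit(x, y)
    print("OLS effect estimate:      ", round(b_ols, 3))
    print("Student-t effect estimate:", round(b_rob, 3))
```

In this sketch the contaminated samples all sit at one genotype, so they pull the least-squares effect estimate away from the simulated value, whereas the Student-t fit down-weights them. A fully Bayesian treatment would additionally place priors on the parameters and report posterior uncertainty.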
