Development of Bayesian methods for genetic epidemiology

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Community Health Sciences

Abstract

Understanding how the information encoded in our genes affects our health will, in the long run, lead to new ways of preventing and treating disease. It is now possible to measure hundreds of thousands of genetic variants in people who take part in large-scale medical research projects such as the UK Biobank. The problem is how to analyse these measurements so as to learn how genes and environmental factors interact. Conventional methods of analysing data are not adequate for such complex problems. We plan to develop new ways of analysing data, drawing on methods developed in other fields of research such as physics and computer science. These methods will be applied to analyse information obtained from people who have participated in large-scale studies of genetics and health. To test these methods, we shall apply them to a study of Scottish cancer patients compared with healthy individuals, and to studies of remote isolated populations such as Orkney islanders.

Technical Summary

This project aims to tackle several problems in genetic epidemiology that are intractable to classical statistical methods, using a common set of methods and tools that exploit recent advances in machine learning, computer science and probabilistic inference.
1. Analysis of genome-wide association studies using tag SNP arrays in combination with HapMap data, to impute HapMap haplotypes and test for association at both typed and untyped loci. This will build on the program HAPMIXMAP already developed, which combines Bayesian modelling with classical hypothesis tests. To overcome the limitations of current approaches based on fixed-dimensional hidden Markov models, we plan to evaluate alternative approaches based on Dirichlet process mixtures and variational Bayes approximations.
2. Analysis of "mendelian randomization" studies in which genotype is used as an "instrumental variable" to infer a causal effect of an intermediate phenotype on disease risk. This will be developed using a freely-available program (JAGS) for Bayesian inference in graphical models.
3. Analysis of gene-gene interactions, gene-environment interactions and pathways through which genotype and environmental factors influence intermediate phenotypes and disease outcomes. This will use a comparatively novel approach - variational Bayes approximation combined with automatic relevance determination - to learn the structure of graphical models. Development will be based on extending the freely-available VIBES package for variational Bayes inference, but we shall also evaluate a commercial package (INFER.NET) that is expected to be available soon.
4. Analysis of genome-wide association studies in isolated populations, exploiting both association and haplotype sharing through cryptic relatedness. For this we plan to develop approaches based on hidden Markov Dirichlet process models, with inference using either MCMC or variational Bayes approximations.
Two different Bayesian learning algorithms will be used: Markov chain Monte Carlo simulation (MCMC) to sample the posterior of a specified model, and variational Bayes approximation with automatic relevance determination where the object is to learn model structure.
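
To illustrate the instrumental-variable idea behind the mendelian randomization studies in item 2, the following is a minimal sketch on simulated data. It is not the project's method: the project proposes Bayesian inference in JAGS, whereas this sketch uses the simpler classical ratio (Wald) estimator, in which the causal effect is estimated as the genotype-outcome covariance divided by the genotype-phenotype covariance. All function names, the allele frequency of 0.3, the genotype effect of 0.5 and the sample size are assumptions chosen for illustration only.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def simulate(n, causal_effect, seed=0):
    """Simulate genotype g, intermediate phenotype x and outcome y.

    An unobserved confounder u affects both x and y, so regressing y on x
    directly is biased. The genotype affects y only through x (the
    instrumental-variable assumption). All numeric values are illustrative.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Allele count 0, 1 or 2 with an assumed allele frequency of 0.3
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)  # unobserved confounder
        xi = 0.5 * gi + u + rng.gauss(0, 1)
        yi = causal_effect * xi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def wald_ratio(g, x, y):
    """Ratio estimator: cov(g, y) / cov(g, x)."""
    return covariance(g, y) / covariance(g, x)


def naive_slope(x, y):
    """Ordinary regression slope of y on x, confounded by u."""
    return covariance(x, y) / covariance(x, x)


if __name__ == "__main__":
    g, x, y = simulate(n=20000, causal_effect=0.3)
    print("naive slope:", round(naive_slope(x, y), 3))
    print("wald ratio :", round(wald_ratio(g, x, y), 3))
```

In this simulation the naive slope is inflated by the confounder, while the ratio estimator should fall close to the simulated causal effect. A Bayesian treatment of the same graphical model would additionally yield a posterior distribution for the effect rather than a point estimate.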
