Modelling of high dimensional genetic and epidemiological data

Lead Research Organisation: London School of Hygiene & Tropical Medicine
Department Name: Epidemiology and Population Health

Abstract

Many medical problems boil down to an outcome (e.g. a disease) and a huge number, often thousands, of possible causes (e.g. diet, a particular genotype or some combination of the two). This project aims to develop methods that will allow us to investigate the relationships between all of these thousands of factors in a single analysis. We will then apply these methods to two particular applications: the first tries to find which genotypes cause disease (in particular inflammatory bowel disease), and the second searches for adverse drug reactions, specifically during pregnancy.

Technical Summary

This study will develop statistical tools to analyse high-dimensional datasets containing information on thousands of different variables. Such problems arise with increasing frequency in medical research. The best-known examples involve 'omics technologies, including mass spectrometric analysis of proteins or metabolites, microarray gene expression experiments and high-density SNP genotyping. However, they also arise in more conventional epidemiology: here we consider the detection of adverse drug reactions using GP databases. We are interested in associations between this large number of covariates and other factors of interest, for example disease status, but also in how these covariates are related to one another. By modelling these variables with a Bayesian model selection approach, we will uncover the complex relationships underlying the data and estimate the uncertainty associated with our conclusions. The project will combine methodological development, to find ways of fitting such models in these complex high-dimensional spaces, with application to two motivating examples: whole-genome association studies and data mining for adverse drug reactions.
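As a rough illustration of what "Bayesian model selection" over many covariates produces, the sketch below computes posterior inclusion probabilities, that is, the probability that each covariate belongs in the model given the data. It is a toy example and not the project's method. It assumes a Gaussian (continuous) outcome rather than disease status, a uniform prior over models, and the BIC approximation exp(-BIC/2) to each model's marginal likelihood. The function names `bic_linear` and `posterior_inclusion_probs` are hypothetical. Because it enumerates all 2^p models, it is only feasible for a handful of variables. With thousands of covariates the model space is far too large to enumerate, which is why stochastic search methods such as MCMC over models are needed.

```python
import itertools

import numpy as np


def bic_linear(X, y, subset):
    """BIC of a Gaussian linear model with an intercept plus the columns in `subset`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # number of regression coefficients
    return n * np.log(rss / n) + k * np.log(n)


def posterior_inclusion_probs(X, y):
    """Approximate P(covariate j is in the model | data) by exhaustive enumeration.

    Assumes a uniform prior over all 2**p models and uses exp(-BIC/2) as an
    approximation to each model's marginal likelihood.
    """
    p = X.shape[1]
    models = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
    log_w = -0.5 * np.array([bic_linear(X, y, s) for s in models])
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # approximate posterior model probabilities
    pip = np.zeros(p)
    for weight, s in zip(w, models):
        for j in s:
            pip[j] += weight         # sum probabilities of models containing j
    return pip


# Simulated data: only covariates 0 and 2 truly affect the outcome.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.standard_normal(n)
print(np.round(posterior_inclusion_probs(X, y), 3))
```

Unlike a single "best" model, the inclusion probabilities average over all candidate models, so they also express how uncertain the selection is.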

Publications
