Genetic variation, disease prediction and causation

Lead Research Organisation: University of Cambridge
Department Name: Pure Maths and Mathematical Statistics

Abstract

Recent technological advances in high-throughput genotyping and molecular phenotyping have led to the emergence of an important new field of scientific research, genomic epidemiology, whose main aim is to characterize and understand the genetic factors and molecular pathways that affect disease risk. Genome-wide association studies have started to explore the effect of DNA sequence alterations on disease susceptibility. The genetic association signals generated by these studies represent a first step towards a causal understanding of the molecular mechanisms and pathways that generate disease. Achieving this aim will largely depend on our ability to combine and analyze the complex datasets generated by recent genotyping and molecular phenotyping platforms (transcriptomics, proteomics, metabolomics, etc.), and to deal with the increasing complexity of the longitudinal data accumulated in biobanks. The analysis of such data will, in particular, aim to identify causal links between pharmacologically modifiable molecular phenotypes and disease risk. Further objectives are the development of genome-wide predictors of disease and the study of interactions between genetic determinants and behavioural and social determinants of disease.

A major bottleneck in this enterprise is the lack of fully adequate statistical tools. We shall explore contemporary advanced statistical methods relevant to the field, including shrinkage regression, Markov chain Monte Carlo inference, prequential methods of model selection and validation, and probabilistic causal graphical models. We shall develop these methods in the light of the difficulties arising in the analysis of our datasets, and apply them to our data to advance knowledge of the molecular mechanisms underlying coronary artery disease.
