Genetic variation, disease prediction and causation

Lead Research Organisation: University of Cambridge
Department Name: Pure Maths and Mathematical Statistics

Abstract

Recent technological advances in high-throughput genotyping and molecular phenotyping have given rise to an important new field of research, genomic epidemiology, whose main aim is to characterize and understand the genetic factors and molecular pathways that affect disease risk. Genome-wide association studies have started to explore the effect of DNA sequence alterations on disease susceptibility. The association signals generated by these studies represent a first step towards a causal understanding of the molecular mechanisms and pathways that generate disease. Achieving this aim will largely depend on our ability to combine and analyze the complex datasets generated by recent genotyping and molecular phenotyping platforms (transcriptomics, proteomics, metabolomics, etc.), and to deal with the increasing complexity of the longitudinal data accumulated in biobanks. In particular, the analysis of such data will aim to identify causal links between pharmacologically modifiable molecular phenotypes and disease risk. Further objectives are the development of genome-wide predictors of disease and the study of interactions between genetic determinants and behavioural and social determinants of disease.

A major bottleneck in this enterprise is the lack of fully adequate statistical tools. We shall explore contemporary advanced statistical methods relevant to the field, including shrinkage regression methods, Markov chain Monte Carlo methods of inference, prequential methods of model selection and validation, and probabilistic causal graphical models. We shall develop these methods in the light of the difficulties arising in the analysis of our datasets, and apply them to our data to advance knowledge of the molecular mechanisms underlying coronary artery disease.
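
To illustrate what a shrinkage regression method looks like in a genome-wide setting, the following is a minimal sketch and not the project's own software. It fits a lasso (one common shrinkage method) by cyclic coordinate descent to simulated genotype data. The function names, the simulated effect sizes, the allele frequency and the penalty values are all assumptions made for illustration.

```python
import random


def soft_threshold(z, gamma):
    """Lasso shrinkage operator: move z towards zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows of p centred predictors; y is a list of n centred
    responses. Returns the list of p fitted coefficients.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals when all coefficients are zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # monomorphic marker carries no information
            # Covariance of predictor j with the partial residual
            # (the residual with predictor j's own contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_ss[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_b
    return beta


def simulate(n=200, p=20, seed=1):
    """Simulate hypothetical data: p markers coded 0/1/2, two of them causal."""
    rng = random.Random(seed)
    G = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(p)]
         for _ in range(n)]
    y = [1.0 * g[0] - 0.8 * g[1] + rng.gauss(0, 1) for g in G]
    # Centre predictors and response so that no intercept is needed.
    means = [sum(G[i][j] for i in range(n)) / n for j in range(p)]
    X = [[G[i][j] - means[j] for j in range(p)] for i in range(n)]
    y_mean = sum(y) / n
    return X, [v - y_mean for v in y]


if __name__ == "__main__":
    X, y = simulate()
    for lam in (0.05, 0.1, 0.3):
        beta = lasso_coordinate_descent(X, y, lam)
        selected = [j for j, b in enumerate(beta) if b != 0.0]
        print(lam, selected)
```

Larger penalty values shrink more coefficients exactly to zero, which is why methods of this kind are attractive when the number of markers is large relative to the number of subjects. In practice the penalty would be chosen by a validation procedure rather than fixed by hand.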

Publications

 
Title R functions for the analysis of genomewide genetic association data 
Description R functions for the analysis of genomewide genetic association data 
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact The package is being used by researchers outside this group 
 
Description Cellular mechanisms in Multiple Sclerosis Immunodegeneration 
Organisation University of Pavia
Department Department of Applied Health and Behavioural Psychology
Country Italy 
Sector Academic/University 
PI Contribution Development of statistical methodology for the analysis of family-structured genetic data in the presence of phase uncertainty. Development of software code. Data analysis. Joint paper writing.
Collaborator Contribution Access to valuable genetic data. Joint development of statistical methodology. Joint publication of scientific papers.
Impact Publications (19654877, 17534430)
 
Description Development of genomewide genetic predictors of Inflammatory Bowel Disease outcome 
Organisation Addenbrooke's Hospital
Country United Kingdom 
Sector Hospitals 
PI Contribution Statistical analysis of various datasets. Contribution to research planning, design of further data collection and experiments. Help in the interpretation of the analysis results.
Collaborator Contribution The main collaborator is Dr. Miles Parkes of the Department of Gastroenterology. This collaboration has provided me with valuable genetic data and has given me access to databases of the Wellcome Trust Case-Control Consortium.
Impact Three publications (18438406, 18338776, 20976713)
 
Description Genome-wide analysis of coronary artery disease association, and impact of 9p21.3 on the progression of coronary artery disease after a first myocardial infarction 
Organisation Broad Institute
Country United States 
Sector Charity/Non Profit 
PI Contribution Development of statistical methodology for the assessment of causal genetic effects on disease progression.
Collaborator Contribution Access to valuable data and joint writing of papers.
Impact A publication (19198609)
Start Year 2007