Development of Bayesian methods for genetic epidemiology

Lead Research Organisation: University of Edinburgh
Department Name: School of Clinical Sciences

Abstract

Understanding how the information encoded in our genes affects our health will, in the long run, lead to new ways of preventing and treating disease. It is now possible to measure hundreds of thousands of genetic variants in people who take part in large-scale medical research projects such as the UK Biobank. The problem is how to analyse these measurements so as to learn how genes and environmental factors interact. Conventional methods of analysing data are not adequate for such complex problems. We plan to develop new ways of analysing data, drawing on methods developed in other fields of research such as physics and computer science. These methods will be applied to analyse information obtained from people who have participated in large-scale studies of genetics and health. To test these methods, we shall apply them to a study of Scottish cancer patients compared with healthy individuals, and to studies of remote isolated populations such as Orkney islanders.

Technical Summary

This project aims to tackle several problems in genetic epidemiology that are intractable to classical statistical methods, using a common set of methods and tools that exploit recent advances in machine learning, computer science and probabilistic inference.
1. Analysis of genome-wide association studies using tag SNP arrays in combination with HapMap data, to impute HapMap haplotypes and test for association at both typed and untyped loci. This will build on the program HAPMIXMAP already developed, which combines Bayesian modelling with classical hypothesis tests. To overcome the limitations of current approaches based on fixed-dimensional hidden Markov models, we plan to evaluate alternative approaches based on Dirichlet process mixtures and variational Bayes approximations.
2. Analysis of "mendelian randomization" studies in which genotype is used as an "instrumental variable" to infer a causal effect of an intermediate phenotype on disease risk. This will be developed using a freely-available program (JAGS) for Bayesian inference in graphical models.
3. Analysis of gene-gene interactions, gene-environment interactions and pathways through which genotype and environmental factors influence intermediate phenotypes and disease outcomes. This will use a comparatively novel approach - variational Bayes approximation combined with automatic relevance determination - to learn the structure of graphical models. Development will be based on extending the freely-available VIBES package for variational Bayes inference, but we shall also evaluate a commercial package (INFER.NET) that is expected to be available soon.
4. Analysis of genome-wide association studies in isolated populations, exploiting both association and haplotype sharing through cryptic relatedness. For this we plan to develop approaches based on hidden Markov Dirichlet process models, with inference using either MCMC or variational Bayes approximations.
Two different Bayesian learning algorithms will be used: Markov chain Monte Carlo simulation (MCMC) to sample the posterior of a specified model, and variational Bayes approximation with automatic relevance determination where the object is to learn model structure.
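As an illustration of the instrumental-variable idea in item 2, the following is a minimal sketch only. It uses the classical Wald ratio estimator on simulated data, not the Bayesian graphical-model approach (JAGS) that the project proposes. The function names, allele frequency, effect sizes and sample size are all assumptions chosen for the example.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def wald_ratio(genotype, exposure, outcome):
    """Instrumental-variable estimate: cov(G, Y) / cov(G, X)."""
    return covariance(genotype, outcome) / covariance(genotype, exposure)


def naive_slope(exposure, outcome):
    """Ordinary regression slope of outcome on exposure, ignoring confounding."""
    return covariance(exposure, outcome) / covariance(exposure, exposure)


def simulate(n, causal_effect, seed=0):
    """Simulate genotype G, intermediate phenotype X and outcome Y.

    An unobserved confounder U affects both X and Y, so the naive
    slope is biased. G affects Y only through X (assumed, as the
    instrumental-variable argument requires).
    """
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Allele count 0/1/2 with an assumed allele frequency of 0.3.
        g = sum(rng.random() < 0.3 for _ in range(2))
        u = rng.gauss(0.0, 1.0)  # unobserved confounder
        x = 0.5 * g + u + rng.gauss(0.0, 1.0)
        y = causal_effect * x + u + rng.gauss(0.0, 1.0)
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


g, x, y = simulate(n=20000, causal_effect=0.3, seed=1)
print("naive slope:", round(naive_slope(x, y), 3))
print("Wald ratio :", round(wald_ratio(g, x, y), 3))
```

In this simulation the naive slope is inflated by the confounder, while the genotype-based estimate should lie close to the simulated causal effect. A Bayesian formulation would replace the point estimate with a posterior distribution over the causal effect.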

Publications

Sivakumaran S (2011) Abundant pleiotropy in human complex diseases and traits. in American journal of human genetics

McKeigue P (2019) Sample size requirements for learning to classify with high-dimensional biomarker panels. in Statistical methods in medical research

Glodzik D (2013) Inference of identity by descent in population isolates and optimal sequencing studies. in European journal of human genetics : EJHG

 
Description MRC Stratified Medicine Initiative: Maximising Therapeutic Utility for Rheumatoid Arthritis
Amount £131,924 (GBP)
Funding ID MR/K015346/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 04/2013 
End 03/2017
 
Description SUMMIT programme on biomarkers for diabetic complications (29m euros across 23 participants)
Amount £250,000 (GBP)
Funding ID 115006 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2011 
End 02/2015
 
Title "Long-range phasing" for inference of shared haplotypes in population isolates 
Description We have developed an R software package that implements the "long-range phasing" method for inference of shared haplotypes in population isolates, originally described by Kong et al. Our method improves on the original in that it does not require the genome to be partitioned into short segments, which may lose information, and does not rely on pedigree data to stitch together the inferred haplotypes 
Type Of Material Data analysis technique 
Year Produced 2010 
Provided To Others? Yes  
Impact This software has been used by our collaborators in the MRC Human Genetics Unit to select optimal subsets of individuals for exon resequencing. We are also exploring its use to detect effects of rare alleles on shared haplotypes in isolates. 
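
The record above does not describe the package's algorithm, so the following is only a hedged sketch of the screening idea commonly associated with long-range phasing: two individuals who share a haplotype across a region cannot have opposite homozygous genotypes (0 versus 2) at any SNP in it, barring genotyping error. The function name and the 0/1/2 genotype coding are assumptions for illustration, and this is not the R package's interface.

```python
def longest_compatible_run(geno_a, geno_b):
    """Find the longest stretch of SNPs with no opposite homozygotes.

    Genotypes are coded as allele counts 0, 1 or 2; None marks a
    missing call and is treated as compatible with anything.
    Returns (start_index, length) of the longest run.
    """
    best_start, best_len = 0, 0
    run_start = 0
    for i, (a, b) in enumerate(zip(geno_a, geno_b)):
        if {a, b} == {0, 2}:
            # Opposite homozygotes: no haplotype can be shared here,
            # so any candidate shared segment must restart after this SNP.
            run_start = i + 1
        else:
            length = i - run_start + 1
            if length > best_len:
                best_start, best_len = run_start, length
    return best_start, best_len


person_a = [0, 1, 2, 2, 1, 0, 0, 2]
person_b = [2, 1, 2, 1, 1, 0, 1, 0]
print(longest_compatible_run(person_a, person_b))  # (1, 6)
```

A long run is only a necessary condition for haplotype sharing; a real analysis would work with genetic distances rather than SNP counts and would allow for genotyping errors.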
 
Title Robust Adaptive FeaTure selection (RAFT) 
Description Software for reducing a high-dimensional vector of predictive factors (such as -omics) to a lower-dimensional subset JOINTLY predictive of the outcome. 
Type Of Material Data analysis technique 
Year Produced 2011 
Provided To Others? Yes  
Impact Collaboration with groups in Pavia and Padova on the SUMMIT project, and with a group at WTHG Oxford for our joint analysis of mouse data. 
 
Title SParse Instrumental Variables software for inference of causal biomarkers (SPIV) 
Description This software tool can be used with genotype-phenotype data to infer causal relationships between phenotypic biomarkers and outcome. 
Type Of Material Data analysis technique 
Year Produced 2010 
Provided To Others? Yes  
Impact We expect this tool to have wide use for validating biomarkers as therapeutic targets or as surrogate endpoints in clinical trials. 
 
Description Analysis of ORCADES study and other EUROSPAN studies of population isolates 
Organisation Medical Research Council (MRC)
Department MRC Human Genetics Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution Development of methods to infer causal relationships between biomarkers and outcomes
Collaborator Contribution Access to datasets with genome-wide genotype data and extensive phenotypic biomarkers
Impact Paper in press; see section 2.
Start Year 2008
 
Description Inference of causal effects of gene expression levels on quantitative traits in heterogeneous stocks of mice 
Organisation University of Oxford
Department Wellcome Trust Centre for Human Genetics
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We have developed methods that exploit genetic variation for inference of causality, as described elsewhere in this report
Collaborator Contribution Working closely with us on analysis and interpretation of studies of heterogeneous stocks of mice typed for multiple quantitative traits and with gene expression measurements in liver, lung and hippocampus. This has allowed us to test the SPIV program
Impact Two papers published so far in machine learning journals. A manuscript for a biomedical readership is in preparation. Disciplines involved: psychiatric genetics, statistical genetics, machine learning (a subdivision of computer science)
Start Year 2009
 
Description Pharmatics: predictions from omics data (part of MIMOmics) 
Organisation Leiden University Medical Center
Country Netherlands 
Sector Academic/University 
PI Contribution Participated in two joint FP7 applications. Advised on applications of stratified/personalized medicine.
Collaborator Contribution Advised on the choice of machine learning methods and algorithms useful for predictions from high-dimensional omics data.
Impact The collaboration has led to FP7 MIMOmics and MRC MATURA funding. Pharmatics will provide benefits in kind to MATURA.
Start Year 2011
 
Description Pharmatics: predictions from omics data (part of MIMOmics) 
Organisation Pharmatics Limited
Country United Kingdom 
Sector Private 
PI Contribution Participated in two joint FP7 applications. Advised on applications of stratified/personalized medicine.
Collaborator Contribution Advised on the choice of machine learning methods and algorithms useful for predictions from high-dimensional omics data.
Impact The collaboration has led to FP7 MIMOmics and MRC MATURA funding. Pharmatics will provide benefits in kind to MATURA.
Start Year 2011
 
Description Pharmatics: predictions from omics data (part of MIMOmics) 
Organisation University of Bologna
Country Italy 
Sector Academic/University 
PI Contribution Participated in two joint FP7 applications. Advised on applications of stratified/personalized medicine.
Collaborator Contribution Advised on the choice of machine learning methods and algorithms useful for predictions from high-dimensional omics data.
Impact The collaboration has led to FP7 MIMOmics and MRC MATURA funding. Pharmatics will provide benefits in kind to MATURA.
Start Year 2011
 
Description Pharmatics: predictions from omics data (part of MIMOmics) 
Organisation University of Hasselt
Country Belgium 
Sector Academic/University 
PI Contribution Participated in two joint FP7 applications. Advised on applications of stratified/personalized medicine.
Collaborator Contribution Advised on the choice of machine learning methods and algorithms useful for predictions from high-dimensional omics data.
Impact The collaboration has led to FP7 MIMOmics and MRC MATURA funding. Pharmatics will provide benefits in kind to MATURA.
Start Year 2011
 
Title Sparse Latent Inverse Covariance Estimation (SLICE) 
Description We have developed an algorithm for modelling sparse covariance structures that include both observed and unobserved variables. This has wide application not only in biomedical research but in other fields such as financial modelling. We have initiated consultation within our institute on what forms of intellectual property protection (apart from copyright on software) may be appropriate. 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted 2011
Licensed No
Impact Invitations to present the results at industry meetings. A non-profit pilot project with a subsidiary of a FTSE100-listed company using their proprietary dataset.
 
Company Name Pharmatics Ltd 
Description Pharmatics is a startup company developing intelligent software and providing data mining services to industries. We are developing software that can be used to validate causal pathways that may contain therapeutic targets, and identify biomarkers that can be used as surrogate end-points in Phase 2 trials. We are also developing tools for prediction of individual response from high-dimensional -omic measurements. More details are given at www.pharmaticsltd.co.uk 
Year Established 2011 
Impact Pharmatics is a young R&D company. It is working on a pilot project with a company listed on the FTSE100. In July 2011, Pharmatics was announced as the overall winner of the BioQuarter Innovation Competition 2011, showing potential to become one of the most promising young biomedical companies in Scotland.
Website http://www.pharmaticsltd.com/
 
Description Industry-2011 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Primary Audience Public/other audiences
Results and Impact Four presentations to different groups of industry members, including the heads of the UK-based R&D departments of two companies listed on the FTSE100.

A pilot study has been initiated with a subsidiary of one of the FTSE100 companies. Further details cannot be disclosed at this stage because of an NDA.
Year(s) Of Engagement Activity 2011
 
Description Pharmatics-2011 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? Yes
Primary Audience Public/other audiences
Results and Impact Press conference response to local media (including The Scotsman and various online resources) following the announcement that our pre-spin-out company, Pharmatics Ltd, had been declared the overall winner of the 2011 BioQuarter Innovation Competition.

Inquiries from a London-based venture capitalist. Inquiries from a German-based research group regarding Pharmatics' interest in becoming an SME partner in an EU grant application on machine learning for personalized medicine. Participation in 2 submitted EU-FP7 grant applications.
Year(s) Of Engagement Activity 2011
 
Description Pleiotropy-2011 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Primary Audience Media (as a channel to the public)
Results and Impact The work on genetic similarities between complex traits (pleiotropy) quantified by using probabilistic methods (our AJHG paper) was covered in national media, including The Times (12/2011), The Herald (12/2011), The Scotsman (12/2011), and the BBC website.

Enquiries from the general public; enquiries from researchers specializing in quantitative disciplines and interested in starting collaborations. Contact from Nature Reviews Genetics to highlight the result as an important contribution.
Year(s) Of Engagement Activity 2011