Managing and exploiting high dimensionality in genetic epidemiology

Lead Research Organisation: London School of Hygiene & Tropical Medicine

Department Name: Epidemiology and Population Health

Abstract

In the last five years there has been a great deal of progress in discovering the genetic variations that explain differences in risk for complex diseases such as diabetes, heart disease, and many cancers. But it is clear that much remains to be discovered, and in order to explain this so-called missing heritability researchers are conducting increasingly large-scale studies, enrolling greater numbers of patients and including much larger numbers of genetic variations. At the same time, there are increased efforts to use these new genetic discoveries to learn about the biological processes involved in disease. These new studies in the field of genetic epidemiology need specialised statistical methods to analyse their data, as the path from gene to disease is complex and subject to apparently random influences. This research aims to develop powerful methods that allow for, and take advantage of, the large number of genes measured in a typical study. One important aspect is statistical significance: when many genes are studied, some will appear to be associated with disease simply due to chance fluctuations in the data. Standard guidelines have been given for existing studies in order to differentiate true associations from chance effects, and we will extend these recommendations for the next generation of studies, which consider a larger number of genetic variants, many of them rare, and which study the range of populations worldwide. We will then study the best way to analyse new genotyping products that are designed to obtain enhanced information for specific classes of disease such as auto-immune, cancer and psychiatric conditions. We will also study an emerging application of genetics, called Mendelian randomisation, which can help to explain whether an observed association between an exposure such as alcohol intake and an outcome such as heart disease is due to a true causal effect or only to other common causes. We will consider how large numbers of genetic variants can be combined into a single tool that can improve the resolution and power of this method. Taken together, this research will provide a methodological basis for gaining further insights from large-scale genetic studies.
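The point about chance associations can be illustrated with a short simulation. The following is a minimal sketch, not a method from this project: it assumes independent tests, draws p-values under the null hypothesis of no association, and compares an uncorrected 0.05 threshold with a Bonferroni-corrected one. The function name `count_chance_hits` and its parameters are hypothetical.

```python
import random


def count_chance_hits(n_variants, alpha=0.05, seed=1):
    """Count how many truly null variants look 'significant' by chance.

    Assumption: tests are independent, so under the null hypothesis each
    p-value is uniformly distributed on [0, 1].
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_variants)]

    # Uncorrected: every test is compared against alpha directly.
    naive_hits = sum(p < alpha for p in p_values)

    # Bonferroni: divide alpha by the number of tests performed.
    bonferroni_threshold = alpha / n_variants
    corrected_hits = sum(p < bonferroni_threshold for p in p_values)

    return naive_hits, corrected_hits, bonferroni_threshold


if __name__ == "__main__":
    naive, corrected, threshold = count_chance_hits(100_000)
    print("uncorrected chance hits:", naive)
    print("Bonferroni threshold:", threshold)
    print("hits after correction:", corrected)
```

With one million independent tests the same calculation gives a threshold of 5 × 10^-8, the level conventionally used for genome-wide significance. Real variants are correlated, so the independence assumption here is a simplification.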

Technical Summary

A salient characteristic of genetic epidemiology in recent years has been the high dimensionality of studies in terms of the number of genetic variants assayed, phenotypic outcomes analysed, and sample sizes attained. This leads, for example, to increased possibilities for false positive findings and to selection bias when focusing on the most significant results. However, high dimensionality can be usefully exploited by pooling information across multiple genes or phenotypes, which can improve power by means such as aggregating evidence across a biological pathway, estimating the total heritability explained by a genomewide association scan (GWAS), or eliciting stronger instruments for Mendelian randomisation (MR) analyses. The next generation of genetic epidemiology studies can be broadly dichotomised into 1) continued disease mapping efforts using larger sample sizes and complete DNA sequence data, focussing on rare mutations and structural variants; and 2) efforts to gain insight from GWAS results by relating DNA variation to biological function. High dimensionality will continue to be a key feature in both cases, and this research will address it in both. We will address genomewide significance for whole genome and exome sequencing studies in diverse worldwide populations. We will develop improved methods for analysing the new generation of disease-specific genotyping chips, which are optimised for the study of particular classes of disease, using empirical Bayes optimal discovery procedures. We will develop improved methods for analysing standard GWAS, including more powerful two-stage approaches within a single data set, and proper reporting of significance levels from replication studies. Finally, we will study the use of whole genome instruments in Mendelian randomisation, by using shrinkage methods within two-stage least squares and inferring non-linear causal effects from observational data.
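To make the Mendelian randomisation idea concrete, the sketch below simulates an exposure and an outcome that share an unmeasured confounder, builds an unweighted allele score from several variants, and applies two-stage least squares with that score as the single instrument. It is an illustration under stated assumptions (simulated data, linear effects, equal variant effects, no pleiotropy), not the shrinkage approach proposed in this project. All function names and parameter values are hypothetical.

```python
import random


def ols(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate(n=20000, n_variants=20, causal_effect=0.5, seed=7):
    """Simulate allele scores, a confounded exposure and an outcome."""
    rng = random.Random(seed)
    score, exposure, outcome = [], [], []
    for _ in range(n):
        # Each variant contributes 0, 1 or 2 risk alleles (frequency 0.3);
        # the unweighted allele score is the total count across variants.
        s = sum(
            int(rng.random() < 0.3) + int(rng.random() < 0.3)
            for _ in range(n_variants)
        )
        u = rng.gauss(0, 1)  # unmeasured confounder of exposure and outcome
        x = 0.1 * s + u + rng.gauss(0, 1)
        y = causal_effect * x + u + rng.gauss(0, 1)
        score.append(s)
        exposure.append(x)
        outcome.append(y)
    return score, exposure, outcome


def two_stage_least_squares(score, exposure, outcome):
    """Estimate the causal effect of exposure on outcome using the score."""
    # Stage 1: predict the exposure from the genetic instrument.
    slope1, intercept1 = ols(score, exposure)
    fitted = [intercept1 + slope1 * s for s in score]
    # Stage 2: regress the outcome on the genetically predicted exposure.
    slope2, _ = ols(fitted, outcome)
    return slope2


if __name__ == "__main__":
    score, exposure, outcome = simulate()
    naive, _ = ols(exposure, outcome)
    print("confounded OLS estimate:", round(naive, 3))
    print("2SLS estimate:", round(two_stage_least_squares(score, exposure, outcome), 3))
```

In this simulation the ordinary regression of outcome on exposure is inflated by the shared confounder, whereas the two-stage estimate should lie close to the simulated causal effect of 0.5. Summing many weak variants into one score is the simplest way to obtain a stronger instrument than any single variant provides.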

Planned Impact

The immediate beneficiaries will be a wide range of academic researchers performing research in genetic epidemiology. Beyond this immediate application, this research will benefit basic scientists aiming to translate genomics-era discoveries into new clinical treatments and healthcare policies. The pharmaceutical industry will also benefit in its efforts to translate genomics into healthcare. There is also a high demand for training in genetic epidemiology, both from career geneticists and non-specialists, and this research will benefit this training both by advancing the state of knowledge in the field and more indirectly by ensuring that leading edge training and information exchange can be maintained by the applicants. In the longer term this research will contribute, in a technical but necessary way, to improvements to public health through the exploitation of genetic knowledge via better understanding of disease pathways, applications to personalised medicine and clearer understanding of the causal role of environmental risk factors in common diseases.

Funded Value:

£384,766

Funded Period:

Aug 13 - Jul 17

Funder:

MRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

MR/K006215/1

Principal Investigator:

Frank Dudbridge

Health Category:

Unclassified

Organisations

People	ORCID iD
Frank Dudbridge (Principal Investigator)
John Whittaker (Researcher)

Publications

Burgess S (2016) Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. in Statistics in medicine

Burgess S (2015) Re: "Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects". in American journal of epidemiology

Charoen P (2016) Mendelian Randomisation study of the influence of eGFR on coronary heart disease. in Scientific reports

Costa GN (2015) A genome-wide association study of asthma symptoms in Latin American children. in BMC genetics

Dryden NH (2014) Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. in Genome research

Dudbridge F (2015) Accuracy of Gene Scores when Pruning Markers by Linkage Disequilibrium. in Human heredity

Dudbridge F (2013) Power and predictive accuracy of polygenic risk scores. in PLoS genetics

Dudbridge F (2018) How many cases of disease in a pedigree imply familial disease? in Annals of human genetics

Dudbridge F (2016) Polygenic Epidemiology. in Genetic epidemiology

Dudbridge F (2018) Predictive accuracy of combined genetic and environmental risk scores. in Genetic epidemiology

Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Title	Polygenic score
Description	Methods for calculating power and predictive accuracy of polygenic risk scores
Type Of Material	Model of mechanisms or symptoms - human
Year Produced	2013
Provided To Others?	Yes
Impact	Invitations to present methods to international research institutes
URL	https://sites.google.com/site/fdudbridge/software/polygenescore.R


Description	CEU
Organisation	University of Cambridge
Country	United Kingdom
Sector	Academic/University
PI Contribution	Statistical methods for genetic epidemiology
Collaborator Contribution	Research priorities in statistical genetics
Impact	None
Start Year	2016


Description	ICR
Organisation	Institute of Cancer Research UK
Country	United Kingdom
Sector	Academic/University
PI Contribution	Statistical analysis and development of methods
Collaborator Contribution	Data generation and project management
Impact	Publications
Start Year	2010


Description	IEU
Organisation	University of Bristol
Department	MRC Integrative Epidemiology Unit
Country	United Kingdom
Sector	Academic/University
PI Contribution	Development of statistical methods for causal inference
Collaborator Contribution	Development of statistical methods for causal inference
Impact	None
Start Year	2016


Description	PGC
Organisation	Psychiatric GWAS Consortium (PGC)
Country	Global
Sector	Academic/University
PI Contribution	Statistical advice
Collaborator Contribution	Data analysis
Impact	Publications
Start Year	2009


Description	UWA
Organisation	University of Western Australia
Country	Australia
Sector	Academic/University
PI Contribution	Expertise in statistical genetics
Collaborator Contribution	Genetic studies of osteoporosis and thyroid disease
Impact	Several publications


Title	AVENGEME
Description	Estimation of genetic model parameters from polygenic association statistics
Type Of Technology	Software
Year Produced	2015
Open Source License?	Yes
Impact	Used as a research tool by a number of independent groups.
URL	http://sites.google.com/site/fdudbridge/software


Description	International Innovation
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Type Of Presentation	Keynote/Invited Speaker
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Interview for a magazine presenting recent scientific advances to an informed lay readership. Invitations to present work to international research institutes.
Year(s) Of Engagement Activity	2013
URL	https://sites.google.com/site/fdudbridge/software/doc/RMfeature.pdf