Managing and exploiting high dimensionality in genetic epidemiology
Lead Research Organisation: London Sch of Hygiene and Trop Medicine
Department Name: Epidemiology and Population Health
Abstract
In the last five years there has been a great deal of progress in discovering the genetic variations that explain differences in risk for complex diseases such as diabetes, heart disease, and many cancers. But it is clear that much remains to be discovered, and in order to explain this so-called missing heritability, researchers are conducting increasingly large-scale studies, enrolling greater numbers of patients and including much larger numbers of genetic variations. At the same time, there are increased efforts to use these new genetic discoveries to learn about the biological processes involved in disease. These new studies in the field of genetic epidemiology need specialised statistical methods to analyse their data, as the path from gene to disease is complex and subject to apparently random influences. This research aims to develop powerful methods that allow for, and take advantage of, the large number of genes measured in a typical study. One important aspect is statistical significance: when many genes are studied, some will appear to be associated with disease simply due to chance fluctuations in the data. Standard guidelines have been given for existing studies in order to differentiate true associations from chance effects, and we will extend these recommendations for the next generation of studies, which consider a larger number of genetic variants, many of them rare, and which study the range of populations worldwide. We will then study the best way to analyse new genotyping products that are designed to obtain enhanced information for specific classes of disease such as autoimmune, cancer and psychiatric conditions. We will also study an emerging application of genetics, called Mendelian randomisation, which can help to explain whether an observed association between an exposure such as alcohol intake and an outcome such as heart disease is due to a true causal effect or only to other common causes.
We will consider how large numbers of genetic variants can be combined into a single tool that can improve the resolution and power of this method. Taken together, this research will provide a methodological basis for gaining further insights from large scale genetic studies.
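As an illustration of the idea of combining many variants into a single tool, the following is a minimal sketch of Mendelian randomisation with an unweighted allele score, using simulated data only. It is not the method developed in this project: the function names, the allele frequency of 0.3, the per-allele effect of 0.1 and the true causal effect of 0.5 are all assumptions chosen for the example. With a single instrument (here the score), the ratio estimate shown is equivalent to two-stage least squares.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, n_variants=20, causal_effect=0.5, seed=1):
    """Simulate an allele score, a confounded exposure and an outcome.

    All numerical settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    scores, exposures, outcomes = [], [], []
    for _ in range(n):
        # Unweighted allele score: total count of risk alleles (0, 1 or 2
        # per variant) across all variants, allele frequency 0.3.
        score = sum((rng.random() < 0.3) + (rng.random() < 0.3)
                    for _ in range(n_variants))
        # An unmeasured common cause of both exposure and outcome.
        confounder = rng.gauss(0, 1)
        exposure = 0.1 * score + confounder + rng.gauss(0, 1)
        outcome = causal_effect * exposure + confounder + rng.gauss(0, 1)
        scores.append(score)
        exposures.append(exposure)
        outcomes.append(outcome)
    return scores, exposures, outcomes


def ratio_estimate(scores, exposures, outcomes):
    """Instrumental variable (ratio) estimate using the score as instrument."""
    return covariance(scores, outcomes) / covariance(scores, exposures)


def naive_estimate(exposures, outcomes):
    """Ordinary regression slope of outcome on exposure (confounded)."""
    return covariance(exposures, outcomes) / covariance(exposures, exposures)


if __name__ == "__main__":
    s, x, y = simulate()
    print("naive regression estimate:", round(naive_estimate(x, y), 3))
    print("allele score (IV) estimate:", round(ratio_estimate(s, x, y), 3))
```

In this simulated setting the naive regression is inflated by the common cause, while the allele score estimate should fall close to the assumed causal effect, because the score influences the outcome only through the exposure.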
Technical Summary
A salient characteristic of genetic epidemiology in recent years has been the high dimensionality of studies in terms of the number of genetic variants assayed, phenotypic outcomes analysed, and sample sizes attained. This leads, for example, to increased possibilities for false positive findings and to selection bias when focusing on the most significant results. However, high dimensionality can be usefully exploited by pooling information across multiple genes or phenotypes, which can improve power by means such as aggregating evidence across a biological pathway, estimating the total heritability explained by a genomewide association scan (GWAS), or eliciting stronger instruments for Mendelian randomisation (MR) analyses. The next generation of genetic epidemiology studies can be broadly dichotomised into 1) continued disease mapping efforts using larger sample sizes and complete DNA sequence data, focussing on rare mutations and structural variants; and 2) efforts to gain insight from GWAS results by relating DNA variation to biological function. High dimensionality will continue to be a key feature in both cases, and this research will address it in both. We will address genomewide significance for whole genome and exome sequencing studies, in diverse worldwide populations. We will develop improved methods for analysing the new generation of disease specific genotyping chips, which are optimised for the study of particular classes of disease, using empirical Bayes optimal discovery procedures. We will develop improved methods for analysing standard GWAS, including more powerful two-stage approaches within a single data set, and proper reporting of significance levels from replication studies. Finally we will study the use of whole genome instruments in Mendelian randomisation, by using shrinkage methods within two-stage least squares and inferring non-linear causal effects from observational data.
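The two problems named at the start of this summary, false positives from many tests and selection bias among the most significant results, can be illustrated with a short simulation. This is a hedged sketch rather than any procedure proposed here: it uses a simple Bonferroni correction (not the empirical Bayes or two-stage methods mentioned above), and the function names, the true effect of 0.1 and the standard error of 0.05 are assumptions made for the example.

```python
import random
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def mean_selected_estimate(true_effect, std_error, alpha, n_studies, seed=1):
    """Average effect estimate among simulated studies that reach significance.

    Each simulated study yields one normally distributed estimate of the same
    true effect; only estimates with a two-sided p-value below alpha are kept.
    Returns (mean of kept estimates, number kept).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    selected = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate / std_error) > z_crit:
            selected.append(estimate)
    mean = sum(selected) / len(selected) if selected else float("nan")
    return mean, len(selected)


if __name__ == "__main__":
    # One million independent tests at a family-wise level of 0.05 gives the
    # conventional genomewide significance level of about 5e-8.
    print("per-test threshold:", bonferroni_threshold(0.05, 1_000_000))

    mean, kept = mean_selected_estimate(
        true_effect=0.1, std_error=0.05, alpha=0.05, n_studies=10000)
    print("studies reaching significance:", kept)
    print("mean estimate among them:", round(mean, 4), "(true effect 0.1)")
```

Because only estimates that happen to be large pass the threshold, their average exceeds the true effect. This is the selection bias, often called the winner's curse, that arises when attention is restricted to the most significant results.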
Planned Impact
The immediate beneficiaries will be a wide range of academic researchers performing research in genetic epidemiology. Beyond this immediate application, this research will benefit basic scientists aiming to translate genomics-era discoveries into new clinical treatments and healthcare policies. The pharmaceutical industry will also benefit in its efforts to translate genomics into healthcare. There is also a high demand for training in genetic epidemiology, both from career geneticists and non-specialists, and this research will benefit this training both by advancing the state of knowledge in the field and, more indirectly, by ensuring that leading-edge training and information exchange can be maintained by the applicants. In the longer term this research will contribute, in a technical but necessary way, to improvements in public health through the exploitation of genetic knowledge via better understanding of disease pathways, applications to personalised medicine and clearer understanding of the causal role of environmental risk factors in common diseases.
Organisations
- London Sch of Hygiene and Trop Medicine, United Kingdom (Lead Research Organisation)
- University College London, United Kingdom (Collaboration)
- Institute of Cancer Research UK (Collaboration)
- Psychiatric GWAS Consortium (PGC) (Collaboration)
- University of Cambridge (Collaboration)
- University of Bristol, United Kingdom (Collaboration)
- University of Western Australia, Australia (Collaboration)
Publications

Burgess S
(2015)
Re: "Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects".
in American journal of epidemiology

Burgess S
(2016)
Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods.
in Statistics in medicine

Charoen P
(2016)
Mendelian Randomisation study of the influence of eGFR on coronary heart disease.
in Scientific reports

Costa GN
(2015)
A genome-wide association study of asthma symptoms in Latin American children.
in BMC genetics

Dryden NH
(2014)
Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C.
in Genome research

Dudbridge F
(2015)
Accuracy of Gene Scores when Pruning Markers by Linkage Disequilibrium.
in Human heredity

Dudbridge F
(2013)
Power and predictive accuracy of polygenic risk scores.
in PLoS genetics

Dudbridge F
(2016)
Commentary: Tobacco consumption and body weight: Mendelian randomization across a range of exposure.
in International journal of epidemiology

Dudbridge F
(2018)
Predictive accuracy of combined genetic and environmental risk scores.
in Genetic epidemiology

Dudbridge F
(2016)
Polygenic Epidemiology.
in Genetic epidemiology
Title | Polygenic score |
Description | Methods for calculating power and predictive accuracy of polygenic risk scores |
Type Of Material | Model of mechanisms or symptoms - human |
Year Produced | 2013 |
Provided To Others? | Yes |
Impact | Invitations to present methods to international research institutes |
URL | https://sites.google.com/site/fdudbridge/software/polygenescore.R |
Description | CEU |
Organisation | University of Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Statistical methods for genetic epidemiology |
Collaborator Contribution | Research priorities in statistical genetics |
Impact | None |
Start Year | 2016 |
Description | ICR |
Organisation | Institute of Cancer Research UK |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Statistical analysis and development of methods |
Collaborator Contribution | Data generation and project management |
Impact | Publications |
Start Year | 2010 |
Description | IEU |
Organisation | University of Bristol |
Department | MRC Integrative Epidemiology Unit |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Development of statistical methods for causal inference |
Collaborator Contribution | Development of statistical methods for causal inference |
Impact | None |
Start Year | 2016 |
Description | PGC |
Organisation | Psychiatric GWAS Consortium (PGC) |
Country | Global |
Sector | Academic/University |
PI Contribution | Statistical advice |
Collaborator Contribution | Data analysis |
Impact | Publications |
Start Year | 2009 |
Description | UCLEB |
Organisation | University College London |
Department | Research Department of Epidemiology and Public Health |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Statistical analysis and advice |
Collaborator Contribution | Data collection, project management |
Impact | Fine mapping of genetic loci for cardiovascular outcomes |
Start Year | 2011 |
Description | UWA |
Organisation | University of Western Australia |
Country | Australia |
Sector | Academic/University |
PI Contribution | Expertise in statistical genetics |
Collaborator Contribution | Genetic studies of osteoporosis and thyroid disease |
Impact | Several publications |
Title | AVENGEME |
Description | Estimation of genetic model parameters from polygenic association statistics |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | Used as a research tool by a number of independent groups. |
URL | http://sites.google.com/site/fdudbridge/software |
Description | International Innovation |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Type Of Presentation | Keynote/Invited Speaker |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Interview for a magazine presenting recent scientific advances to an informed lay readership. Invitations to present work to international research institutes. |
Year(s) Of Engagement Activity | 2013 |
URL | https://sites.google.com/site/fdudbridge/software/doc/RMfeature.pdf |