Improving Mendelian randomisation based on meta-GWAS summary statistics.
Lead Research Organisation:
University of Cambridge
Department Name: MRC Biostatistics Unit
Abstract
Observational studies, in which a sample of individuals is measured for various disease traits and characteristics of interest, are widely used in epidemiology to seek out potential drivers of disease. Ideally, this would lead to new treatments or public health interventions targeting any modifiable risk factors identified. However, because these studies are observational, it is often unclear whether the risk factors identified are truly causal (so that intervening would be effective), or whether an unmeasured process is actually driving the observed association with disease (making intervention ineffective). A classic example is yellow teeth and lung cancer: the two would appear associated in an observational study, but the true driver, which is associated with both, is smoking. Teeth whitening would not reduce an individual's cancer risk. Traditionally, the gold-standard method for determining causality is a randomised trial of the intervention of interest. Random assignment of the intervention breaks any association between it and underlying processes, both known and unknown, allowing an unbiased evaluation of its direct effect. However, clinical trials are very costly, may be impractical, and for some risk factors, such as smoking, are unethical.
Mendelian randomization (MR), a relatively recent idea, offers a framework for assessing the causality of exposures using observational data by exploiting the genetic randomisation that occurs during sexual reproduction. Genetic variants that modulate the exposure of interest, rather than the observed risk factor itself, are used to test for disease associations. These variants are randomised at conception and so, provided the biology is well understood, may be considered analogous to the intervention in a randomised clinical trial of the exposure. A recent surge in MR analyses has been driven by the publication of results from a number of large 'meta-GWAS' (meta-analyses of genome-wide association studies), in which multiple studies combine tens of thousands of individuals into powerful joint analyses. These have revealed strong genetic signatures for a variety of exposures and risk factors of interest, which are being used in MR to ask new causal questions of existing observational datasets.
Leveraging results from these powerful consortium meta-GWAS will be key to many MR efforts. However, meta-GWAS typically publish results only in summarised form, and existing MR frameworks for integrating such summary statistics have a number of shortcomings. For example, genetic variants that are correlated with one another, of which there are many, cannot be included. Furthermore, better-performing methods are needed to search these large, genome-wide summary data repositories for optimal genetic signatures for use in MR. For a given MR analysis, we require enough variants to predict the target exposure, but must avoid variants that influence the outcome through other means, such as alternative genetic pathways.
In this project we aim to advance current MR frameworks to make better use of summary data from large meta-GWAS. We will explore repurposing and building upon summary statistics methods developed elsewhere in statistical genetics, and develop critical extensions to the recently proposed MR-Egger summary statistic approach, which offers robustness to violations of the biological assumptions necessary for MR. All methods will be developed and tested to explore causal disease drivers for type 2 diabetes, Alzheimer's disease and a range of cancers in four compelling case studies shared by our colleagues at the MRC Epidemiology Unit.
Technical Summary
Mendelian randomization (MR) provides a framework for assessing causal relationships in observational data by exploiting the natural randomization of genetic variants at conception. Unconfounded genetic proxies, or instruments, are constructed from variants known to predict the exposure of interest, typically single nucleotide polymorphisms (SNPs). A recent surge in MR has been fuelled by the publication of summary statistics from a variety of high-powered, large-scale meta-GWAS (meta-analyses of genome-wide association studies), which amass data on tens of thousands of individuals. However, leveraging these hugely valuable resources is currently hampered by a number of issues around the use of summary statistics in MR.
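As a minimal illustration of how summary statistics feed into MR, the sketch below computes per-SNP Wald ratios and the standard fixed-effect inverse-variance weighted (IVW) estimate from SNP-exposure effects (`beta_x`), SNP-outcome effects (`beta_y`) and outcome standard errors (`se_y`). It assumes uncorrelated SNPs and ignores uncertainty in `beta_x`; the function names and toy numbers are illustrative only and are not taken from this project.

```python
import numpy as np

def wald_ratios(beta_x, beta_y):
    """Per-SNP causal estimates: SNP-outcome effect divided by SNP-exposure effect."""
    return np.asarray(beta_y, float) / np.asarray(beta_x, float)

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression of beta_y on beta_x through the origin
    with weights 1/se_y**2 (first-order weights; assumes uncorrelated SNPs).
    """
    bx = np.asarray(beta_x, float)
    by = np.asarray(beta_y, float)
    w = 1.0 / np.asarray(se_y, float) ** 2
    estimate = np.sum(w * bx * by) / np.sum(w * bx ** 2)
    se = 1.0 / np.sqrt(np.sum(w * bx ** 2))
    return estimate, se

# Toy summary statistics (hypothetical values)
bx = np.array([0.10, 0.20, 0.30])      # SNP-exposure effects
by = 0.5 * bx                          # outcome effects implied by a causal effect of 0.5
se_y = np.array([0.01, 0.02, 0.01])    # SNP-outcome standard errors
est, se = ivw_estimate(bx, by, se_y)
```

With these noiseless toy values every Wald ratio, and hence the IVW estimate, equals the assumed causal effect of 0.5.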
The first issue arises from the substantial correlations (linkage disequilibrium) that exist between variants in the human genome. Weighted scores are often used in MR to aggregate multiple SNPs into a single instrument for improved power. Correlated SNPs require weights from multivariate regressions; however, meta-GWAS usually publish only summaries from one-at-a-time tests. We will explore, for the first time in the context of MR, the application of existing methods for imputing multivariate effects from one-at-a-time summary statistics. This will include extending a Bayesian variable selection algorithm to incorporate a loss function for pleiotropy (where alternative pathways exist between SNPs and the outcome, invalidating MR assumptions).
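The general idea of recovering multivariate effects from one-at-a-time summaries can be sketched as follows. Assuming standardised genotypes and a standardised trait, the marginal effects satisfy `marginal = R @ joint`, where `R` is the SNP correlation (LD) matrix, so joint effects follow by solving a linear system and can then serve as weights in a score. The function names, the optional `ridge` term and the simulated "SNP-like" continuous variables are hypothetical choices for illustration; this is not the specific imputation method or Bayesian selection algorithm the project will use.

```python
import numpy as np

def joint_from_marginal(marginal_beta, ld, ridge=0.0):
    """Approximate joint (multivariate) SNP effects from one-at-a-time effects.

    Assumes standardised genotypes and trait, so marginal = ld @ joint.
    `ridge` (hypothetical, default 0) adds a constant to the diagonal to
    stabilise an ill-conditioned LD matrix.
    """
    b = np.asarray(marginal_beta, float)
    r = np.asarray(ld, float)
    return np.linalg.solve(r + ridge * np.eye(len(b)), b)

def weighted_score(genotypes_std, weights):
    """Weighted score: a single instrument value per individual."""
    return np.asarray(genotypes_std, float) @ np.asarray(weights, float)

# Simulated data: three correlated SNP-like variables sharing a common factor
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
g = np.column_stack([z + rng.normal(size=n) for _ in range(3)])
g = (g - g.mean(axis=0)) / g.std(axis=0)
y = g @ np.array([0.3, 0.0, -0.2]) + rng.normal(size=n)
y = (y - y.mean()) / y.std()

marginal = g.T @ y / n      # one-at-a-time slopes on standardised data
ld = g.T @ g / n            # in-sample correlation matrix
joint = joint_from_marginal(marginal, ld)
full = np.linalg.lstsq(g, y, rcond=None)[0]   # multivariate regression on individual data
score = weighted_score(g, joint)
```

Because the LD matrix here is computed in-sample, the reconstruction matches the full multivariate regression exactly; with LD taken from an external reference panel it would only be approximate.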
We will also explore a range of extensions to the summary-statistic MR-Egger regression method, originally a tool for detecting small-study bias in meta-analysis, which was recently adapted to test and adjust for bias from pleiotropy in the context of MR. We will extend this robust inference method for use with correlated SNPs and case-control data, and to multivariable MR, where several correlated risk factors are modelled simultaneously.
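A minimal sketch of the two regressions mentioned above, assuming uncorrelated SNPs and first-order weights `1/se_y**2`. MR-Egger regresses oriented SNP-outcome effects on SNP-exposure effects with an intercept (the intercept captures average directional pleiotropy, the slope is the causal estimate under the InSIDE assumption); multivariable IVW regresses SNP-outcome effects on several exposures' SNP effects without an intercept. All names, the standard-error inflation rule and the toy data are illustrative assumptions; the proposed extensions (correlated SNPs, case-control data) are not implemented here.

```python
import numpy as np

def _wls(design, y, w):
    """Weighted least squares: coefficients, residual variance, fixed-effect covariance."""
    sw = np.sqrt(w)
    xw, yw = design * sw[:, None], y * sw
    coef, *_ = np.linalg.lstsq(xw, yw, rcond=None)
    resid = yw - xw @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = np.linalg.inv(xw.T @ xw)
    return coef, sigma2, cov

def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger: weighted regression of beta_y on beta_x WITH an intercept."""
    bx = np.asarray(beta_x, float)
    by = np.asarray(beta_y, float)
    # Orient each SNP so that its SNP-exposure effect is positive
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    w = 1.0 / np.asarray(se_y, float) ** 2
    design = np.column_stack([np.ones_like(bx), bx])
    coef, sigma2, cov = _wls(design, by, w)
    # One common convention: inflate SEs only when residual variance exceeds 1
    se = np.sqrt(np.diag(cov) * max(1.0, sigma2))
    return {"intercept": coef[0], "slope": coef[1],
            "se_intercept": se[0], "se_slope": se[1]}

def mvmr_ivw(beta_x_matrix, beta_y, se_y):
    """Multivariable IVW: regress beta_y on several exposures' effects, no intercept."""
    bx = np.asarray(beta_x_matrix, float)
    w = 1.0 / np.asarray(se_y, float) ** 2
    coef, _, _ = _wls(bx, np.asarray(beta_y, float), w)
    return coef

# Toy data (hypothetical): causal effect 0.4 plus directional pleiotropy 0.02
rng = np.random.default_rng(0)
bx = rng.uniform(0.05, 0.3, size=20)
se_y = np.full(20, 0.01)
by = 0.02 + 0.4 * bx
res = mr_egger(bx, by, se_y)

# Two exposures with assumed direct effects 0.3 and -0.1
bx2 = np.column_stack([bx, rng.uniform(0.05, 0.3, size=20)])
by2 = 0.3 * bx2[:, 0] - 0.1 * bx2[:, 1]
mv = mvmr_ivw(bx2, by2, se_y)
```

With these noiseless toy data, MR-Egger recovers an intercept of 0.02 and a slope of 0.4, whereas an IVW fit through the origin would absorb the pleiotropic offset into an upwardly biased slope.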
Methods will be developed and demonstrated in four topical case studies shared by collaborators at the MRC Epidemiology Unit.
Planned Impact
The aims of this research are primarily to increase the efficiency with which summarized data from large scale meta-GWAS are utilized in Mendelian randomisation, and hence improve the likelihood that researchers are able to leverage these vast datasets to reveal or rule out putative causal relationships.
More broadly, we hope the research will benefit:
- The pharmaceutical industry, which (like academic researchers) uses Mendelian randomisation to inform drug development, both to support or argue against new molecules and to explore the possibility of adverse events;
- Policy makers (e.g. NICE) who may utilize evidence from evaluations of interventions in observational studies under Mendelian randomisation (e.g. for adverse events);
- Researchers in other areas of social science and public health who utilise Mendelian randomisation; genetic instrument approaches have been used in many areas, such as studies of genetic predictors of educational achievement and IQ, and of the effect of BMI on depression and earnings;
- Statisticians and researchers beyond the health field who work with summary statistics. Some of the summary statistics problems we will explore (e.g. making multivariate inference from marginal associations) are generic and are likely to occur in a range of contexts.
Indirectly, we believe the research can ultimately benefit clinicians and patients by improving power to detect new therapeutic targets for intervention, and to detect possible adverse events of proposed drugs before they are trialled in humans.
Publications
Burgess S (2018). Modal-based estimation via heterogeneity-penalized weighting: model averaging for consistent and efficient estimation in Mendelian randomization when a plurality of candidate instruments are valid. International Journal of Epidemiology.
Gkatzionis A (2019). Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? International Journal of Epidemiology.