Dissecting the molecular aetiology of complex traits using high dimensional omic data

Lead Research Organisation: University of Bristol
Department Name: Social Medicine

Abstract

The discovery of genetic variants associated with complex traits is increasing at an exponential rate. It is now of vital importance to develop our understanding of the molecular mechanisms which can help explain these findings, in order to improve our capability to prevent and treat disease. Advancements in high-throughput sequencing technologies present an unprecedented opportunity to address this challenge and ascertain the biological and clinical relevance of results from genome-wide association studies (GWAS). However, there is an increasing abundance of data being generated on diverse types of molecular "omic" traits, accompanied by the rapid development of partially overlapping and often untested methodologies. To overcome this challenge, there needs to be focused research into the most appropriate and efficient manner to harness large-scale 'omic data to elucidate the molecular determinants of complex disease.

The research outlined in this fellowship proposal can be delineated into five categories, with the overall aim of harnessing large-scale data to improve patient healthcare in line with the UK's industrial strategy. The opportunity presented by HDR-UK will allow me to address some of the most crucial limitations in molecular aetiology. Specifically, there needs to be extensive research into tissue-specificity for 'omic traits, systematic frameworks to appropriately appraise molecular mediation and methods to improve causal inference in this paradigm. I also intend to apply novel, state-of-the-art methods to available 'omic data to elucidate findings which have translational value for therapeutic evaluation. Finally, I will build publicly accessible computational tools to automate fundamental analyses in this paradigm and resources to disseminate the findings of this fellowship.

The overarching theme of this research project is using health informatics to harness large-scale, high-throughput data to develop our understanding of the causal pathway from genetic variation to complex disease. This research will lead to several high-impact publications to improve our understanding of the molecular determinants of disease, as well as web tools that should help disseminate the outputs of this project and assist colleagues with their endeavours in this field. As such, this work most closely aligns with the HDR-UK priorities concerning health informatics and accelerating medicines discovery.

Technical Summary

1. Tissue-specific Mendelian randomization - As an initial analysis, I will instrument genes using tissue-specific expression quantitative trait loci (eQTL) and assess putative associations using Mendelian randomization (MR) across 139 complex traits. However, as more tissue-centric expression data becomes available I will be able to assess this crucial aspect of molecular aetiology with greater confidence by instrumenting genes with multiple instruments. I also plan to expand upon the initial set of traits assessed using ~600 outcomes from the UK Biobank study as well as proteomic data.

2. Systematic two-step epigenetic MR - Building upon recent research I have led, I intend to construct a framework to assess whether genetic and environmental exposures influence complex traits via changes in DNA methylation levels. To do this I will systematically apply two-step epigenetic MR.

3. Multivariable 'omic MR - I will apply the principles of multivariable MR to assess the effect of multiple 'omic traits simultaneously on complex traits/disease. This work will also evaluate pharmaceutical targets which should be translatable for therapeutic purposes.

4. Application of novel methods to 'omic data - Future methodology in this field is likely to focus on integrative approaches, and therefore I intend to apply the most innovative and cutting-edge methods to multiple types of 'omic data, as well as iterate upon them.

5. Computational tools to disseminate the product of this fellowship - I will develop a suite of bioinformatics tools and resources to disseminate the findings and methods of this fellowship to a broad audience. Building upon experience in developing a target validation tool for industry partners, I plan to develop platforms which automate fundamental analyses of large-scale data in this field, which should be of considerable value to the community and industry partners.
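
The estimators behind items 1 to 3 can be illustrated with a minimal Python sketch. It is not the fellowship's own code: the function names and input values are hypothetical, and the formulas are the standard textbook forms (Wald ratio for a single instrument, product of coefficients with a first-order delta-method standard error for two-step MR, and inverse-variance weighted regression without an intercept for multivariable MR). It assumes per-SNP summary statistics are already harmonised to the same effect allele.

```python
import math

import numpy as np


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Single-instrument MR: SNP-outcome effect divided by SNP-exposure effect."""
    estimate = beta_gy / beta_gx
    # First-order delta-method SE; ignores uncertainty in beta_gx.
    se = se_gy / abs(beta_gx)
    return estimate, se


def two_step_indirect(beta1, se1, beta2, se2):
    """Two-step MR: mediated effect as the product of the two step estimates.

    beta1: effect of exposure on the mediator (e.g. DNA methylation).
    beta2: effect of the mediator on the outcome.
    """
    indirect = beta1 * beta2
    # Delta-method SE of a product of two independent estimates.
    se = math.sqrt(beta1**2 * se2**2 + beta2**2 * se1**2)
    return indirect, se


def multivariable_mr(beta_gx, beta_gy, se_gy):
    """Multivariable MR via weighted least squares with no intercept.

    beta_gx: (n_snps, n_exposures) SNP-exposure effects.
    beta_gy: (n_snps,) SNP-outcome effects.
    se_gy:   (n_snps,) standard errors of the SNP-outcome effects.
    """
    X = np.asarray(beta_gx, dtype=float)
    y = np.asarray(beta_gy, dtype=float)
    w = 1.0 / np.asarray(se_gy, dtype=float) ** 2  # inverse-variance weights
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)


# Illustrative (made-up) numbers only.
est, se = wald_ratio(beta_gx=0.5, beta_gy=0.1, se_gy=0.02)
indirect, indirect_se = two_step_indirect(0.2, 0.04, 0.5, 0.1)
```

In practice, established packages handle instrument selection, harmonisation and sensitivity analyses; the sketch only shows the arithmetic that each approach reduces to.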

Publications

Harroud A (2021) Childhood obesity and multiple sclerosis: A Mendelian randomization study. in Multiple sclerosis (Houndmills, Basingstoke, England)

Kazmi N (2019) Hypertensive Disorders of Pregnancy and DNA Methylation in Newborns. in Hypertension (Dallas, Tex. : 1979)

 
Title An atlas of genetically predicted effects to dissect tissue-specific transcriptomic mechanisms across the human phenome 
Description This web application can be used to investigate associations between genome-wide gene expression and 395 complex traits by applying Mendelian randomization and genetic colocalization. Analyses have been undertaken using gene expression derived from whole blood made available by the eQTLGen consortium (n=31,684), as well as 48 different tissue types from the GTEx project. Findings from this web application can help uncover associations yet to be detected by genome-wide association studies and also investigate tissue-specific effects between gene expression and complex traits. 
Type Of Material Model of mechanisms or symptoms - human 
Year Produced 2019 
Provided To Others? Yes  
Impact This web resource accompanies a publication in Nature Communications: https://www.nature.com/articles/s41467-019-13921-9 At the time of submission, it had responded to 5,396 queries, where each query is either a transcriptome-wide association study, a phenome-wide association study or a cross-tissue comparison. Furthermore, I presented this research at the European Society of Human Genetics 2019 conference in Gothenburg, Sweden, where it won best statistical genetics talk. 
URL http://mrcieu.mrsoftware.org/Tissue_MR_atlas/
 
Title An atlas of polygenic risk score association across the human phenome 
Description This web tool can be used to investigate genetic predisposition to disease for 162 traits/outcomes and evaluate their association with 551 different health measurements. In doing so, users can evaluate putative causal relationships between risk factors and disease. 
Type Of Material Model of mechanisms or symptoms - human 
Year Produced 2018 
Provided To Others? Yes  
Impact This tool has received over 5000 queries to date, which I believe will result in many citations and benefit many future studies/publications. More broadly, I believe this tool will help us understand how risk factors influence disease risk, thereby improving our capability to prevent and treat disease. 
URL http://mrcieu.mrsoftware.org/PRS_atlas/
 
Title Supporting data for "PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics" 
Description Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to much individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. Two state-of-the-art methods, metaCCA and LD score regression, provide an alternative approach to estimate phenotypic correlation using only genome-wide association study (GWAS) summary results.
Here, we present an integrated R toolkit, PhenoSpD, to 1) use LD score regression to estimate phenotypic correlations using GWAS summary statistics; and 2) utilize the estimated phenotypic correlations to inform correction of multiple testing for complex human traits using the spectral decomposition of matrices (SpD). The simulations suggest 1) it is possible to identify non-independence of phenotypes using samples with partial overlap; as overlap decreases, the estimated phenotypic correlations will attenuate towards zero and multiple testing correction will be more stringent than in perfectly overlapping samples; 2) in contrast to LD score regression, metaCCA will provide approximate genetic correlations rather than phenotypic correlation, which limits its application for multiple testing correction. In a case study, PhenoSpD using UK Biobank GWAS results suggested 399.6 independent tests among 487 human traits, which is close to the 352.4 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to an estimated 5618 pair-wise phenotypic correlations among 107 metabolites using GWAS summary statistics from Kettunen et al. and PhenoSpD suggested the equivalent of 33.5 independent tests for these metabolites.
PhenoSpD extends the use of summary level results, providing a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics. This is particularly valuable in the age of large-scale biobank and consortia studies, where GWAS results are much more accessible than individual-level data.
R code and documentation for PhenoSpD V1.0.0 are available online at https://github.com/MRCIEU/PhenoSpD. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
URL http://gigadb.org/dataset/100474
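
The spectral decomposition step described in the PhenoSpD record above can be sketched as follows. This is an illustrative Python example, not the PhenoSpD R code: it assumes a phenotypic correlation matrix has already been estimated, and it uses one commonly cited estimator of the effective number of independent tests (Nyholt 2004), Meff = 1 + (M - 1)(1 - Var(eigenvalues)/M). The function name is hypothetical, and the toolkit's own implementation may differ in detail.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Estimate the effective number of independent tests from a correlation matrix.

    Uses the eigenvalue-variance estimator of Nyholt (2004):
    Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M).
    """
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if m == 1:
        return 1.0
    # Correlation matrices are symmetric, so eigvalsh is appropriate.
    eigenvalues = np.linalg.eigvalsh(corr)
    # Sample variance (ddof=1) of the eigenvalues.
    return 1.0 + (m - 1) * (1.0 - np.var(eigenvalues, ddof=1) / m)


# Uncorrelated traits: every test counts as independent.
meff_independent = effective_number_of_tests(np.eye(4))
# Perfectly correlated traits: the tests collapse to a single one.
meff_redundant = effective_number_of_tests(np.ones((4, 4)))
```

A multiple testing threshold can then be set by dividing the nominal significance level by the estimated number of independent tests rather than by the raw number of traits.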
 
Description Sanofi-Bristol collaboration 
Organisation Sanofi
Country Global 
Sector Private 
PI Contribution We were recently awarded a 2-year extension to this collaboration between the MRC Integrative Epidemiology Unit in Bristol and the pharmaceutical company Sanofi. I have also line managed a senior research associate at the MRC IEU who has been funded by this collaboration.
Collaborator Contribution I lead this collaboration along with my colleagues, using the findings from my research to inform endeavours concerning drug validation.
Impact The value of this extension is 191,734 euros per annum (for 2 years). Moreover, I will begin line managing a new senior research associate attached to the collaboration for the following 2 years.
Start Year 2016
 
Description Plenary presentation at the American Society of Human Genetics (ASHG) 2018 in San Diego 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the ASHG 2018 conference in San Diego I presented my research concerning an atlas of polygenic risk scores, which is now published (https://www.ncbi.nlm.nih.gov/pubmed/30835202). This was a late-breaking abstract; according to the ASHG website, only 3 of 57 applications were accepted (http://www.ashg.org/2018meeting/pages/abstracts_late.shtml). ASHG is (I believe) the biggest genetics conference in the world and the audience I presented to was estimated at around 5,000. This was a fantastic opportunity to promote my research and is likely a major factor in the popularity of the web tool for this work (available at http://mrcieu.mrsoftware.org/PRS_atlas/). It has also led to collaboration on this research with international colleagues.
Year(s) Of Engagement Activity 2018
URL http://www.ashg.org/2018meeting/pages/abstracts_late.shtml