Methods to improve genetic understanding of cardiometabolic traits through multiple traits and diverse population studies

Lead Research Organisation: University of Cambridge
Department Name: MRC Biostatistics Unit

Abstract

There has been great success in identifying hundreds of genetic variants associated with a large spectrum of diseases and traits, but for very few of these variants is it understood how they affect the trait. Moreover, a detected variant does not necessarily affect the trait itself, since it may instead be highly correlated with the variant that causes the effect. There is substantial interest in understanding the underlying biology of genetic variants that have an impact on disease or disease-relevant measurements (e.g. cholesterol levels), since there is evidence that this could lead to better disease treatment and prevention. I am particularly interested in improving our knowledge of cardiometabolic diseases due to their high impact on society worldwide. Cardiovascular disease (CVD) caused almost one third of deaths worldwide in 2013 and accounted for 45% of all deaths in European countries in 2016, while cardiometabolic disorders are expected to have a greater burden than infectious diseases (e.g. HIV/AIDS) in developing countries.

Recent technological advances have made it possible to obtain hundreds of measurements related to metabolism and there is evidence that understanding the genetic influences on human metabolism could improve our understanding of cardiometabolic diseases, as well as inform strategies for modifying existing drugs to treat additional diseases. However, the genetic analysis of many traits is often tackled by one-by-one analyses of individual traits without considering any correlations between them. Instead I will develop a method that identifies associations between many traits with many genetic variants. There is a broad applicability of this method to any large set of traits so there is high potential for impact on diseases and traits beyond those that I will analyse in this fellowship. I will also develop methods that combine information from multiple traits to create sets of genetic variants that will contain the true causal variants with a certain probability. Joint analyses of multiple traits have been shown to result in more refined sets of potential causal variants, but such methods do not yet exist when there are overlapping individuals between the studies, a common situation; this is a gap in methods that I intend to fill. These methods will be applied to several unique datasets, such as hundreds of metabolomics measurements and cardiometabolic, anthropometric and blood-related measurements from both European and African ancestry populations.

Gains in the probability of detecting associations between genetic variants and traits, as well as the construction of finer-resolution sets of potential causal variants, are often likely when information from different ancestries is considered together. However, most methods for jointly analysing diverse ancestries struggle to balance combining information across populations to detect associated variants against losing population-specific effects. Instead, I will develop an adaptive analysis approach that is expected to achieve this balance and will also jointly consider multiple traits. At the moment, no methods exist to construct sets of potential causal variants for multiple traits and multiple ethnicities; considering multiple traits is known to give improvements, as is considering multiple ethnicities, but the two have not yet been combined. This is another void in the methodological toolbox that I plan to fill.

All methods will be freely available on-line in user-friendly software and I will also produce an on-line reference database of relationships that are found between the many metabolomics measurements. These are expected to be widely used by a broad range of researchers, from methodologists to disease specialists.

Technical Summary

Many trait-associated genetic variants have been identified, but the underlying biological mechanisms behind the genetic effects are unknown and the lists of potential causal variants behind these effects need refinement. The molecular underpinnings of cardiometabolic diseases may be better understood by examining how metabolism is affected by genetic variation via the association analysis of biochemical measures, including metabolites. To improve detection power and fine-mapping, I will develop multi-trait methods, as well as trans-ethnic multi-trait methods, which do not currently exist.

To meta-analyse across diverse ancestries, I will construct an adaptable approach to trans-ethnic meta-analysis that combines effects across cohorts without losing ethnicity-specific signals. This involves using a measure of the degree of genetic architecture overlap at each SNP to partition the SNPs according to heterogeneity and then adapting two approaches: a high-powered trans-ethnic meta-regression approach to detect associations in the presence of allelic heterogeneity due to ancestry and a powerful meta-analysis method in a Bayesian framework.
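
The partitioning step can be illustrated with a minimal sketch. It does not use the overlap measure proposed here, which the summary does not specify; instead it swaps in the standard Cochran's Q and I^2 heterogeneity statistics from an inverse-variance fixed-effect meta-analysis. The function names, the I^2 threshold of 0.5 and the per-cohort effect estimates are all hypothetical and chosen for illustration only.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis for one SNP.

    Returns the pooled effect, its standard error, Cochran's Q and I^2.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of cohort effects from the pooled effect
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation attributed to heterogeneity, truncated at zero
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2

def partition_snps(snp_stats, i2_threshold=0.5):
    """Split SNPs into homogeneous and heterogeneous groups by I^2 (assumed threshold)."""
    homogeneous, heterogeneous = [], []
    for snp, (betas, ses) in snp_stats.items():
        _, _, _, i2 = fixed_effect_meta(betas, ses)
        (heterogeneous if i2 > i2_threshold else homogeneous).append(snp)
    return homogeneous, heterogeneous

# Hypothetical per-cohort effect estimates and standard errors for two SNPs
snp_stats = {
    "snp_a": ([0.10, 0.12, 0.11], [0.02, 0.03, 0.02]),   # consistent across cohorts
    "snp_b": ([0.20, -0.05, 0.01], [0.02, 0.03, 0.02]),  # effects differ by cohort
}
homogeneous, heterogeneous = partition_snps(snp_stats)
print("homogeneous:", homogeneous)
print("heterogeneous:", heterogeneous)
```

In a scheme of this kind, SNPs in the homogeneous group would suit a method that pools effects across cohorts, while those in the heterogeneous group would go to a method that allows effects to vary with ancestry.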

Few methods are applicable to hundreds of traits, as from metabolomics assays. For such data, I will develop an analysis approach that merges factor analysis and multiple regression. For multi-trait fine-mapping, I propose a Bayesian approach that takes advantage of the relatedness between traits by assigning a higher prior probability to joint models with shared variants between the traits. I will also adapt this approach across diverse ethnicities to capitalise on the differences in linkage disequilibrium between them. Further improvements will be sought by integrating external biological data.
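
The idea of up-weighting joint models with shared variants can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed method: it assumes two traits, exactly one causal variant per trait in the region, and per-SNP Bayes factors already computed for each trait. The sharing weight `kappa`, the function names and the Bayes factor values are hypothetical.

```python
def joint_posteriors(bf1, bf2, kappa=1.0):
    """Marginal posterior probabilities per SNP for two traits.

    Each joint model (i, j) says SNP i is causal for trait 1 and SNP j for trait 2.
    Its prior weight is kappa when i == j (shared variant) and 1 otherwise,
    so kappa > 1 favours shared causal variants and kappa == 1 treats traits independently.
    """
    m = len(bf1)
    joint = [[(kappa if i == j else 1.0) * bf1[i] * bf2[j] for j in range(m)]
             for i in range(m)]
    total = sum(sum(row) for row in joint)
    post1 = [sum(joint[i]) / total for i in range(m)]
    post2 = [sum(joint[i][j] for i in range(m)) / total for j in range(m)]
    return post1, post2

def credible_set(posteriors, coverage=0.95):
    """Smallest set of SNP indices whose posterior probabilities sum to at least `coverage`."""
    order = sorted(range(len(posteriors)), key=lambda k: posteriors[k], reverse=True)
    chosen, cumulative = [], 0.0
    for k in order:
        chosen.append(k)
        cumulative += posteriors[k]
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical Bayes factors: trait 1 is ambiguous between SNPs 0 and 1,
# while trait 2 points strongly to SNP 1.
bf1 = [50.0, 40.0, 5.0, 1.0]
bf2 = [2.0, 200.0, 3.0, 1.0]

independent, _ = joint_posteriors(bf1, bf2, kappa=1.0)
shared, _ = joint_posteriors(bf1, bf2, kappa=20.0)

print("trait 1 credible set, independent analysis:", credible_set(independent))
print("trait 1 credible set, shared-variant prior:", credible_set(shared))
```

With these illustrative numbers, the strong signal for trait 2 at SNP 1 shifts the trait 1 posterior towards the same SNP, and the 95% credible set for trait 1 shrinks from three SNPs to two. Enumerating all pairs of SNPs is only practical for small regions and single-causal-variant models; it is used here purely to keep the sketch short.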

I will assess method performance via simulation studies and apply the methods to unique datasets, the outcomes of which will assist in understanding the aetiology of cardiometabolic diseases.

Planned Impact

The proposed research will contribute to improvements in the quality of life and health and the generation of innovative output. Applications to unique cardiometabolic datasets will increase knowledge of these traits and the diseases that they influence, such as cardiovascular disease. In the longer term, the results may be used in developing treatments for cardiometabolic diseases. For example, the efficacious statin drugs for lowering low-density lipoprotein (LDL) cholesterol target a gene (HMGCR) that contains variants associated with LDL cholesterol [Kathiresan et al. 2008. Nat Gen 40:189-97].

Metabolites act as intermediate phenotypes for diseases that are associated with disruptions in metabolic processes and could functionally link genetic variation to disease predisposing factors and then to complex disease, the clinical end-point. Examples of metabolic traits that are known risk factors for CVD include blood triglyceride, cholesterol and bilirubin levels [Suhre et al., 2012, Nat Rev 13: 759-769]. In addition, many variants associated with metabolites are also associated with response to drug treatment. Variants in SLCO1B1 are associated with risk of statin-induced myopathy and metabolomics GWAS revealed that these variants are also associated with a series of fatty acids. In turn, measurements of these fatty acids in biochemical assays may assist in the redesign of the appropriate drugs [Suhre et al., 2012]. This emphasises the importance of investigating all metabolites and not only known disease risk factors, as this may lead to the discovery of new biological processes or pathways that may be involved or disrupted in disease aetiology. The on-line reference database that I will construct has potential for high impact as a tool to explore metabolite relationships.

The factor analysis approach for the dimension reduction of many correlated traits may also be applied to many diseases to explore the underlying factors that link them. The proposed fine-mapping (FM) approaches are anticipated to lead to smaller sets of SNPs for follow-up, which will imply lower cost and less lab time in following up variants.

Besides trans-ethnic fine-mapping to identify potential causal variants in loci that are shared between ethnicities, identification of associations and fine-mapping within the individual populations is of interest. For instance, in African-Americans, as well as Nigerian Yorubans, APOL1 was identified as a risk locus for chronic kidney disease, which is an independent risk factor for CAD development [Adebamowo et al. 2017. Public Health Genomics. 20:9-16]. However, the association between APOL1 and CAD is not well-understood as one study has shown an association between APOL1 nephropathy variants and lower levels of subclinical atherosclerosis in diabetic African-Americans [Freedman et al. 2015], while another study showed the opposite direction of effect [Mukamal et al. 2016. Arter Thromb Vasc Biol. 36:398-403]. Metabolic traits may assist in better understanding the genetic effect of APOL1 on CAD and I will investigate this in the Ugandan cohort.

The corresponding software for all methods will be freely available on-line in a form that is easily accessible by the scientific community, regardless of statistical and/or computing expertise. The wide applicability of the methods to numerous diseases/traits enhances the potential for further research outcomes beyond the analyses that my research associate and I will carry out. Release of the source code will facilitate innovative output through further methodological developments.

Finally, my research associate will develop experience in the analysis of large-scale unique datasets and use of intensive computing. Experience and knowledge in metabolic and cardiovascular disease will also be expanded for my RA and myself.

Publications

 
Title MFM: Multinomial Fine-Mapping 
Description MFM is an R package for simultaneous fine-mapping of genetic associations for several diseases, in a Bayesian framework that borrows information between the diseases. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact In simulation studies MFM was found to have greater accuracy than single disease analyses, when there are shared causal variants, and negligible loss of precision otherwise. MFM was applied to data from six autoimmune diseases (type 1 diabetes, multiple sclerosis, autoimmune thyroid disease, rheumatoid arthritis, juvenile idiopathic arthritis, celiac disease) and revealed causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes. 
URL https://www.biorxiv.org/content/10.1101/553560v1
 
Title MFMextra: Specific analyses for MFM package 
Description This R package provides the simulation functions used to assess the joint fine-mapping methods of MFM. In particular, it provides the tools to generate phenotype and genotype data for two diseases with shared controls. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact It was used to evaluate MFM and show that MFM is more accurate than single disease analyses. It has potential for an impact on method development as it could be used to assess other methods for the analysis of diseases with shared controls. 
URL https://www.biorxiv.org/content/10.1101/553560v1
 
Description Cambridge Science Festival 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Participation in Cambridge Science Festival 2019 - presenting bespoke hands-on activity based on statistical research taking place at the BSU.
Year(s) Of Engagement Activity 2019