Methods to improve genetic understanding of cardiometabolic traits through multiple traits and diverse population studies

Lead Research Organisation: University of Cambridge
Department Name: MRC Biostatistics Unit

Abstract

There has been great success in identifying hundreds of genetic variants associated with a large spectrum of diseases and traits, but for very few of these variants is it understood how they affect the trait. Moreover, a detected variant does not necessarily affect the trait itself, since it may instead be highly correlated with the variant that causes the effect. There is substantial interest in understanding the underlying biology of genetic variants that have an impact on disease or disease-relevant measurements (e.g. cholesterol levels), since there is evidence that this could lead to better disease treatment and prevention. I am particularly interested in improving our knowledge of cardiometabolic diseases due to their high impact on society worldwide. Cardiovascular disease (CVD) caused almost one third of deaths worldwide in 2013 and accounted for 45% of all deaths in European countries in 2016, while cardiometabolic disorders are expected to have a greater burden than infectious diseases (e.g. HIV/AIDS) in developing countries.

Recent technological advances have made it possible to obtain hundreds of measurements related to metabolism and there is evidence that understanding the genetic influences on human metabolism could improve our understanding of cardiometabolic diseases, as well as inform strategies for modifying existing drugs to treat additional diseases. However, the genetic analysis of many traits is often tackled by one-by-one analyses of individual traits without considering any correlations between them. Instead, I will develop a method that identifies associations between many traits and many genetic variants. This method is broadly applicable to any large set of traits, so there is high potential for impact on diseases and traits beyond those that I will analyse in this fellowship. I will also develop methods that combine information from multiple traits to create sets of genetic variants that will contain the true causal variants with a certain probability. Joint analyses of multiple traits have been shown to result in more refined sets of potential causal variants, but such methods do not yet exist when there are overlapping individuals between the studies, a common situation; this is a gap in methods that I intend to fill. These methods will be applied to several unique datasets, such as hundreds of metabolomics measurements and cardiometabolic, anthropometric and blood-related measurements from both European and African ancestry populations.
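
As an illustration of a set of variants that contains the causal variant with a certain probability, the following minimal Python sketch builds a credible set from per-variant posterior probabilities. It assumes a single causal variant in the region and uses made-up variant names and probabilities; it is not the method proposed in this fellowship, only the basic idea behind such sets.

```python
def credible_set(posterior_probs, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    Assumes one causal variant in the region, so the probabilities sum to 1.
    """
    # Rank variants from most to least probable.
    ranked = sorted(posterior_probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posterior probabilities for five variants in one region.
pp = {"rs1": 0.62, "rs2": 0.25, "rs3": 0.09, "rs4": 0.03, "rs5": 0.01}
variants, mass = credible_set(pp, coverage=0.95)
print(variants, round(mass, 2))
```

A more refined analysis concentrates the posterior probability on fewer variants, so the same coverage is reached with a smaller set to follow up.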

Gains in the probability of detecting associations between genetic variants and traits, as well as the construction of finer resolution sets of potential causal variants, are often likely when information from different ancestries is considered together. However, most methods for jointly analysing diverse ancestries have difficulty balancing the combination of information across populations to detect associated variants against the loss of population-specific effects. Instead, I will develop an adaptive analysis approach that is expected to achieve this balance and will also jointly consider multiple traits. At the moment, no methods exist to construct sets of potential causal variants for multiple traits and multiple ethnicities; considering multiple traits is known to give improvements, as is considering multiple ethnicities, but the two have not yet been combined. This is another void in the methodological toolbox that I plan to fill.

All methods will be freely available on-line in user-friendly software and I will also produce an on-line reference database of relationships that are found between the many metabolomics measurements. These are expected to be of widespread use to a broad spectrum of researchers, from methodological to disease-specific.

Technical Summary

Many trait-associated genetic variants have been identified, but the underlying biological mechanisms behind the genetic effects are unknown and the lists of potential causal variants behind these effects need refinement. The molecular underpinnings of cardiometabolic diseases may be better understood by examining how metabolism is affected by genetic variation via the association analysis of biochemical measures, including metabolites. To improve detection power and fine-mapping, I will develop multi-trait methods, as well as trans-ethnic multi-trait methods, for which methods currently do not exist.

To meta-analyse across diverse ancestries, I will construct an adaptable approach to trans-ethnic meta-analysis that combines effects across cohorts without losing ethnicity-specific signals. This involves using a measure of the degree of genetic architecture overlap at each SNP to partition the SNPs according to heterogeneity, and then adapting two approaches: a high-powered trans-ethnic meta-regression approach to detect associations in the presence of allelic heterogeneity due to ancestry, and a powerful meta-analysis method in a Bayesian framework.
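
The partitioning step can be sketched as follows. The summary does not specify which measure of genetic architecture overlap is used, so this sketch substitutes a standard stand-in, the I² heterogeneity statistic from a fixed-effect inverse-variance meta-analysis. The threshold, SNP names and effect estimates are hypothetical.

```python
def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation across cohorts beyond what chance would give.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


def partition_snps(snp_stats, i2_threshold=0.5):
    """Split SNPs into low- and high-heterogeneity groups (threshold is arbitrary)."""
    homogeneous, heterogeneous = [], []
    for snp, (betas, ses) in snp_stats.items():
        _, _, i2 = fixed_effect_meta(betas, ses)
        (heterogeneous if i2 > i2_threshold else homogeneous).append(snp)
    return homogeneous, heterogeneous


# Hypothetical per-cohort effect estimates and standard errors for two SNPs.
stats = {
    "snpA": ([0.10, 0.12, 0.11], [0.02, 0.02, 0.02]),   # consistent effects
    "snpB": ([0.30, -0.05, 0.02], [0.03, 0.03, 0.03]),  # ancestry-specific effect
}
homogeneous, heterogeneous = partition_snps(stats)
print(homogeneous, heterogeneous)
```

Under this stand-in, SNPs in the heterogeneous group would be candidates for an approach that models ancestry-related differences, while the homogeneous group could be pooled directly.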

Few methods are applicable to hundreds of traits, as from metabolomics assays. For such data, I will develop an analysis approach that merges factor analysis and multiple regression. For multi-trait fine-mapping, I propose a Bayesian approach that takes advantage of the relatedness between traits by assigning a higher prior probability to joint models with shared variants between the traits. I will also adapt this approach across diverse ethnicities to capitalise on the differences in linkage disequilibrium between them. Further improvements will be sought by integrating external biological data.
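
The idea of assigning a higher prior probability to joint models with shared variants can be sketched for two traits. This is a simplified illustration rather than the proposed method: it treats the traits as independent given their models, the sharing factor `kappa` and all probabilities are hypothetical, and each model is represented as a tuple of variant names.

```python
from itertools import product


def joint_posteriors(pp_trait1, pp_trait2, kappa=5.0):
    """Combine single-trait model posteriors, upweighting pairs that share a variant."""
    unnormalised = {}
    for (m1, p1), (m2, p2) in product(pp_trait1.items(), pp_trait2.items()):
        shares_variant = bool(set(m1) & set(m2))
        unnormalised[(m1, m2)] = p1 * p2 * (kappa if shares_variant else 1.0)
    total = sum(unnormalised.values())
    return {pair: value / total for pair, value in unnormalised.items()}


def marginal(joint, trait_index):
    """Sum the joint posterior over the other trait's models."""
    out = {}
    for pair, prob in joint.items():
        model = pair[trait_index]
        out[model] = out.get(model, 0.0) + prob
    return out


# Trait 1 cannot separate rs1 from rs2; trait 2 strongly favours rs1.
pp1 = {("rs1",): 0.5, ("rs2",): 0.5}
pp2 = {("rs1",): 0.9, ("rs3",): 0.1}
joint = joint_posteriors(pp1, pp2, kappa=5.0)
adjusted_pp1 = marginal(joint, 0)
print({model: round(prob, 3) for model, prob in adjusted_pp1.items()})
```

In this toy example, the evidence from trait 2 moves trait 1's posterior towards rs1, which is how sharing information between traits can shrink the set of candidate variants.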

I will assess method performance via simulation studies and apply the methods to unique datasets, for which the outcomes will assist in understanding the aetiology of cardiometabolic diseases.

Planned Impact

The proposed research will contribute to improvements in the quality of life and health and the generation of innovative output. Applications to unique cardiometabolic datasets will increase knowledge of these traits and the diseases that they influence, such as cardiovascular disease. In the longer term, the results may be used in developing treatments for cardiometabolic diseases. For example, the efficacious statin drugs for lowering low-density lipoprotein (LDL) cholesterol target a gene (HMGCR) that contains variants associated with LDL cholesterol [Kathiresan et al. 2008. Nat Gen 40:189-97].

Metabolites act as intermediate phenotypes for diseases that are associated with disruptions in metabolic processes and could functionally link genetic variation to disease predisposing factors and then to complex disease, the clinical end-point. Examples of metabolic traits that are known risk factors for CVD include blood triglyceride, cholesterol and bilirubin levels [Suhre et al., 2012, Nat Rev 13: 759-769]. In addition, many variants associated with metabolites are also associated with response to drug treatment. Variants in SLCO1B1 are associated with risk of statin-induced myopathy and metabolomics GWAS revealed that these variants are also associated with a series of fatty acids. In turn, measurements of these fatty acids in biochemical assays may assist in the redesign of the appropriate drugs [Suhre et al., 2012]. This emphasises the importance of investigating all metabolites and not only known disease risk factors, as this may lead to the discovery of new biological processes or pathways that may be involved or disrupted in disease aetiology. The on-line reference database that I will construct has potential for high impact as a tool to explore metabolite relationships.

The factor analysis approach for the dimension reduction of many correlated traits may also be applied to many diseases to explore the underlying factors that link them. The proposed fine-mapping (FM) approaches are anticipated to lead to smaller sets of SNPs for follow-up, which will imply lower cost and less lab time in following up variants.

Besides trans-ethnic fine-mapping to identify potential causal variants in loci that are shared between ethnicities, identification of associations and fine-mapping within the individual populations is of interest. For instance, in African-Americans, as well as Nigerian Yorubans, APOL1 was identified as a risk locus for chronic kidney disease, which is an independent risk factor for the development of coronary artery disease (CAD) [Adebamowo et al. 2017. Public Health Genomics. 20:9-16]. However, the association between APOL1 and CAD is not well understood, as one study has shown an association between APOL1 nephropathy variants and lower levels of subclinical atherosclerosis in diabetic African-Americans [Freedman et al. 2015], while another study showed the opposite direction of effect [Mukamal et al. 2016. Arter Thromb Vasc Biol. 36:398-403]. Metabolic traits may assist in better understanding the genetic effect of APOL1 on CAD and I will investigate this in the Ugandan cohort.

The corresponding software for all methods will be freely available on-line in a form that is easily accessible by the scientific community, regardless of statistical and/or computing expertise. The wide applicability of the methods to numerous diseases/traits enhances the potential for further research outcomes beyond the analyses that my research associate and I will carry out. Release of the source code will facilitate innovative output through further methodological developments.

Finally, my research associate will develop experience in the analysis of large-scale unique datasets and use of intensive computing. Experience and knowledge in metabolic and cardiovascular disease will also be expanded for my RA and myself.

Publications

 
Description Assessing the genetic impact of metabolic syndrome in Sub-Sahara Africa
Amount £19,970 (GBP)
Funding ID G101892 
Organisation Alborada Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2021 
End 12/2022
 
Description Environment-adjusted genetic analysis methods for cardiometabolic traits in African populations
Amount £500,913 (GBP)
Funding ID MR/W02098X/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 05/2022 
End 04/2025
 
Description A Personalised Medicine Approach to Diagnosis and Treatment In Inherited Cardiovascular Diseases Utilising -Omic Technologies 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Prof Perry Elliott's team, with others, had developed a tool for sudden cardiac death risk prediction which has been incorporated into European Guidelines. I am contributing by developing an alternative prediction tool that accounts for genetic data, and is in a Bayesian variable selection framework. We also have plans to re-purpose my recent multiple outcome fine-mapping methods for a prediction tool involving multiple inherited cardiovascular diseases.
Collaborator Contribution Prof Perry Elliott's team, and others, are collecting large data resources, preparing the data for analysis, and giving guidance on appropriate covariates to consider when building the model for the prediction tool.
Impact in early stages
Start Year 2020
 
Description Multi-trait fine-mapping for cardiometabolic traits in Uganda 
Organisation University of Cambridge
Department Department of Medicine
Country United Kingdom 
Sector Academic/University 
PI Contribution I developed flashfm (flexible and shared information fine-mapping), a new approach to multi-trait fine-mapping that only needs summary statistics and is efficient for more than two traits. My research associate, Nicolas Hernandez, has applied this to 33 cardiometabolic traits from a Uganda study (the largest genetic association study from a single African population, led by Dr. Manj Sandhu).
Collaborator Contribution Manj Sandhu (now at Imperial College London) has shared data from the Uganda study with my team. Members of his team pre-processed and imputed the data. Interpretation of results was done together between us and our teams.
Impact The project is complete and is now published in Nature Communications. In fine-mapping signals from 33 cardiometabolic traits in a Ugandan cohort, flashfm improved resolution over that of single-trait fine-mapping; flashfm resulted in an average SNP group size reduction of 29% in 34% of the regions that had signals for at least two traits, compared to single-trait fine-mapping. The results from this analysis will contribute to a better understanding of cardiometabolic genetics in an African population. My research associate, Nicolas Hernandez, presented a poster on this work at the 2021 European Society of Human Genetics annual meeting.
Start Year 2019
 
Description Multi-trait fine-mapping for glycaemic traits in five ancestral populations 
Organisation University of Exeter
Department Medical School
Country United Kingdom 
Sector Academic/University 
PI Contribution I have developed a multi-trait fine-mapping method (flashfm) to identify likely causal variants in regions that have genetic associations with multiple quantitative traits. I have given guidance on the application of this method and shared the software with Inês Barroso's team, who have applied it to their glycaemic traits data in Exeter.
Collaborator Contribution Inês Barroso's team will apply my method to their data and we will interpret results together.
Impact Flashfm was applied to glycaemic traits as a summer project by an MSc student, Jana Soenksen, at Exeter. She has written her report, with some guidance from me, and is now a PhD student there. The first chapter of her dissertation will be based on this work, following up the findings in the lab, and we will also submit a paper on these findings. Jana has submitted an abstract on this work to the American Diabetes Association annual meeting.
Start Year 2020
 
Description Multi-trait fine-mapping in 125,000 individuals of African ancestry 
Organisation MRC/UVRI Uganda Research Unit on AIDS
Country Uganda 
Sector Public 
PI Contribution I gave a tutorial on my recently developed multi-trait fine-mapping method, flashfm, to the research group of Segun Fatumo and Tinashe Chikowore. My research group and I have also given guidance to three of his post-doctoral fellows on its application and interpretation of its results, as well as optimal visualisations of the results. One of my group members also ran analyses and summarised results.
Collaborator Contribution Their research group members have acquired and processed the African ancestry data, run meta-analyses and multi-trait analysis approaches. They have also given insights on interpretation of results.
Impact We have now completed this project and have submitted the manuscript, entitled "Discovery and fine-mapping in lipids multi-trait genome-wide study of 125,000 individuals of African ancestry", for peer review. A preprint has been requested, but has not yet appeared on-line.
Start Year 2020
 
Description Multi-trait fine-mapping in a large UK study of cardiometabolic traits 
Organisation University of Cambridge
Department Cardiovascular Epidemiology Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution I have developed a multi-trait fine-mapping method (flashfm) and my research associate is applying it to identify likely causal variants in regions that have genetic associations with cardiometabolic traits from ~50,000 UK blood donors, as shared by Dr. Adam Butterworth and Prof. John Danesh.
Collaborator Contribution Dr. Adam Butterworth and Prof. John Danesh have shared this data and we will interpret results together.
Impact This is in progress.
Start Year 2019
 
Title MFM: Multinomial Fine-Mapping 
Description MFM is an R package for simultaneous fine-mapping of genetic associations for several diseases, in a Bayesian framework that borrows information between the diseases. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact In simulation studies MFM was found to have greater accuracy than single disease analyses, when there are shared causal variants, and negligible loss of precision otherwise. MFM was applied to data from six autoimmune diseases (type 1 diabetes, multiple sclerosis, autoimmune thyroid disease, rheumatoid arthritis, juvenile idiopathic arthritis, celiac disease) and revealed causal variants undetected in individual disease analysis, including in IL2RA where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases enabling the identification of associated cellular and molecular phenotypes. 
URL https://www.biorxiv.org/content/10.1101/553560v1
 
Title MFMextra: Specific analyses for MFM package 
Description This R package provides the simulation functions used to assess the joint fine-mapping methods of MFM. In particular, it provides the tools to generate phenotype and genotype data for two diseases with shared controls. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact It was used to evaluate MFM and show that MFM has greater accuracy than single disease analyses. It has potential for an impact on method development as it could be used to assess other methods for the analysis of diseases with shared controls. 
URL https://www.biorxiv.org/content/10.1101/553560v1
 
Title flashfm-ivis - interactive visualisation of fine-mapping results 
Description flashfm-ivis provides a suite of interactive visualisation plots to view potential causal genetic variants that underlie associations that are shared or distinct between multiple quantitative traits and compares results between single- and multi-trait fine-mapping. Unique features include network diagrams that show joint effects between variants for each trait and regional association plots that integrate fine-mapping results, all with user-controlled zoom features for an interactive exploration of potential causal variants across traits. An attractive feature of this tool is that its web implementation does not require any programming experience - users just upload their files and everything is point and click. It is also available as a downloadable R package for those who prefer to use it directly on their machine. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact too early for impact to be known 
URL http://shiny.mrc-bsu.cam.ac.uk/apps/flashfm-ivis/
 
Title flashfm: flexible and shared information fine-mapping 
Description Flashfm (flexible and shared information fine-mapping) is a package to simultaneously fine-map genetic associations (select potential causal variants) for multiple quantitative traits that are measured in the same study or studies by sharing information between the traits. It is flexible to the inclusion of related individuals in the sample and to missing trait measurements. There are no other existing methods that are able to fine-map associations from multiple quantitative traits, allowing for different causal variants between the traits. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact In the main paper introducing this method, to be submitted soon, flashfm is applied to 33 cardiometabolic traits from a Ugandan cohort. Simulations demonstrate that flashfm gives higher resolution than single-trait fine-mapping, and we also find that this is true in our real data analysis. In a region that harbours the genes APOE and TOMM40, there are four lipid traits with association signals and the use of flashfm reduced two of the sets of potential causal variants by 50% and 25%, relative to those obtained from single-trait fine-mapping. The impact of this is fewer SNPs that would need to be followed up as potential causal variants for cardiometabolic traits. Flashfm has also been applied to glycaemic traits (MAGIC study) by researchers in Exeter (details in collaboration section) and to cardiometabolic traits in several African cohorts by researchers from Uganda (details in collaboration section). All of these applications will lead to further contributions in the understanding of cardiometabolic disease in both European and African ancestry populations. 
URL https://jennasimit.github.io/flashfm/
 
Title flashfmZoom - joint fine-mapping and exploration of GWAS results in the UK Biobank 
Description flashfmZoom is an all-in-one tool for analysis and interactive visualisation of potential causal genetic variants that underlie associations with quantitative traits from the UK Biobank. It offers a user-friendly interface and guides users in the selection of pleiotropic regions among subsets of 134 quantitative traits, such as cardiometabolic, hematologic, and respiratory traits. Users may then run single-trait fine-mapping, allowing for multiple causal variants, and leverage information between the traits using multi-trait fine-mapping to improve resolution. A series of interactive plots and downloadable tables are generated within flashfmZoom to identify potential causal variants that are shared or distinct between the traits; it also lists relevant literature for the traits and/or variants. 
Type Of Technology Webtool/Application 
Year Produced 2023 
Open Source License? Yes  
Impact Too early to say, as we are about to submit the preprint and manuscript for publication. However, we have already identified some interesting combinations of traits and results. This emphasises that besides exploring traits that are well-known to be related, flashfmZoom encourages interactive exploration for the joint analysis of traits that may not often be considered together. This may reveal common aetiological pathways between traits related to different disorders. 
URL https://github.com/fz-cambridge/flashfmZoom
 
Description 50th European Mathematical Genetics Meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Jennifer Asimit led the scientific organising committee for the 50th European Mathematical Genetics Meeting hosted by the BSU in April 2022, with 180 delegates from across academia and industry.
Year(s) Of Engagement Activity 2022
 
Description A day in the life blog post - Nicolas Hernandez 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact 'A day in the life' blog post for the BSU website written by Research Associate, Nicolas Hernandez, giving audiences a glimpse into his role at the BSU.
Year(s) Of Engagement Activity 2021
URL https://www.mrc-bsu.cam.ac.uk/blog/a-day-in-the-life-of-a-statistician-nicolas-hernandez/
 
Description Blog for general audience on flashfm publication 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact After publication of a new method (flashfm) in Nature Communications, I wrote a general (non-scientific) summary on it for a blog on the MRC Biostatistics Unit's News page - it was also featured in the University of Cambridge's School of Clinical Medicine newsletter. To reach a larger audience, I circulated the blog on Twitter (re-tweeted from the MRC Biostatistics Unit) and see that my tweet generated 1600 impressions and 66 engagements (this does not include engagements from the tweet sent by the Unit).

I have received a few emails asking questions on the use of the software for this method, so I am aware of others outside of my network who are using the method.
Year(s) Of Engagement Activity 2021
URL https://www.mrc-bsu.cam.ac.uk/blog/new-approach-to-pinpointing-genetic-variants-behind-diseases/
 
Description Blog for general audience on flashfm-ivis publication 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact After publication of a new web tool (flashfm-ivis) in Bioinformatics, I wrote a general (non-scientific) summary on it for a blog on the MRC Biostatistics Unit's News page. To reach a larger audience, I circulated the blog on Twitter (re-tweeted from the MRC Biostatistics Unit). The tweet of the publication generated 6319 impressions and 289 engagements.
I have received a few emails asking questions on the use of the associated flashfm software, as well as this interactive visualisation tool flashfm-ivis, so I am aware of others outside of my network who are using the software.
Year(s) Of Engagement Activity 2022
URL https://www.mrc-bsu.cam.ac.uk/blog/plot-and-play-a-new-tool-to-explore-statistical-evidence-for-gene...
 
Description Cambridge Science Festival 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Participation in Cambridge Science Festival 2019 - presenting bespoke hands-on activity based on statistical research taking place at the BSU.
Year(s) Of Engagement Activity 2019
 
Description Invited Mentoring of Trainees (IGES conference) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I was invited as a mentor to co-host a 1-hour session on Research and Grants during the annual International Genetic Epidemiology Society (IGES) meeting - this was via Zoom and was aimed at trainees (postgraduate and early postdoctoral); the session title was "Mentoring Session #1". We had five attendees. I gave a brief description of my research and gave advice on how to prepare grant applications for fellowships. This was a very interactive session and at the end the attendees said that they felt better prepared to write their grants.
Year(s) Of Engagement Activity 2021
URL https://pheedloop.com/IGES2021/site/schedule/