Methodology for the identification of shared genetic aetiology between epidemiologically linked disorders

Lead Research Organisation: Wellcome Sanger Institute
Department Name: Human Genetics

Abstract

Many common health disorders are frequently observed to co-occur in individuals. Hundreds of genetic disease association analyses have been completed, resulting in the identification of numerous genetic variants that are associated with a specific disease, such as type 2 diabetes, cancer, or osteoarthritis. The primary aim of the proposed research is to develop methodology to detect genetic variants that influence susceptibility to two disorders that are suspected of having shared genetic causes. The majority of existing approaches have focused on identifying causal genetic elements for a single trait, and only a few of them jointly analyse linked disorders.
There are many pairs of health disorders that have been identified as either frequently occurring together, or having an inverse relationship, where the presence of one disorder tends to reduce the risk of the other. Disease co-occurrence pairings include type 2 diabetes with Crohn's disease, cancer, and psychiatric disorders, as well as osteoarthritis with body mass index (BMI) and height. It has been established that there is an inverse relationship between prostate cancer and type 2 diabetes. Moreover, a treatment for prostate cancer was found to increase the risk of diabetes and cardiovascular disease. This may be due to the roles of particular shared genetic variants. Thus, the development of non-adverse treatments for either of the two diseases may be assisted by the identification of genetic variants with such inverse effects on the two diseases. This is one of several reasons why identifying common genetic causes between linked diseases is important.
Shared genetic causes for most of these disease pairings have been identified using simple separate analyses of each disease. One method is to compare the individual results from each analysis, and to choose common criteria for the identification of genetic disease associations. The overlap of the two sets of results is then examined. A caveat of this approach is that each analysis has a different ability to detect associations (its statistical power), so that although an association may exist with both traits, it may only be detectable within one of the studies, and thus not be found in the overlap analysis. Also, as with many other approaches, it does not take advantage of any known genetic information. Alternatively, associations with one disease may be searched for within the genes that have been recognized as associated with the other disease. However, this greatly reduces the search area.
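The simple overlap approach described above can be sketched as follows. This is a minimal illustration only, not the methodology proposed here: the function names, the toy p-values, and the conventional genome-wide threshold of 5e-8 are all assumptions chosen for the example.

```python
def significant(pvalues, threshold):
    """Return the set of variant IDs whose p-value is below the threshold."""
    return {variant for variant, p in pvalues.items() if p < threshold}


def overlap(pvalues_a, pvalues_b, threshold=5e-8):
    """Variants declared associated with both traits under one common criterion."""
    return significant(pvalues_a, threshold) & significant(pvalues_b, threshold)


# Hypothetical summary results: variant ID -> association p-value.
trait_a = {"rs1": 1e-10, "rs2": 3e-9, "rs3": 0.2}   # imagined large, well-powered study
trait_b = {"rs1": 2e-12, "rs2": 4e-5, "rs3": 0.6}   # imagined smaller study

shared = overlap(trait_a, trait_b)
```

In this toy example `rs2` is significant for the first trait but only suggestive for the second, so it is absent from the overlap. This is the caveat noted above: a variant that truly affects both traits can be missed when one study has less power than the other.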
In developing methods to jointly analyse traits, the ability to detect associations may be increased by taking into account known pieces of genetic and biological information, such as previously identified genetic variants, and the biological functions ascribed to them.
Statistical analysis methods will be developed and tested for performance by generating various plausible datasets under an assortment of assumptions. The method with the best performance will then be applied to real datasets, such as type 2 diabetes with schizophrenia, waist-hip ratio with BMI, osteoarthritis with BMI, and osteoarthritis with migraine.

Technical Summary

There is an abundance of analysis methods for detecting genetic associations with a single complex trait, but a scarcity of powerful and robust methods to jointly search for shared genetic associations between two disorders that are epidemiologically linked. This motivates the main aim of developing such joint analysis methods. Data for a given trait may be either raw sequencing or genotyping data, or summary statistics from a genetic association study. Therefore, in analysing the genetic data from two traits together, the three possible combinations of data that may be available (raw data for both traits, raw data for one trait with summary statistics for the other, and summary statistics for both) need to be considered, and each combination requires a different analysis approach to be developed. Besides the immediate data at hand, there is relevant external information, such as previously detected associations with either trait, as well as functional information. This external information is only occasionally accounted for in single trait analyses, and so far has not been considered for joint trait analyses. Incorporation of prior genetic and biological information adds another degree of complexity to the methodology development, but should increase the power to detect shared associations with disease. Approaches to tackle the problem include linear models, use of Bayes factors, Bayesian decision theory, and chance-corrected measures of agreement in a regression framework. Extensive simulation studies will be carried out to determine the most powerful analysis approach, which will then be applied to real datasets that include osteoarthritis, metabolic and psychiatric disorders.
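As an illustration of how Bayes factors can be obtained from summary statistics alone, the sketch below computes a Wakefield-style approximate Bayes factor from an effect estimate and its standard error. It is an assumption that this particular form is relevant here; the summary above does not specify which Bayes factor is used. The function name and the prior standard deviation of 0.2 are illustrative choices, not values taken from this project.

```python
import math


def log_abf_association(beta, se, prior_sd=0.2):
    """Natural log of an approximate Bayes factor in favour of association.

    Assumes the estimate is normally distributed around the true effect with
    variance se**2, and that under the alternative the true effect has a
    normal prior with mean 0 and variance prior_sd**2 (assumed value).
    """
    v = se ** 2                 # sampling variance of the estimate
    w = prior_sd ** 2           # prior variance of the true effect
    z = beta / se               # Wald statistic
    shrink = w / (v + w)
    return 0.5 * math.log(v / (v + w)) + 0.5 * z * z * shrink


# Hypothetical per-trait summary statistics for one variant (log odds ratios).
log_abf_trait_1 = log_abf_association(beta=0.12, se=0.02)
log_abf_trait_2 = log_abf_association(beta=0.01, se=0.02)
```

A positive value favours association and a negative value favours the null. Because the calculation needs only `beta` and `se`, it applies to the case where only summary statistics are available for a trait.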

Planned Impact

The research that I will carry out will benefit the pharmacogenetics field, which has direct translational implications, as well as those who are afflicted with diseases that have a tendency to be linked with additional disorders. Disorders to which this applies include cancer, osteoarthritis, type 2 diabetes (T2D), Crohn's disease, psychiatric disorders, cardiovascular disease, and obesity.
The methods to be developed will enable the identification of variants that are associated with multiple epidemiologically linked traits. Identified shared variants may either have the same direction of effect on susceptibility to the two disorders, or opposite directions of effect, as in the well-documented case of prostate cancer and T2D. It has already been established that those afflicted with prostate cancer tend to have a lower susceptibility to T2D and vice versa. Moreover, research on a particular therapy for prostate cancer revealed that men receiving the treatment had an increased risk of developing diabetes and cardiovascular disease, thus contributing to non-cancer morbidity (Keating et al. 2006, as in Case for Support). The therapy releases a hormone and may prolong life for men with locally advanced prostate cancer, but it may also increase fat mass and decrease insulin sensitivity. An inverse scenario would be that a treatment targeting a T2D locus might increase the risk of developing prostate cancer. Gene expression influences various functions, and it may be that the expression of a particular gene is increased in individuals with prostate cancer and decreased in those with T2D, or vice versa. This illustrates that identifying genes and/or variants that have opposite effects on the two diseases may assist in non-adverse treatment development for either of the two diseases.
Identification of shared genetic components that have the same direction of effect on disease susceptibility will contribute to the understanding of the risk of developing multiple diseases, as well as the likelihood of an additional disease emerging, given that a condition is already present. Furthermore, knowledge of the common genetic aetiology could aid in constructing a prevention plan for the development of multiple conditions. Such information may also assist in tailoring existing treatments for certain conditions to additionally control the effects of a linked disorder.
Within the next three years, I will complete my planned methodology, including related software, and in the final year I will analyse several real datasets. Therefore, within three years, methods to search for shared genetic components in comorbid conditions will be available for use by others. An immediate consequence of this will be an enhanced knowledge of disease susceptibility, benefiting society in general. An additional outcome that will eventually follow is the assembly of a prevention plan for multiple disorders. Finally, this will assist pharmacogeneticists in choosing a direction that may be followed to develop safe and effective treatments, which will subsequently benefit numerous individuals afflicted with disease.
It is hoped that in my own analyses of epidemiologically linked traits, such as T2D and schizophrenia, some shared genetic components will be identified for further investigation. Therefore, in addition to methodological contributions, I may have an applied contribution to treatment development, prevention plans, and susceptibility awareness.

Publications

 
Description Enrichment analysis of glycemic traits 
Organisation MAGIC (The Meta-Analyses of Glucose and Insulin-related traits Consortium)
Country Global 
Sector Charity/Non Profit 
PI Contribution Following my enrichment analysis of glycemic traits using the publicly available MAGIC genome-wide association study data, my next step was to apply the method to exome data from MAGIC, which is not publicly available. The results of this analysis contribute to the consortium paper ("Tissue Specific Alteration of Metabolic Pathways Influences Glycemic Regulation"), which was submitted in January 2019.
Collaborator Contribution The MAGIC investigators gave me access to their exome data and trans-ethnic data, which are not yet publicly released.
Impact The results contribute biological insight into variants that are associated with glycemic traits, and will in turn assist in understanding metabolic disorders, such as type 2 diabetes.
Start Year 2016
 
Description Overlap analysis of obesity and osteoarthritis 
Organisation Broad Institute
Department The Genetic Investigation of ANthropometric Traits (GIANT)
Country United States 
Sector Academic/University 
PI Contribution I have applied the methods that I have developed for an overlap analysis of two traits to obesity (GIANT) and osteoarthritis (arcOGEN). A shortlist of variants associated with both traits has been generated and is being followed up in a replication study.
Collaborator Contribution The GIANT consortium has shared their obesity meta-analysis data, and has also re-run the meta-analysis with one cohort removed. This adjustment was required for comparisons in the overlap analysis. The arcOGEN consortium has shared their osteoarthritis GWAS data.
Impact A manuscript on the methods and an overlap analysis of obesity and osteoarthritis has been published in the journal Genetic Epidemiology. A replication study based on a shortlist of variants from the overlap analysis is in progress by members of arcOGEN. Software for the implementation of the developed methods (BOAT) is freely available.
Start Year 2013
 
Description Overlap analysis of obesity and osteoarthritis 
Organisation The Wellcome Trust Sanger Institute
Department Arthritis Research UK Osteoarthritis Genetics (arcOGEN)
Country United Kingdom 
Sector Academic/University 
PI Contribution I have applied the methods that I have developed for an overlap analysis of two traits to obesity (GIANT) and osteoarthritis (arcOGEN). A shortlist of variants associated with both traits has been generated and is being followed up in a replication study.
Collaborator Contribution The GIANT consortium has shared their obesity meta-analysis data, and has also re-run the meta-analysis with one cohort removed. This adjustment was required for comparisons in the overlap analysis. The arcOGEN consortium has shared their osteoarthritis GWAS data.
Impact A manuscript on the methods and an overlap analysis of obesity and osteoarthritis has been published in the journal Genetic Epidemiology. A replication study based on a shortlist of variants from the overlap analysis is in progress by members of arcOGEN. Software for the implementation of the developed methods (BOAT) is freely available.
Start Year 2013
 
Title BOAT: Bayesian Overlap Analysis Tool 
Description BOAT (Bayesian Overlap Analysis Tool) identifies variants that are associated with two diseases and tests for enrichment (i.e. whether there are more shared associated variants than expected by chance). The only input required consists of genome-wide association study summary statistics from each disease. The software may then be used to identify overlap variants between the two diseases by comparing approximate Bayes factors, as well as by comparing p-values. This software is based on the method described in the paper "A Bayesian Approach to the Overlap Analysis of Epidemiologically Linked Traits", published in Genetic Epidemiology.
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact This software has been applied to the overlap analysis of obesity (from GIANT) and osteoarthritis (from arcOGEN), which has produced a shortlist of variants for follow-up (in process by members of arcOGEN). 
URL http://www.sanger.ac.uk/science/tools/boat
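The idea of testing whether there are more shared associated variants than expected by chance can be illustrated with a hypergeometric tail probability. This is a generic sketch and is not claimed to be the test implemented in BOAT; the function name and the counts are hypothetical, and the calculation assumes the variants are independent (for example, after pruning for linkage disequilibrium).

```python
from math import comb


def enrichment_pvalue(n_total, n_a, n_b, n_both):
    """P(overlap >= n_both) if the n_a and n_b associated variants were
    spread independently among n_total variants (hypergeometric upper tail)."""
    upper = min(n_a, n_b)
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10,000 independent variants, 200 associated with the
# first disease, 150 with the second, and 12 associated with both.
p_enrich = enrichment_pvalue(n_total=10_000, n_a=200, n_b=150, n_both=12)
```

With these counts about 3 shared variants would be expected by chance (200 × 150 / 10,000), so observing 12 gives a small tail probability, which would be read as evidence of enrichment.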
 
Title COMET: Corrected Overlap and Marginal Enrichment Test 
Description The software assists researchers in gaining further biological insight on characteristics of genetic variants that are associated with multiple diseases/traits. It requires summary-level genetic association data, rather than individual-level data, which allows the re-use of previously-analysed data that is often widely shared. This software is based on the method described in the paper "A two-stage inter-rater approach for enrichment testing of variants associated with multiple traits", published in the European Journal of Human Genetics. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact I have presented this method at various conferences/meetings, resulting in requests to share the software. In addition, in collaboration with the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC), I have applied the method to exome data and trans-ethnic data, revealing interesting characteristics of variants associated with glycaemic traits.
URL http://www.sanger.ac.uk/science/tools/comet
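
An "inter-rater" view of two traits can be illustrated with Cohen's kappa, a standard chance-corrected measure of agreement. In this sketch each trait is treated as a rater that labels every variant as associated (1) or not (0). It is a generic example under that assumption and is not claimed to reproduce the two-stage procedure in COMET; the function name and the toy calls are hypothetical.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of 0/1 association calls."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    # Observed proportion of variants on which the two traits agree.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Agreement expected by chance, given each trait's marginal call rate.
    p_a = sum(calls_a) / n
    p_b = sum(calls_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical association calls for eight variants in two traits.
trait_1_calls = [1, 1, 1, 0, 0, 0, 0, 0]
trait_2_calls = [1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(trait_1_calls, trait_2_calls)
```

A kappa near 0 indicates no more agreement than expected by chance, while values approaching 1 indicate that the two traits tend to flag the same variants.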