Genetic Evaluation of Multimorbidity towards INdividualisation of Interventions (GEMINI)
Lead Research Organisation:
UNIVERSITY OF EXETER
Department Name: Institute of Biomed & Clinical Science
Abstract
More than 50% of people over the age of 65 are living with more than one long term condition (multimorbidity). Despite this, people with multimorbidity are often excluded from clinical trials, and there has been limited research into identifying the causes of multimorbidity. For example, we often do not know whether two common long-term conditions occur together by chance as we get older, whether one leads to the other, or whether they share a risk factor. This gap exists partly because health care professionals and researchers tend to focus on one condition at a time. For example, there has been a lot of research into the causes and consequences of osteoarthritis, but not into why people with osteoarthritis have a higher frequency of asthma, even when accounting for sex, age and obesity.
The aim of our research is to uncover new links between long term conditions that could lead to improved interventions including drug treatments or other more focused treatments. These new links could include a better understanding of which cells in the body are most critical to the presence of two conditions in the same patient.
To achieve our aims, we have formed a partnership called the GEMINI (Genetic Evaluation of Multimorbidity towards INdividualisation of Interventions) collaborative. This team includes two people with multimorbidity, health care professionals (including those in primary care), and experts in statistics and genetics. In GEMINI, we will study the causes of multimorbidity with a new approach. We will use existing databases of DNA sequence information linked to diseases from tens of thousands of people. Using this genetic approach, our initial research has identified many new and interesting links between conditions that were not previously well known: for example, between rheumatoid arthritis and stroke (but not between rheumatoid arthritis and heart disease), between gastro-oesophageal reflux disease and depression, and between asthma and osteoarthritis. We will complement the genetic approach with data from millions of patients in primary care. These patients are representative of the UK as a whole, which will allow us to study large numbers of people with combinations of conditions, even when those combinations are quite rare.
Our research plans are divided into three parts. We will involve patients and carers at all stages to ensure we are using their data appropriately and to help us remain focused on the important conditions and outcomes of multimorbidity. First, we will use three sources of data from patients in primary care (GPs) to define the conditions we will study, starting from all long-term conditions present in more than 1% of people over 65 years of age. We will then use millions of DNA sequence changes (the genetic information we inherit from our parents) to identify which conditions share broad biological mechanisms. Second, we will use a similar number of genetic variants to identify the specific mechanisms involved. These techniques are based on the principle that inherited DNA sequence changes are fixed for life, so they provide a way of assessing the causal direction between associated risk factors and diseases. For example, we will use genetics to test whether one disease leads to a second disease, or whether a shared risk factor leads to both. These risk factors will include well-known risks such as obesity, as well as more detailed measures of biology, such as how genes are switched on and off in different cells and tissues. Third, we will use primary care databases to study in more depth patients with the conditions highlighted in the first two steps. We will hold workshops with patients and carers to understand the most important outcomes of these conditions; for example, is reduced lifespan more or less important than the risk of frequent hospitalisation? We will then study patients with new combinations of conditions to see whether they experience worse outcomes.
Technical Summary
We will test the hypothesis that there are unrecognised combinations of long term conditions (LTCs) which arise due to shared biological pathways, and that these can be discovered using human genetic methods. Recent studies indicate that this is an exciting and feasible approach to advance the understanding of multimorbidity, with human genetic and genomic data leading to new discoveries about shared mechanisms between LTCs.
Our research is divided into three complementary work packages (WPs). In WP1, we will select LTCs that are common and use genetics to identify those that are likely to share biological pathways. We will use data from three primary care databases consisting of millions of patients to identify LTCs present in >1% of people aged >65 yrs. We will then use data from large genetic studies (tens of thousands of cases) to identify LTCs that are genetically correlated with each other.
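The two WP1 selection steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's pipeline: the `Patient` record and both function names are hypothetical, and screening genetic correlations with a Bonferroni-corrected two-sided z-test is an assumed choice. The proposal itself fixes only the >1% prevalence and >65 years thresholds.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal primary-care record: age and recorded LTCs."""
    age: int
    conditions: frozenset


def common_ltcs(patients, min_age=65, min_prevalence=0.01):
    """Return {ltc: prevalence} for LTCs recorded in >1% of patients aged >65."""
    older = [p for p in patients if p.age > min_age]
    if not older:
        return {}
    counts = {}
    for p in older:
        for ltc in p.conditions:
            counts[ltc] = counts.get(ltc, 0) + 1
    n = len(older)
    # Strict inequality mirrors ">1%" in the text.
    return {ltc: c / n for ltc, c in counts.items() if c / n > min_prevalence}


def correlated_pairs(rg_estimates, alpha=0.05):
    """Screen LTC pairs by genetic correlation (assumed screening rule).

    rg_estimates maps (ltc_a, ltc_b) -> (rg, se), e.g. as estimated by
    LD-score regression. A pair is kept when rg differs from zero at a
    Bonferroni-corrected two-sided threshold.
    """
    if not rg_estimates:
        return []
    threshold = alpha / len(rg_estimates)
    kept = []
    for pair, (rg, se) in rg_estimates.items():
        z = rg / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        if p_value < threshold:
            kept.append(pair)
    return kept
```

In this sketch, every pair of LTCs returned by `common_ltcs` would need an rg estimate and standard error before `correlated_pairs` could screen it.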
In WP2, we will identify some of the specific mechanisms underlying the LTC combinations identified in WP1. We will test if known modifiable risk factors, such as BMI, account for some of the genetic correlations. We will then use more specific sets of genetic variants to identify potential causes. These genetic variants will include those representing genes expressed in specific cell types, those representing potential drug targets, and those that can be used in Mendelian randomization tests of specific risk factors. We will develop new methods to ensure our causal inferences are robust, including the use of genetic variants to test whether one condition causes a second condition.
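The Mendelian randomization tests mentioned for WP2 can be illustrated with the fixed-effect inverse-variance weighted (IVW) estimator, one standard way of combining per-variant summary statistics into a causal estimate. This sketch assumes independent variants with harmonised effect alleles and valid instruments (no horizontal pleiotropy). The robust methods WP2 plans to develop would relax exactly these assumptions, so this is not the project's method, and the function name is hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    Each list holds one entry per independent genetic variant: its effect on
    the exposure (e.g. a risk factor or a first condition), its effect on the
    outcome (a second condition) and the standard error of the outcome effect.
    Returns (causal estimate, standard error).
    """
    n = len(beta_exposure)
    if n == 0 or len(beta_outcome) != n or len(se_outcome) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    # Weight each variant by the inverse variance of its outcome association.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error
```

With a single variant this reduces to the Wald ratio (outcome effect divided by exposure effect). With several variants it is equivalent to a weighted regression of outcome effects on exposure effects through the origin.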
In WP3, we will obtain patients' perspectives on the new LTC combinations. We will then use the primary care datasets to test the hypothesis that the new LTC combinations represent distinct clinical entities, defined as patients aged >40 yrs with >1 condition having worse outcomes than expected. Our results will inform future translational studies aiming to reduce the burden of multimorbidity.
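The proposal does not specify how "worse outcomes than expected" will be quantified. One common option, assumed here purely for illustration, is the relative excess risk due to interaction (RERI) on the additive scale. With relative risks (RR) measured against patients who have neither condition, RERI = RR11 - RR10 - RR01 + 1, and a positive value means the outcome risk with both conditions exceeds what the two separate effects would predict. The `Group` class is hypothetical, and a real analysis would also need to adjust for age, sex and other confounders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """Hypothetical outcome counts for one exposure group."""
    events: int
    n: int

    @property
    def risk(self):
        return self.events / self.n


def reri(neither, a_only, b_only, both):
    """Relative excess risk due to interaction (additive scale).

    Relative risks are computed against the group with neither condition.
    RERI > 0 suggests the joint outcome risk is higher than expected if
    the two conditions' effects simply added together.
    """
    baseline = neither.risk
    if baseline == 0:
        raise ValueError("reference group has no events; relative risks undefined")
    rr_a = a_only.risk / baseline
    rr_b = b_only.risk / baseline
    rr_ab = both.risk / baseline
    return rr_ab - rr_a - rr_b + 1
```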
Publications
Hawkes G
(2023)
Genetic evidence that high BMI in childhood has a protective effect on intermediate diabetes traits, including measures of insulin sensitivity and secretion.
in medRxiv : the preprint server for health sciences
Masoli JAH
(2022)
Genomics and multimorbidity.
in Age and ageing
Mounier N
(2024)
Using genetics to explore the role of BMI as a shared risk factor in multimorbidity
in European Journal of Human Genetics
Murrin O
(2025)
A systematic analysis of the contribution of genetics to multimorbidity and comparisons with primary care data.
in EBioMedicine
| Title | GEMINI genome-wide association study (GWAS) summary statistics v1 |
| Description | GEMINI: Genetic Evaluation of Multimorbidity towards INdividualisation of Interventions. GWAS summary statistics for 72 long-term conditions (LTCs). Up to three sources of genetic data are used, depending on the condition: UK Biobank, FinnGen, and consortium-published meta-analyses (where available). If you use these resources, please cite the paper below and include the resource release version: Murrin et al. (2024) A systematic analysis of the contribution of genetics to multimorbidity and comparisons with primary care data. eBioMedicine. https://doi.org/10.1016/j.ebiom.2025.105584
See our GitHub repos for more information: https://github.com/GEMINI-multimorbidity
Summary information
See the `conditions.txt` file for a list of the conditions included, plus the file suffix, studies included, and effective sample size for each. GWAS files are provided in GWAS Catalog format, with positions mapped to genome build 37. Each GWAS file has a README detailing the source of the summary statistics:
1) UK Biobank (UKB), a large population-based prospective study with 450,197 individuals of European genetic ancestry.
2) FinnGen, a large-scale genomics initiative including over 500,000 participants with linked health diagnosis data.
3) Disease-specific GWAS meta-analysis summary statistics, when available for each LTC.
See the GEMINI GitHub for details on LTC diagnostic codes: https://github.com/GEMINI-multimorbidity
An extract of the methods from Murrin 2025 (https://doi.org/10.1016/j.ebiom.2025.105584) is included below.
UK Biobank data
To perform the genetic analyses, we ascertained diagnoses of LTCs using both primary-care linked data (available for 45% of participants; censoring date 28/02/2016; Read v2 and CTV3 codes, truncated to 5 bytes) and hospital inpatient diagnoses (available for all participants; censoring date 31/10/2022; ICD-10 codes). Participants were genotyped using two near-identical (>95% shared variants, n=805,426 total) microarray platforms: the Affymetrix Axiom UK Biobank array (in 438,427 participants) and the Affymetrix UK BiLEVE array (in 49,950 participants). UK Biobank centrally performed genotype imputation in 487,442 participants using data from the Haplotype Reference Consortium and UK10K reference panels, increasing the number of genetic variants to ~96 million [8]. We excluded genetic variants with <0.1% minor allele frequency or with imputed INFO score <0.3, leaving ~16 million for GWAS analysis. GWAS were performed in up to 451,197 participants genetically similar to the 1000 Genomes EUR population (described previously [9]). In brief, individuals from UK Biobank were projected into the 1000 Genomes principal component (PC) space using the SNP loadings derived from the initial PC analysis, to minimise confounding of PC values due to varying degrees of relatedness within UK Biobank [10]. Using the means derived from the 1000 Genomes reference dataset, we then performed K-means clustering to determine which individuals from UK Biobank could be classified as EUR-like. GWAS were performed in UKB participants genetically similar to the 1000 Genomes EUR reference population for 84 LTCs, using the same clinical code lists as above in CPRD, with the REGENIE software (v3.1.3) to account for population structure and relatedness, adjusted for age at baseline assessment, sex, genotyping chip, and assessment centre [11]. For quality control, we restricted variants to those with a minor allele frequency (MAF) of >0.1% and an imputation INFO score ≥0.3.
FinnGen data
FinnGen is a large-scale genomics initiative that contains data from over 500,000 participants linked to health diagnosis data. We used GWAS summary statistics from the FinnGen cohort (release 9; 377,277 participants), which are provided for predetermined diseases ("endpoints") defined using ICD-10-FM (Finnish Modification) [12].
Disease-specific GWAS
We used disease-specific GWAS meta-analysis summary statistics when available for each LTC. We searched the GWAS Catalog (https://www.ebi.ac.uk/gwas) [13] and disease-specific public repositories, and contacted authors of the latest GWAS, to identify relevant studies with aligned disease definitions and participants of European ancestry, enabling comparison with UKB and FinnGen. The LTCs below had published, available GWAS summary statistics and were used in the genetic analysis (see Supplementary Table 1 for further information):
• Anxiety disorders [14]
• Asthma [15]
• Atrial fibrillation [16]
• Chronic kidney disease [17]
• Chronic obstructive pulmonary disease [18]
• Coronary heart disease [19]
• Depression [20]
• Erectile dysfunction [21]
• Gastro-oesophageal reflux disease [22]
• Glaucoma [23]
• Gout [24]
• Hearing loss [25]
• Heart failure [26]
• Hyperthyroidism, hypothyroidism [27]
• Irritable bowel syndrome [28]
• Migraine [29]
• Osteoarthritis [30]
• Primary breast malignancy [31]
• Rheumatoid arthritis [32]
• Schizophrenia, schizotypal and delusional disorders [33]
• Type 2 diabetes [34]
• Ulcerative colitis [35]
GWAS meta-analysis
For the 72 conditions meeting the heritability criteria above, we meta-analysed genome-wide summary data from up to 3 data sources: UKB, FinnGen and disease-specific GWAS (referred to as Consortium data). See Supplementary Figure 2 for the analysis flowchart, and Supplementary Table 1 for effective sample size and other information. A cross-trait LD-score regression framework, which estimates the within-condition, between-dataset genetic correlation, measured the similarity of each condition across datasets [40]. FinnGen and Consortium data were added to the meta-analysis when their within-condition genetic correlation (R_g) with UK Biobank was >0.8. Where the Consortium data included UK Biobank or FinnGen data, the Consortium data were used to avoid overlapping datasets (i.e., if UKB was in the consortium GWAS, then we meta-analysed only Consortium + FinnGen). Studies were meta-analysed using GWAMA [41].
References
[1] Amell A, Roso-Llorach A, Palomero L, et al. Disease networks identify specific conditions and pleiotropy influencing multimorbidity in the general population. Sci Rep 2018; 8: 15970.
[2] Fadason T, Schierding W, Lumley T, O'Sullivan JM. Chromatin interactions and expression quantitative trait loci reveal genetic drivers of multimorbidities. Nat Commun 2018; 9: 5198.
[3] Dong G, Feng J, Sun F, Chen J, Zhao X-M. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome Med 2021; 13: 110.
[4] Kim S-S, Hudgins AD, Gonzalez B, et al. A Compendium of Age-Related PheWAS and GWAS Traits for Human Genetic Association Studies, Their Networks and Genetic Correlations. Front Genet 2021; 12. DOI:10.3389/fgene.2021.680560.
[5] West CE, Karim M, Falaguera MJ, et al. Integrative GWAS and co-localisation analysis suggests novel genes associated with age-related multimorbidity. Sci Data 2023; 10: 655.
[6] Recalde M, Rodríguez C, Burn E, et al. Data Resource Profile: The Information System for Research in Primary Care (SIDIAP). Int J Epidemiol 2022; 51: e324-36.
[7] Sudlow C, Gallacher J, Allen N, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med 2015; 12: e1001779.
[8] Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018; 562: 203-9.
[9] Casanova F, Tian Q, Atkins JL, et al. Iron and risk of dementia: Mendelian randomisation analysis in UK Biobank. J Med Genet 2024; : jmg-2023-109295.
[10] Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res 2020; 48: D941-7.
[11] Mbatchou J, Barnard L, Backman J, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 2021; 53: 1097-103.
[12] Kurki MI, Karjalainen J, Palta P, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023; 613: 508-18.
[13] Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 2023; 51: D977-85.
[14] Otowa T, Hek K, Lee M, et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol Psychiatry 2016; 21: 1391-9.
[15] Olafsdottir TA, Theodors F, Bjarnadottir K, et al. Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. Nat Commun 2020; 11. DOI:10.1038/S41467-019-14144-8.
[16] Roselli C, Chaffin MD, Weng LC, et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat Genet 2018; 50: 1225-33.
[17] Wuttke M, Li Y, Li M, et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat Genet 2019; 51: 957-72.
[18] Sakornsakolpat P, Prokopenko D, Lamontagne M, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet 2019; 51: 494-505.
[19] Aragam KG, Jiang T, Goel A, et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat Genet 2022; 54: 1803-15.
[20] Howard DM, Adams MJ, Clarke TK, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci 2019; 22: 343-52.
[21] Bovijn J, Jackson L, Censin J, et al. GWAS Identifies Risk Locus for Erectile Dysfunction and Implicates Hypothalamic Neurobiology and Diabetes in Etiology. Am J Hum Genet 2019; 104: 157-63.
[22] An J, Gharahkhani P, Law MH, et al. Gastroesophageal reflux GWAS identifies risk loci that also associate with subsequent severe esophageal diseases. Nat Commun 2019; 10. DOI:10.1038/S41467-019-11968-2.
[23] Gharahkhani P, Jorgenson E, Hysi P, et al. Genome-wide meta-analysis identifies 127 open-angle glaucoma loci with consistent effect across ancestries. Nat Commun 2021; 12. DOI:10.1038/S41467-020-20851-4.
[24] Tin A, Marten J, Halperin Kuhns VL, et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat Genet 2019; 51: 1459-74.
[25] Praveen K, Dobbyn L, Gurski L, et al. Population-scale analysis of common and rare genetic variation associated with hearing loss in adults. Commun Biol 2022; 5: 540.
[26] Shah S, Henry A, Roselli C, et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat Commun 2020; 11. DOI:10.1038/S41467-019-13690-5.
[27] Teumer A, Chaker L, Groeneweg S, et al. Genome-wide analyses identify a role for SLC17A4 and AADAT in thyroid hormone regulation. Nat Commun 2018; 9. DOI:10.1038/S41467-018-06356-1.
[28] Eijsbouts C, Zheng T, Kennedy NA, et al. Genome-wide analysis of 53,400 people with irritable bowel syndrome highlights shared genetic pathways with mood and anxiety disorders. Nat Genet 2021; 53: 1543-52.
[29] Gormley P, Anttila V, Winsvold BS, et al. Meta-analysis of 375,000 individuals identifies 38 susceptibility loci for migraine. Nat Genet 2016; 48: 856-66.
[30] Boer CG, Hatzikotoulas K, Southam L, et al. Deciphering osteoarthritis genetics across 826,690 individuals from 9 populations. Cell 2021; 184: 4784-4818.e17.
[31] Zhang H, Ahearn TU, Lecarpentier J, et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet 2020; 52: 572-81.
[32] Saevarsdottir S, Stefansdottir L, Sulem P, et al. Multiomics analysis of rheumatoid arthritis yields sequence variants that have large effects on risk of the seropositive subset. Ann Rheum Dis 2022; 81: 1085-95.
[33] Trubetskoy V, Pardiñas AF, Qi T, et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 2022; 604: 502-8.
[34] Mahajan A, Spracklen CN, Zhang W, et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet 2022; 54: 560-72.
[35] De Lange KM, Moutsianas L, Lee JC, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet 2017; 49: 256-61.
[36] Denaxas S, Gonzalez-Izquierdo A, Direk K, et al. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J Am Med Inform Assoc 2019; 26: 1545-59.
[37] Calderón-Larrañaga A, Vetrano DL, Onder G, et al. Assessing and Measuring Chronic Multimorbidity in the Older Population: A Proposal for Its Operationalization. J Gerontol A Biol Sci Med Sci 2016; : glw233.
[38] Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM. Concepts, estimation and interpretation of SNP-based heritability. Nat Genet 2017; 49: 1304-10.
[39] Bulik-Sullivan BK, Loh P-R, Finucane HK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015; 47: 291-5.
[40] Bulik-Sullivan B, Finucane HK, Anttila V, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 2015; 47: 1236-41.
[41] Mägi R, Morris AP. GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics 2010; 11: 288. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | A set of meta-analysed GWAS summary statistics that are not available at the EGA because they are the product of a meta-analysis. |
| URL | https://zenodo.org/doi/10.5281/zenodo.14284046 |
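The quality-control, dataset-selection and meta-analysis rules described in the methods extract for this resource can be sketched in Python. This is an illustration under stated assumptions, not the GEMINI code or GWAMA itself. The `Variant` record and all function names are hypothetical, effect alleles are assumed to be already harmonised across datasets, and the combination step uses fixed-effect inverse-variance weighting (one of the models GWAMA provides). The thresholds (MAF >0.1%, INFO ≥0.3, within-condition R_g >0.8) and the overlap rule come from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-dataset summary-statistic record for one variant."""
    snp: str
    effect_allele: str
    beta: float
    se: float
    maf: float
    info: float


def passes_qc(v, min_maf=0.001, min_info=0.3):
    """Keep variants with MAF > 0.1% and imputation INFO >= 0.3."""
    return v.maf > min_maf and v.info >= min_info


def select_datasets(rg_with_ukb, consortium_includes, min_rg=0.8):
    """Choose which datasets enter the meta-analysis for one condition.

    rg_with_ukb: within-condition genetic correlation of each extra dataset
        with UK Biobank, e.g. {"FinnGen": 0.93, "Consortium": 0.88}.
    consortium_includes: cohorts already contained in the consortium GWAS.
    """
    chosen = ["UKB"]
    for name in ("FinnGen", "Consortium"):
        if rg_with_ukb.get(name, 0.0) > min_rg:
            chosen.append(name)
    if "Consortium" in chosen:
        # Prefer the consortium GWAS and drop any cohort it already contains,
        # so no participant is counted twice.
        chosen = [d for d in chosen if d not in consortium_includes]
    return chosen


def ivw_meta(estimates):
    """Fixed-effect inverse-variance weighted meta-analysis of one variant.

    estimates: list of (beta, se) pairs, one per dataset, with the same
    (already harmonised) effect allele. Returns (beta, se, two-sided p).
    """
    if not estimates:
        raise ValueError("need at least one dataset")
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value
```

In this sketch, each dataset's variants would first be filtered with `passes_qc`, and only variants present in every dataset returned by `select_datasets` would be combined with `ivw_meta`.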
| Title | GEMINI software to perform genetic subtraction technique |
| Description | R-based software that uses GWAS summary statistics to "subtract" the genetic effects of a risk factor from the genetic correlation between a pair of conditions. |
| Type Of Material | Data analysis technique |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Publication under review. Publicly available software. |
| URL | https://github.com/GEMINI-multimorbidity/partialLDSC |
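The idea behind genetic subtraction can be illustrated with the textbook partial-correlation formula applied to genetic correlations. Given the genetic correlation between two conditions and each condition's genetic correlation with a risk factor C, the partial value removes the part of their shared genetics that runs through C. This is an assumed simplification for illustration only. partialLDSC estimates these quantities from GWAS summary statistics with its own machinery, and the function name below is hypothetical.

```python
import math


def partial_rg(rg_12, rg_1c, rg_2c):
    """Partial genetic correlation of conditions 1 and 2 given risk factor C.

    Applies the standard partial-correlation formula
        (rg_12 - rg_1c * rg_2c) / sqrt((1 - rg_1c**2) * (1 - rg_2c**2))
    to genetic correlations (an illustrative assumption).
    """
    for r in (rg_12, rg_1c, rg_2c):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    denom = math.sqrt((1 - rg_1c ** 2) * (1 - rg_2c ** 2))
    if denom == 0:
        raise ValueError("risk factor perfectly genetically correlated with a condition")
    return (rg_12 - rg_1c * rg_2c) / denom
```

For example, if two conditions have rg = 0.5 and each has rg = 0.5 with the risk factor, the partial genetic correlation falls to about 0.33. A drop of this kind is what the "subtraction" in the software description refers to.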
