Systematic characterisation of genetically influenced 'omics' phenotypes and disease modules within biological networks

Lead Research Organisation: University of Cambridge
Department Name: Public Health and Primary Care

Abstract

Introduction: Identifying the biological pathways associated with diseases, and functionally characterising the changes that perturb biological processes, are key to understanding disease aetiology, prognosis and prevention. Recent studies have successfully identified associations of genetic variants and potentially causal genes with various proteins, metabolites and lipids, and their influence on key biological pathways that are associated with diseases [1-6]. In addition, there is increasing evidence of genetic overlap between unrelated diseases and traits, which points to a shared aetiology of diseases [7-8]. The aim of the proposed project is to understand the shared cause of diseases by combining multi-omics (protein, metabolite and lipid) phenotype data, genetic associations with multi-omics phenotypes and diseases, and electronic health records (EHR) within the INTERVAL Bioresource.

Background and Aim: Analysing the vast amount of data from these high-dimensional studies is often difficult without constructing an interactive biological network. Gaussian Graphical Modelling (GGM) allows the construction of a biological (omics) network, and an automated feature detection algorithm will enable the extraction of disease modules from this network. Here, a disease module refers to a set of proteins, lipids or metabolites and associated pathways that the algorithm identifies as associated with diseases (e.g. coronary heart disease (CHD) or type 2 diabetes (T2D)). Further functional characterisation and computational follow-up of these disease modules using clinical measures in EHR will lead to the identification of novel genes and pathways associated with diseases. This will also improve our understanding of the shared cause of diseases, for example where a change in a gene is associated with a reduced risk of CHD and T2D but an increased risk of asthma.

The project will primarily focus on: (1) developing an interactive web resource for investigating phenotype data and genetic associations with the phenotypes measured within the INTERVAL Bioresource; (2) developing a supervised learning method to identify disease modules within the biological network; and (3) investigating disease modules and their underlying biological pathways using electronic health records (EHR) to understand the shared aetiology of diseases.

Methods: The proposed work will focus on constructing GGMs for the multi-omics data (proteins, metabolites and lipids), with edges representing the partial correlation between two phenotype measures conditioned on all other variables in the model. Metadata, including genetic associations with multi-omics phenotypes from Genome-Wide Association Studies (GWAS), genetic associations with diseases from public databases (PhenoScanner, SNiPA etc.) and biological pathways (KEGG), will be added to the network to allow the identification of molecular pathways and disease modules using supervised learning methods. This will be particularly useful as an automated tool to detect disease modules within a biological network and to inform Mendelian randomisation (MR) studies aimed at understanding the shared aetiology of diseases. These genetically influenced disease modules can be tested for association across Electronic Health Record (EHR) phenotypes (e.g. diagnosis, presence or absence of multi-morbidity, and drug responses). This agnostic approach will provide insights into the influence of perturbations within the omics network on the medical phenome and identify key omics phenotypes and pathways that are shared by diseases, thereby providing candidate targets for therapeutic intervention.
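The edge definition used in the Methods (partial correlation between two measures, conditioned on all others) can be illustrated with a minimal sketch. It assumes simulated data, a plain inverse of the sample covariance matrix, and an arbitrary cut-off of 0.2 on the absolute partial correlation; the function names and the threshold are hypothetical and are not taken from the project, which may instead rely on significance testing or a regularised estimator.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix for the columns of `data` (samples x variables).

    Each off-diagonal entry is the correlation between two variables after
    conditioning on all remaining variables, obtained from the precision
    (inverse covariance) matrix P as -P[i, j] / sqrt(P[i, i] * P[j, j]).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def ggm_edges(pcor, names, threshold):
    """List (name_i, name_j, partial correlation) for pairs above the cut-off."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(pcor[i, j]) >= threshold:
                edges.append((names[i], names[j], float(pcor[i, j])))
    return edges

# Simulated chain (illustrative only): protein -> metabolite -> lipid.
# Protein and lipid are marginally correlated, but only through the metabolite.
rng = np.random.default_rng(0)
n_samples = 5000
protein = rng.normal(size=n_samples)
metabolite = protein + rng.normal(size=n_samples)
lipid = metabolite + rng.normal(size=n_samples)

names = ["protein", "metabolite", "lipid"]
data = np.column_stack([protein, metabolite, lipid])

marginal = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)
edges = ggm_edges(pcor, names, threshold=0.2)  # threshold is an assumption

for a, b, value in edges:
    print(f"{a} -- {b}: partial correlation {value:.2f}")
```

In this toy example the protein and lipid are clearly correlated marginally, yet their partial correlation is close to zero, so the network keeps only the two direct edges. This is the property that makes partial correlations preferable to ordinary correlations for building the omics network.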

References: 1. Shin, S.-Y. et al. Nat. Genet. (2014) 2. Long, T. et al. Nat. Genet. (2017) 3. Kettunen, J. et al. Nat. Commun. (2016) 4. Suhre, K. et al. Nat. Commun. (2017) 5. Suhre, K. et al. Nature (2011) 6. Draisma, H. H. M. et al. Nat. Commun. (2015) 7. Sarwar et al. Lancet (2012) 8. Ferreira et al. PLoS Genet. (2013)

Technical Summary

Recent studies have successfully identified associations of genetic variants and potentially causal genes with various proteins, metabolites and lipids, and their influence on key biological pathways that are associated with diseases [1-6]. In addition, there is increasing evidence of genetic overlap between unrelated diseases and traits, which points to a shared aetiology of diseases [7-8].
The proposed work will focus on constructing GGMs for the multi-omics data (proteins, metabolites and lipids), with edges representing the partial correlation between two phenotype measures conditioned on all other variables in the model. Metadata, including genetic associations with multi-omics phenotypes from Genome-Wide Association Studies (GWAS), genetic associations with diseases from public databases (PhenoScanner, SNiPA etc.) and biological pathways (KEGG), will be added to the network to allow the identification of molecular pathways and disease modules using supervised learning methods. This will be particularly useful as an automated tool to detect disease modules within a biological network and to inform Mendelian randomisation (MR) studies aimed at understanding the shared aetiology of diseases. These genetically influenced disease modules can be tested for association across Electronic Health Record (EHR) phenotypes (e.g. diagnosis, presence or absence of multi-morbidity, and drug responses). This agnostic approach will provide insights into the influence of perturbations within the omics network on the medical phenome and identify key omics phenotypes and pathways that are shared by diseases, thereby providing candidate targets for therapeutic intervention.
References: 1. Shin, S.-Y. et al. Nat. Genet. (2014) 2. Long, T. et al. Nat. Genet. (2017) 3. Kettunen, J. et al. Nat. Commun. (2016) 4. Suhre, K. et al. Nat. Commun. (2017) 5. Suhre, K. et al. Nature (2011) 6. Draisma, H. H. M. et al. Nat. Commun. (2015) 7. Sarwar et al. Lancet (2012) 8. Ferreira et al. PLoS Genet. (2013)

Publications


 
Description Cambridge BHF CRE Pump-Priming
Amount £50,000 (GBP)
Organisation University of Cambridge 
Sector Academic/University
Country United Kingdom
Start 11/2019 
End 11/2021
 
Description EPIC-Norfolk 
Organisation Helmholtz Zentrum München
Country Germany 
Sector Academic/University 
PI Contribution Conducted the largest genetic analysis of non-targeted metabolomics to date in collaboration with the partners listed above. The outcomes of this research are now being prepared for publication in a high-impact journal.
Collaborator Contribution EPIC-Norfolk contributed approximately 12,000 samples to the genetic analysis. Prof. Karsten Suhre provided extensive advice on the research; Dr. Gabi Kastenmüller and Dr. Johannes Raffler provided computational biology support to develop a web server for the dissemination of results.
Impact Finalist for the 2018 Charles J. Epstein Trainee Award for Excellence in Human Genetics Research for the presentation of the research work "Genetic Architecture of Human Plasma Metabolome". The study identified approximately 2,500 unique genetic variant-blood metabolite associations and discovered many novel pathways that are under genetic control.
Start Year 2018
 
Description EPIC-Norfolk 
Organisation University of Cambridge
Department Institute of Metabolic Science (IMS)
Country United Kingdom 
Sector Academic/University 
PI Contribution Conducted the largest genetic analysis of non-targeted metabolomics to date in collaboration with the partners listed above. The outcomes of this research are now being prepared for publication in a high-impact journal.
Collaborator Contribution EPIC-Norfolk contributed approximately 12,000 samples to the genetic analysis. Prof. Karsten Suhre provided extensive advice on the research; Dr. Gabi Kastenmüller and Dr. Johannes Raffler provided computational biology support to develop a web server for the dissemination of results.
Impact Finalist for the 2018 Charles J. Epstein Trainee Award for Excellence in Human Genetics Research for the presentation of the research work "Genetic Architecture of Human Plasma Metabolome". The study identified approximately 2,500 unique genetic variant-blood metabolite associations and discovered many novel pathways that are under genetic control.
Start Year 2018
 
Description EPIC-Norfolk 
Organisation Vanderbilt University
Country United States 
Sector Academic/University 
PI Contribution Conducted the largest genetic analysis of non-targeted metabolomics to date in collaboration with the partners listed above. The outcomes of this research are now being prepared for publication in a high-impact journal.
Collaborator Contribution EPIC-Norfolk contributed approximately 12,000 samples to the genetic analysis. Prof. Karsten Suhre provided extensive advice on the research; Dr. Gabi Kastenmüller and Dr. Johannes Raffler provided computational biology support to develop a web server for the dissemination of results.
Impact Finalist for the 2018 Charles J. Epstein Trainee Award for Excellence in Human Genetics Research for the presentation of the research work "Genetic Architecture of Human Plasma Metabolome". The study identified approximately 2,500 unique genetic variant-blood metabolite associations and discovered many novel pathways that are under genetic control.
Start Year 2018
 
Description EPIC-Norfolk 
Organisation Weill Cornell Medical College in Qatar
Country Qatar 
Sector Academic/University 
PI Contribution Conducted the largest genetic analysis of non-targeted metabolomics to date in collaboration with the partners listed above. The outcomes of this research are now being prepared for publication in a high-impact journal.
Collaborator Contribution EPIC-Norfolk contributed approximately 12,000 samples to the genetic analysis. Prof. Karsten Suhre provided extensive advice on the research; Dr. Gabi Kastenmüller and Dr. Johannes Raffler provided computational biology support to develop a web server for the dissemination of results.
Impact Finalist for the 2018 Charles J. Epstein Trainee Award for Excellence in Human Genetics Research for the presentation of the research work "Genetic Architecture of Human Plasma Metabolome". The study identified approximately 2,500 unique genetic variant-blood metabolite associations and discovered many novel pathways that are under genetic control.
Start Year 2018
 
Description INTERVAL Glycomics pilot-study 
Organisation Genos
Country Croatia 
Sector Private 
PI Contribution Dr. Praveen Surendran obtained Cambridge BHF CRE Pump-Priming funding to measure IgG and total plasma N-glycans in 500 samples from the INTERVAL Bioresource.
Collaborator Contribution Measurement of IgG and total plasma N-glycans (provided as a service).
Impact Sample selection is ongoing. Measurements, analyses, and output will be reported in Q1 2021.
Start Year 2020
 
Description Metabolomic Age prediction 
Organisation Leiden University Medical Center
Country Netherlands 
Sector Academic/University 
PI Contribution Joint supervision, with Dr. Dennis Mook-Kanamori, of the PhD project performed by Mr. Tariq Faquih, entitled "Metabolomic Age prediction".
Collaborator Contribution Study Proposal: Metabolomic Age prediction
Analyses lead: Mr. Tariq Faquih
Supervisors: Dr. Praveen Surendran, Dr. Dennis Mook-Kanamori
1. Background
The biological rate of aging is a major risk factor for a multitude of diseases [1, 2]. Aging is a complex and multifactorial process that is influenced by genetic, lifestyle and environmental factors [3-5]. It is evident that the rate of aging varies between individuals: some individuals are able to live for longer without age-related disability and diseases than others in the same age group [3]. Human longevity is a genetically heritable trait observed to be clustered within some families [5]. However, despite various genome-wide association studies [6-8], only the APOE and FOXO3A genes have consistently shown strong association with age [5, 9]. Hence, a variety of "-omics" technologies have been utilized to further study aging and identify novel biomarkers that are associated with longevity [3, 4]. The -omics approaches used for age studies include transcriptomics [10, 11], methylomics/epigenomics [12] and metabolomics [4, 13-15]. Metabolomics is the study of the end products and by-products of cell metabolism, i.e. metabolites. Thus, metabolomics provides a holistic representation of cell processes and disease phenotype that reflects the influences of both genetic and environmental factors [16, 17]. Metabolomic research on ageing has demonstrated a correlation between age and alterations in metabolomic profiles in worms, flies, mice and humans [4].
2. Objective
The objective of this study is to identify a set of metabolites for the prediction of biological age in a large-scale cohort with a wide age range. Due to the vast range of ageing effects, we will use Metabolon's (Durham, North Carolina, USA) untargeted metabolomics platform to detect a broad range of endogenous and xenobiotic metabolites from various biochemical pathways.
3. Methods
The data will be split into training and testing sets. LASSO or ridge regression will be used to select the metabolites relevant for predicting biological age in the training set, with 10-fold cross-validation. Subsequently, the metabolite model will be evaluated on the testing set. We believe LASSO or ridge regression is suitable for accommodating the large number of features (metabolites) while providing high prediction accuracy, and for reducing dimensionality and overfitting without inducing too much bias [18]. Finally, we will examine in various external follow-up studies whether the identified set of metabolites is predictive of earlier onset of a wide range of diseases.
4. Cohorts Involved: Amongst others, INTERVAL and the CHARGE consortium
5. References
1. North, B.J. and D.A. Sinclair, The intersection between aging and cardiovascular disease. Circ Res, 2012. 110(8): p. 1097-108.
2. Broglio, S.P., et al., Cognitive decline and aging: the role of concussive and subconcussive impacts. 2012. 40(3): p. 138.
3. Lopez-Otin, C., et al., The hallmarks of aging. Cell, 2013. 153(6): p. 1194-217.
4. Hoffman, J.M., et al., Proteomics and metabolomics in ageing research: from biomarkers to systems biology. Essays Biochem, 2017. 61(3): p. 379-388.
5. Brooks-Wilson, A.R., Genetics of healthy aging and longevity. 2013. 132(12): p. 1323-1338.
6. Sebastiani, P., et al., Genetic signatures of exceptional longevity in humans. 2012. 7(1): p. e29848.
7. Nebel, A., et al., A genome-wide association study confirms APOE as the major gene influencing survival in long-lived individuals. 2011. 132(6-7): p. 324-330.
8. Deelen, J., et al., Genome-wide association study identifies a single major locus contributing to survival into old age; the APOE locus revisited. 2011. 10(4): p. 686-698.
9. Perls, T., et al., Exceptional familial clustering for extreme longevity in humans. 2000. 48(11): p. 1483-1485.
10. Peters, M.J., et al., The transcriptional landscape of age in human peripheral blood. Nat Commun, 2015. 6: p. 8570.
11. Passtoors, W.M., et al., Transcriptional profiling of human familial longevity indicates a role for ASF1A and IL7R. PLoS One, 2012. 7(1): p. e27759.
12. Hannum, G., et al., Genome-wide methylation profiles reveal quantitative views of human aging rates. 2013. 49(2): p. 359-367.
13. Menni, C., et al., Metabolomic markers reveal novel pathways of ageing and early development in human populations. 2013. 42(4): p. 1111-1119.
14. Hertel, J., et al., Measuring Biological Age via Metabonomics: The Metabolic Age Score. J Proteome Res, 2016. 15(2): p. 400-10.
15. Martin, F.J., I. Montoliu, and M. Kussmann, Metabonomics of ageing - Towards understanding metabolism of a long and healthy life. Mech Ageing Dev, 2017. 165(Pt B): p. 171-179.
16. Rattray, N.J.W., et al., Beyond genomics: understanding exposotypes through metabolomics. Hum Genomics, 2018. 12(1): p. 4.
17. Alonso, A., S. Marsal, and A. Julia, Analytical methods in untargeted metabolomics: state of the art in 2015. Front Bioeng Biotechnol, 2015. 3: p. 23.
18. Fonti, V. and E. Belitser, Feature selection using LASSO. 2017.
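The Methods step of the study proposal above (train/test split, penalised regression, 10-fold cross-validation, evaluation on the testing set) can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the data are simulated, the penalty grid and the 80/20 split are arbitrary, and all function names are hypothetical. It uses ridge regression because it has a closed-form solution that needs only NumPy. Ridge shrinks coefficients but does not set them to zero, so selecting a sparse metabolite set would require the LASSO variant and an iterative solver.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return beta, y_mean - x_mean @ beta

def mse(X, y, beta, intercept):
    """Mean squared prediction error."""
    return float(np.mean((X @ beta + intercept - y) ** 2))

def cross_validated_mse(X, y, alpha, folds):
    """Average validation error over the folds (each fold held out once)."""
    errors = []
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        beta, intercept = fit_ridge(X[train_idx], y[train_idx], alpha)
        errors.append(mse(X[val_idx], y[val_idx], beta, intercept))
    return float(np.mean(errors))

# Simulated data (assumption): 50 standardised "metabolites", two of which
# drive "age"; real metabolomics data would need scaling and imputation first.
rng = np.random.default_rng(42)
n_samples, n_metabolites = 600, 50
X = rng.normal(size=(n_samples, n_metabolites))
age = 50 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n_samples)

# Split into training (80%) and testing (20%) sets.
order = rng.permutation(n_samples)
train, test = order[:480], order[480:]

# Choose the penalty by 10-fold cross-validation within the training set only.
folds = np.array_split(rng.permutation(train), 10)
alphas = [0.1, 1.0, 10.0, 100.0, 1000.0]  # arbitrary grid
cv_errors = {a: cross_validated_mse(X, age, a, folds) for a in alphas}
best_alpha = min(cv_errors, key=cv_errors.get)

# Refit on the full training set and evaluate once on the held-out testing set.
beta, intercept = fit_ridge(X[train], age[train], best_alpha)
test_mse = mse(X[test], age[test], beta, intercept)
test_r2 = 1.0 - test_mse / float(np.var(age[test]))
print(f"selected alpha: {best_alpha}, test R^2: {test_r2:.2f}")
```

Keeping the testing set out of the penalty selection is the design point: cross-validation tunes the model inside the training data, so the final test error remains an honest estimate of predictive performance.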
Impact Analyses performed last month and the results are currently being reviewed. A full summary of findings including outcomes will be reported in Q4 2020.
Start Year 2020
 
Description Systematic characterisation of non-alcoholic fatty liver disease loci 
Organisation Pfizer Inc
Country United States 
Sector Private 
PI Contribution Using multi-omics data from the INTERVAL Bioresource, I performed fine-mapping of non-alcoholic fatty liver disease (NAFLD) loci to identify the biomolecular pathways linked to the genes associated with the outcome.
Collaborator Contribution Genetic association data for NAFLD were obtained from 23andMe through Pfizer's collaboration with the company. Dr. Eric Fauman, Senior Scientific Director at Pfizer, is jointly leading the work on biochemical characterization of biomolecular pathways.
Impact The results are currently being reviewed. Details of the outcome will be reported in 2020.
Start Year 2020