A systems approach to the classification of genes impacting the cardiovascular phenome

Lead Research Organisation: University of Bristol
Department Name: Social Medicine


Recent research has been very successful in identifying genetic factors involved in common diseases such as heart disease, obesity and diabetes. Most findings are based on the association of a gene with a single disease or characteristic, which works well for genes that vary greatly between people. We propose to find some of the genes that vary less between people by using multiple characteristics together to identify groups of genes involved in particular diseases. By looking for patterns across several characteristics at the same time, we aim to identify disease "fingerprints" in known disease genes and use those fingerprints to find new disease genes. The newly identified genes will include potential candidates for drug development and diagnostic tests.

Technical Summary

Genome-wide association studies (GWAS) have transformed genetic association research, robustly identifying hundreds of loci involved in a wide range of common, complex diseases and traits. This approach tests each genetic variant against a single trait or disease. However, effect sizes per variant are typically small, and GWAS results only reflect function that is altered by genetic variation, so the importance of a particular gene in disease may be underestimated or overlooked when no variant has a major effect on that gene's function. We propose an alternative approach that uses the pattern of effect across several relevant phenotypes (a phenotypic profile or "fingerprint"), rather than the magnitude of effect on a single phenotype, to classify which genes in the genome are functionally involved in a particular trait. By training a supervised learning classifier on phenotypic effect vectors from genes of known relevance, we aim to classify all other genes, thereby identifying new causal pathways and potential drug targets. This systems approach, which uses multiple related phenotypes (e.g. phenotypes of the coagulation cascade), will provide new insights into genetic pathways and interactions that are not accessible to single-SNP, single-trait analyses. The expected outcomes are novel, generalisable approaches to identifying disease genes and the classification of new genes involved in cardiovascular disease risk.
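The proposal does not say which classifier will be used. As a minimal sketch, assuming each gene is summarised by a vector of per-phenotype effect estimates, the Python below uses a nearest-centroid classifier (a stand-in chosen for brevity, not the project's method) on unit-length vectors compared by cosine similarity. Normalising each vector removes its overall magnitude, so a gene with small effects but the same pattern as known disease genes can still be grouped with them. The function names, class labels, the four phenotypes and all effect values are hypothetical.

```python
import math


def normalise(v):
    """Scale an effect vector to unit length so that only its pattern remains."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train(labelled):
    """Build one centroid of normalised effect vectors per class label."""
    return {label: centroid([normalise(v) for v in vecs])
            for label, vecs in labelled.items()}


def classify(model, vector):
    """Return the label whose centroid best matches the vector's pattern."""
    v = normalise(vector)
    return max(model, key=lambda label: cosine(model[label], v))


# Hypothetical effect estimates of known genes on four related phenotypes.
known_genes = {
    "involved": [[0.9, 0.8, 0.7, 0.0], [0.5, 0.6, 0.4, 0.1]],
    "not_involved": [[0.0, 0.1, 0.0, 0.9], [0.1, 0.0, 0.1, 0.6]],
}
model = train(known_genes)

# A gene with tiny effects but the same pattern as the "involved" genes.
print(classify(model, [0.05, 0.04, 0.05, 0.0]))  # prints: involved
```

Because every vector is normalised before comparison, multiplying a query gene's effects by any positive constant leaves its predicted label unchanged; only the shape of the profile matters.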
Description Our software and web server predict the functional effects of protein missense mutations by combining sequence conservation, captured in hidden Markov models (HMMs) that represent alignments of homologous sequences and conserved protein domains, with "pathogenicity weights" that represent the overall tolerance of the protein or domain to mutation.
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact The software has been implemented by COSMIC (Catalogue of Somatic Mutations in Cancer) and is also available as an add-in for the widely used ANNOVAR tool. Three publications describe different variants of the algorithm:
Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR. (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Human Mutation, 34:57-65.
Shihab HA, Gough J, Cooper DN, Day INM, Gaunt TR. (2013). Predicting the Functional Consequences of Cancer-Associated Amino Acid Substitutions. Bioinformatics, 29:1504-1510.
Shihab HA, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR. (2014). Ranking Non-Synonymous Single Nucleotide Polymorphisms based on Disease Concepts. Human Genomics, 8:11.
URL http://fathmm.biocompute.org.uk/
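The description does not give the scoring formula. As a minimal sketch, assuming the HMM supplies emission probabilities for the wild-type and mutant residues at the aligned position, and assuming the "pathogenicity weights" can be represented by counts of known deleterious and neutral variants in the protein or domain, the Python below adds a log-odds conservation term to a log-ratio tolerance term. All function names, the pseudocount and this particular combination are hypothetical; it is not the published weighting, which is described in Shihab et al. (2013), Human Mutation.

```python
import math


def conservation_term(p_wild, p_mut):
    """Log-odds that the HMM disfavours the mutant residue relative to wild type.

    p_wild, p_mut: hypothetical HMM emission probabilities (0 < p <= 1) of
    the wild-type and mutant amino acids at the aligned position.
    """
    return math.log(p_wild / p_mut)


def tolerance_term(n_deleterious, n_neutral):
    """Log ratio of known deleterious to neutral variants in the protein/domain.

    A pseudocount of 1 is added to both counts to avoid log(0) and division by zero.
    """
    return math.log((n_deleterious + 1) / (n_neutral + 1))


def illustrative_score(p_wild, p_mut, n_deleterious, n_neutral):
    """Hypothetical combined score; higher means more likely to be damaging.

    NOT the published algorithm: it only shows a position-level conservation
    signal and a protein/domain-level tolerance weight being added on a
    common log scale.
    """
    return conservation_term(p_wild, p_mut) + tolerance_term(n_deleterious, n_neutral)


# Conserved position in a mutation-intolerant domain (hypothetical numbers).
print(round(illustrative_score(0.9, 0.01, 40, 5), 2))
# Variable position in a mutation-tolerant domain (hypothetical numbers).
print(round(illustrative_score(0.3, 0.25, 2, 30), 2))
```

The first substitution scores positive and the second negative: the same residue change can be judged differently depending on how tolerant its protein or domain is to mutation. The sign convention and any decision threshold of the real tool may differ from this sketch.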
Title FSMKL 
Description The software implements multiple-kernel learning (MKL) with feature selection. We have applied it to predicting cancer outcomes from combinations of molecular, pathway and clinical information.
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact Published in Bioinformatics (2014 Mar 15;30(6):838-45; doi: 10.1093/bioinformatics/btt610).
URL https://github.com/jseoane/FSMKL
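As a minimal sketch of the multiple-kernel idea, not of FSMKL's own optimisation, the Python below builds one Gaussian (RBF) kernel per data source and combines them as a weighted sum, K = Σ β_m K_m with non-negative weights β_m. In an MKL method the weights are learned from the data, and driving some of them to zero selects which sources are used; here the weights are fixed by hand, and the source names, the gamma value, the simple kernel-weighted vote used for prediction, and all data are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq)


def combined_kernel(sample_a, sample_b, weights, gamma=1.0):
    """Weighted sum of per-source kernels: K = sum_m beta_m * K_m.

    sample_a, sample_b: dicts mapping data-source name -> feature vector.
    weights: dict mapping data-source name -> non-negative weight beta_m.
    Sources with weight 0 are skipped, which is how a sparse weight
    vector acts as a form of source (feature) selection.
    """
    return sum(w * rbf_kernel(sample_a[src], sample_b[src], gamma)
               for src, w in weights.items() if w > 0)


def predict(samples, labels, query, weights, gamma=1.0):
    """Kernel-weighted vote over training samples (labels are +1 or -1)."""
    s = sum(y * combined_kernel(x, query, weights, gamma)
            for x, y in zip(samples, labels))
    return 1 if s >= 0 else -1


# Hypothetical training data with two data sources per sample.
samples = [
    {"molecular": [1.0, 0.9], "clinical": [0.2]},
    {"molecular": [0.9, 1.1], "clinical": [0.8]},
    {"molecular": [-1.0, -0.8], "clinical": [0.3]},
    {"molecular": [-0.9, -1.1], "clinical": [0.7]},
]
labels = [1, 1, -1, -1]
weights = {"molecular": 0.8, "clinical": 0.2}
query = {"molecular": [0.95, 1.0], "clinical": [0.5]}

print(predict(samples, labels, query, weights))
```

With these weights the query is assigned to class 1, driven by the informative molecular source. Setting the weights to {"molecular": 0.0, "clinical": 1.0} flips the prediction to -1, which illustrates why learning the kernel weights, rather than fixing them, matters in MKL.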