A systems approach to the classification of genes impacting the cardiovascular phenome
Lead Research Organisation: University of Bristol
Department Name: Social Medicine
Abstract
Recent research has been very successful in identifying genetic factors involved in common diseases like heart disease, obesity and diabetes. The majority of findings are based on the association of a gene with a single disease or characteristic, which works well for genes that are very variable between people. Our proposal is to identify some of the genes that are less variable between people, by using multiple characteristics to identify groups of genes that are involved in particular diseases. By looking for patterns in several characteristics at the same time we aim to identify disease fingerprints in known disease genes, and use those fingerprints to find new disease genes. The newly identified genes will include potential candidates for drug development and diagnostic tests.
Technical Summary
Genome-wide association studies (GWAS) have transformed genetic association research, robustly identifying hundreds of loci involved in a wide range of common, complex diseases and traits. This approach tests each genetic variant against a single trait or disease. However, the typically small effect size (per genetic variant) on traits means that the importance of a particular gene in disease may be underestimated or overlooked when no genetic variant has a major effect on the function of that gene (GWAS results only reflect function that is altered by genetic variation). We propose an alternative approach that uses the pattern of effect across several relevant phenotypes (a phenotypic "profile" or "fingerprint"), rather than the magnitude of effect for a single phenotype, to classify which genes in the genome are functionally involved in a particular trait. By training a supervised learning classifier with phenotypic effect vectors from genes of known relevance, we aim to classify all other genes, thus identifying new causal pathways and potential drug targets. This systems approach of utilising multiple related phenotypes (e.g. the coagulation cascade) will provide new insights into genetic pathways and interactions not accessible within single-SNP/single-trait analyses. The expected outcomes will be novel, generalisable approaches to the identification of disease genes and the classification of new genes involved in cardiovascular disease risk.
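The idea of classifying genes by the pattern of their effects, rather than by magnitude, can be illustrated with a minimal sketch. This is not the project's actual classifier: the nearest-centroid rule, the cosine similarity measure, the class labels and all effect sizes below are assumptions chosen only to show how a gene with very small effects can still be assigned to the "relevant" class when its profile across phenotypes matches that of known genes.

```python
import math

def unit(v):
    """Scale an effect vector to unit length so only its pattern (direction) remains."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def centroid(vectors):
    """Mean of the unit-scaled vectors, one component per phenotype."""
    scaled = [unit(v) for v in vectors]
    return [sum(col) / len(scaled) for col in zip(*scaled)]

def cosine(a, b):
    """Cosine similarity: compares profile shape and ignores effect magnitude."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))

def train(labelled):
    """labelled maps a class label to the effect vectors of genes with that label."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}

def classify(model, v):
    """Assign the label whose centroid profile is most similar to v."""
    return max(model, key=lambda label: cosine(model[label], v))

# Hypothetical per-gene effect sizes on three related phenotypes (illustrative values only).
training = {
    "relevant": [[0.30, -0.20, 0.10], [0.03, -0.02, 0.01]],
    "not_relevant": [[-0.10, 0.10, 0.20], [-0.02, 0.03, 0.05]],
}
model = train(training)

# A gene with tiny effects whose pattern matches the known relevant genes.
prediction = classify(model, [0.006, -0.004, 0.002])
print(prediction)
```

Because every vector is scaled to unit length before comparison, the query gene's effects, although fifty times smaller than those of the strongest training gene, do not count against it. A single-trait test ranked by magnitude would miss this.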
Publications
Seoane JA (2014). A pathway-based data integration framework for prediction of disease progression. Bioinformatics (Oxford, England).
Ferlaino M (2017). An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome. BMC Bioinformatics.
Seoane JA (2014). Canonical correlation analysis for gene-based pleiotropy discovery. PLoS Computational Biology.
Rogers MF (2017). CScape: a tool for predicting oncogenic single-point mutations in the cancer genome. Scientific Reports.
Rogers MF (2018). FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics (Oxford, England).
Gaunt TR (2013). Gene-centric association signals for haemostasis and thrombosis traits identified with the HumanCVD BeadChip. Thrombosis and Haemostasis.
Shihab HA (2017). GTB - an online genome tolerance browser. BMC Bioinformatics.
Shihab HA (2017). HIPred: an integrative approach to predicting haploinsufficient genes. Bioinformatics (Oxford, England).
Erzurumluoglu AM (2015). Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process. BioMed Research International.
Erzurumluoglu A (2016). Importance of Genetic Studies in Consanguineous Populations for the Characterization of Novel Human Gene Functions. Annals of Human Genetics.
Würtz P (2015). Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts. Circulation.
Shihab HA (2013). Predicting the functional consequences of cancer-associated amino acid substitutions. Bioinformatics (Oxford, England).
Shihab HA (2013). Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Human Mutation.
Shihab HA (2014). Ranking non-synonymous single nucleotide polymorphisms based on disease concepts. Human Genomics.
Gaunt TR (2016). Systematic identification of genetic influences on methylation across the human life course. Genome Biology.
Fernandez-Lozano C (2016). Texture analysis in gel electrophoresis images using an integrative kernel-based approach. Scientific Reports.
Fernandez-Lozano C (2015). Texture classification using feature selection and kernel-based techniques. Soft Computing.
Title | FATHMM |
Description | Our software and server predict the functional effects of protein missense mutations by combining sequence conservation within hidden Markov models (HMMs), representing the alignment of homologous sequences and conserved protein domains, with "pathogenicity weights", representing the overall tolerance of the protein/domain to mutations. |
Type Of Technology | Software |
Year Produced | 2013 |
Open Source License? | Yes |
Impact | The software has been implemented by COSMIC (Catalogue of Somatic Mutations in Cancer) and as an add-in for the widely used ANNOVAR tool. Three publications describe different variants of the algorithm: (1) Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR. (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat., 34:57-65. (2) Shihab HA, Gough J, Cooper DN, Day INM, Gaunt TR. (2013). Predicting the Functional Consequences of Cancer-Associated Amino Acid Substitutions. Bioinformatics, 29:1504-1510. (3) Shihab HA, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR. (2014). Ranking Non-Synonymous Single Nucleotide Polymorphisms based on Disease Concepts. Human Genomics, 8:11. |
URL | http://fathmm.biocompute.org.uk/ |
Title | FSMKL |
Description | The software provides multiple-kernel learning (MKL) with feature selection. We have applied it to predicting cancer outcomes from combinations of molecular, pathway and clinical information. |
Type Of Technology | Software |
Year Produced | 2013 |
Open Source License? | Yes |
Impact | Published in Bioinformatics, 2014 Mar 15;30(6):838-45 (doi: 10.1093/bioinformatics/btt610). |
URL | https://github.com/jseoane/FSMKL |