Exploiting the protein-protein interaction network to identify common genetic variants associated with complex diseases
Lead Research Organisation:
Imperial College London
Department Name: Life Sciences
Abstract
Thanks to recent advances in technology, it is now possible to analyse our genetic material, DNA. The DNA of any two people differs by approximately 4.5 million small genetic variants, which are the likely cause of differences in predisposition to important diseases such as cancer, diabetes and heart disorders. It is therefore important to understand which of these numerous variants are responsible for disease and for differences in response to therapies. This cannot be achieved with traditional biology experiments, but requires the use of complex computer algorithms. This project aims to develop such a computer program, combining biological, mathematical and statistical tools with the vast amount of genetic and biological information available on the internet and the power of modern computers. In particular, the proposed study aims to evaluate the effect of genetic variants on health by studying how they affect the structure of proteins and the interactions between them, thereby causing disease. We will focus on cardiovascular disorders, diabetes, obesity and cancer, which affect millions of people in the UK alone, but the method created will be applicable to any human disease. This computer program will be able to identify potentially damaging DNA variations and will be an invaluable tool for researchers around the world who are working to understand and treat diseases.
Technical Summary
The project aims to:
1) perform in-depth analysis of nsSNPs at protein interfaces;
2) develop an algorithm to prioritise and characterise nsSNPs, by combining data from the interactome, with those from currently available structure and sequence-based tools;
3) implement this algorithm to identify new common nsSNPs associated with complex diseases.
The following hypotheses will be addressed:
1) a structural system biology method can enhance the number of identified deleterious nsSNPs, compared to standard sequence and structure methods;
2) the location of nsSNPs on different interfaces of the same protein can explain protein pleiotropism;
3) a predictive algorithm based on the functional effects of nsSNPs can be developed, maximising the ability of GWAS to identify disease SNPs;
4) the algorithm can help identify new common nsSNPs associated with cardiovascular and metabolic disorders in Indian-Asians and in cancer.
The study comprises four work packages:
1) Analysis of nsSNPs located at interfaces: this will involve characterisation of wild type residues and nsSNPs by means of structure, sequence, biophysical and structural system biology methods. A dedicated database will be created to disseminate results;
2) Manual analysis of nsSNPs strongly associated with disease, from the LOLIPOP study;
3) Development of a prediction algorithm and dedicated web server to predict deleterious nsSNPs. The performance of this method will be evaluated using nsSNPs annotated in UniProt for both training and testing, with a support vector machine and 5-fold cross-validation;
4) Application of the prediction algorithm for the analysis of common nsSNPs identified in the LOLIPOP study.
The project will generate invaluable information on the genetics of cardiovascular and metabolic disorders. The prediction algorithm will be freely available for application to any disease, thereby helping the medical community to develop clinical algorithms for risk stratification.
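Work package 3 above describes evaluating a support vector machine with 5-fold cross-validation. The following is only a minimal sketch of that evaluation protocol, not the project's implementation: the data are synthetic, the two features are hypothetical stand-ins for whatever sequence, structure and interactome descriptors the real method uses, the folds are not stratified, and the linear SVM is trained with a simple Pegasos-style sub-gradient update chosen here for brevity.

```python
import random


def make_toy_variants(n_per_class=50, seed=0):
    """Generate synthetic, purely illustrative feature vectors.

    Each variant has two made-up features plus a constant 1.0 acting as a bias term.
    Label +1 stands for "deleterious", -1 for "neutral" (hypothetical labels).
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 1.0), (-1, -1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5), 1.0])
            y.append(label)
    return X, y


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style sub-gradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * dot(w, X[i])   # computed with the current weights
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1:                 # sample violates the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def cross_validate(X, y, k=5):
    """Train on k-1 folds, test on the held-out fold, and repeat k times."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        w = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(1 for i in held_out if predict(w, X[i]) == y[i])
        accuracies.append(correct / len(held_out))
    return accuracies


if __name__ == "__main__":
    X, y = make_toy_variants()
    scores = cross_validate(X, y, k=5)
    print("per-fold accuracy:", scores)
    print("mean accuracy:", sum(scores) / len(scores))
```

Each variant is held out exactly once, so the mean of the five per-fold accuracies estimates performance on unseen variants. In practice an established SVM library and stratified folds would likely be preferable to this hand-written version.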
Planned Impact
Multigenic disorders, such as cancer and metabolic and cardiovascular diseases, are major contributors to morbidity and mortality. By 2025, over 4 million people in the UK will have been diagnosed with diabetes mellitus alone.
New genetic variants associated with a wide range of multigenic disorders are generated continuously by genome-wide association studies, exome sequencing and whole genome sequencing. The proposed project aims to investigate the role of these genetic variants in the development of, and predisposition to, disease by assessing the genotype-phenotype relationship. This will help to establish new clinical algorithms for risk stratification, improve existing ones, and identify targets for therapy. Patients with a variety of disorders, such as cancer and metabolic and cardiovascular diseases, are potential long-term beneficiaries of the proposed research.
Short term potential beneficiaries of this research are Indian-Asian populations and those of Indian-Asian descent living outside the Indian subcontinent. These populations, which make up over a fifth of the world's inhabitants, have a staggeringly high incidence of diabetes and cardiovascular-related death. Several epidemiological studies have established that Indian-Asians have a 2- to 3-fold higher incidence of diabetes mellitus and hypertension and a 40% increased risk of death from cardiovascular disorders compared to Caucasians. It is estimated that 40-60% of cardiovascular patients worldwide are of Indian-Asian ethnicity. Analysis of nsSNPs associated with metabolic and cardiovascular disorders in Asians is one of the main aims of this project and will help us to identify the reasons behind the increased genetic predisposition in this population and to develop clinical risk stratification algorithms.
In the long term, the Health Care System can be seen as a potential beneficiary of the results of the proposed research. The cost of diabetes alone to the NHS has been calculated at over £1.5 million per hour, or 8.4% of the annual NHS budget (NHS Information Centre report). The results generated by this project can, in the long term, aid the development of personalised (stratified) medicine, in which therapies are tailored to the patient's genetic characteristics. This goal is among those most promoted by the UK Government, with the aim of improving the efficacy of therapies and reducing costs to the NHS.
Despite several recent significant discoveries in modern genetics, a major gap still exists between human genome sequencing and understanding the pathogenesis of human diseases. This is largely due to a limited understanding of the mechanisms by which SNPs affect the phenotype and by which single protein mutations can have consequences for different biological systems (pleiotropism). The proposed project aims to help fill this gap by performing a novel interface nsSNP analysis and by developing a new tool for nsSNP prediction.
Publications
Cornish AJ
(2015)
Exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types.
in Genome medicine
David A
(2015)
The Contribution of Missense Mutations in Core and Rim Residues of Protein-Protein Interfaces to Human Disease.
in Journal of molecular biology
Howard S
(2015)
Mutations in IGSF10 cause self-limited delayed puberty, via disturbance of GnRH neuronal migration
in Endocrine Abstracts
Howard S
(2016)
Role of IGSF10 mutations in self-limited delayed puberty
in The Lancet
Howard S
(2015)
Mutations in IGSF10 cause self-limited delayed puberty
in Endocrine Abstracts
Howard SR
(2018)
Contributions of Function-Altering Variants in Genes Implicated in Pubertal Timing and Body Mass for Self-Limited Delayed Puberty.
in The Journal of clinical endocrinology and metabolism
Howard SR
(2016)
IGSF10 mutations dysregulate gonadotropin-releasing hormone neuronal migration resulting in delayed puberty.
in EMBO molecular medicine
Ittisoponpisan S
(2017)
Landscape of Pleiotropic Proteins Causing Human Disease: Structural and System Biology Insights.
in Human mutation
Metherell L
(2016)
Structural analysis of nicotinamide nucleotide transhydrogenase (NNT) genetic variants causing adrenal disorders
in Endocrine Abstracts
Title | Cover for EMBO journal |
Description | Cover for EMBO journal |
Type Of Art | Artwork |
Year Produced | 2016 |
Impact | The cover on the EMBO journal helps promote our manuscript on the discovery of a novel gene causing pubertal disorders. |
URL | http://embomolmed.embopress.org/content/8/6.cover-expansion |
Title | R package |
Description | We developed an R package called DiseaseCellTypes that contains the "gene set compactness" and the "gene set overexpression" methods that can be used to create cell-type specific interactomes. |
Type Of Material | Data analysis technique |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | These methods were used in research exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types. The results of this research were published in Genome Medicine in 2015. The R package is freely available for download to the scientific community. |
URL | http://alexjcornish.github.io/DiseaseCellTypes/ |
Title | cell-type-specific gene networks |
Description | We developed 73 cell-type-specific gene networks using data from protein-protein interaction networks and disease-causing genetic variations. |
Type Of Material | Database/Collection of data |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | These cell-type-specific interactomes were used in research exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types. The results of this research were published in Genome Medicine in 2015. The cell-type-specific interactomes are freely available for download to the scientific community. |
URL | http://alexjcornish.github.io/Cell_Type_Interactomes/ |
Title | database of pleiotropic proteins and dedicated website |
Description | This database collects all data used for the analysis of pleiotropic proteins causing human disease, published in Human Mutation as an original article. Through a dedicated website, users can easily search and download the data. This database is publicly available. |
Type Of Material | Database/Collection of data |
Provided To Others? | No |
Impact | This database and its dedicated website make the dataset used for the analysis available to other groups, who can reproduce our results or use our dataset for further analysis. This increases transparency and fosters collaboration. |
URL | http://www.sbg.bio.ic.ac.uk/pleiotropydb/home/ |
Description | Characterization of a novel mutation in patients with endocrine disorders |
Organisation | Queen Mary University of London |
Department | William Harvey Research Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | In silico bioinformatic analysis of a novel genetic mutation and prediction of its consequences on the protein structure and function. |
Collaborator Contribution | genetic analysis and in vitro studies |
Impact | One scientific manuscript in preparation. |
Start Year | 2015 |
Description | Identification and characterization of novel genes involved in delayed puberty |
Organisation | Queen Mary University of London |
Department | William Harvey Research Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I have performed the bioinformatic analysis of novel, uncharacterized genes identified through the genetic analysis of patients with an endocrine disorder. The analysis involved modeling the three-dimensional structure of an uncharacterized protein and understanding the genotype-phenotype relationship of its disease-causing mutations. |
Collaborator Contribution | DNA sequencing and in vitro studies |
Impact | Two scientific manuscripts under review; two scientific manuscripts in preparation; three abstracts presented at international medical conferences. |
Start Year | 2015 |
Description | prioritization of genetic variants involved in immune disorders |
Organisation | Imperial College London |
Department | National Heart & Lung Institute (NHLI) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Bioinformatic analysis of the genetic results obtained from DNA sequencing of patients with immune disorders. The analysis included modeling the three-dimensional structure of target proteins and structural characterization and prioritization of novel genetic variants. |
Collaborator Contribution | Sequencing of patients' DNA. |
Impact | I have been made a formal collaborator on a Wellcome Trust grant that was recently submitted. |
Start Year | 2016 |
Description | Imperial College Science Festival |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | The Imperial Science Festival is an annual event for the lay public. Generally more than 10,000 people attend the Festival every year. This was a platform to present my research to the lay public ranging from primary school children to adults. It was also a platform to talk to young school girls about a career in STEM disciplines. |
Year(s) Of Engagement Activity | 2015,2016 |
Description | Work experience for school children |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | The Department of Bioinformatics and System Biology provides week-long work experience for small groups of school children. Over the last two years, I have taken this opportunity to engage with 14-17 year-old students and present the results of my research. One or two children expressed an interest in pursuing a degree in biomedical sciences at University. |
Year(s) Of Engagement Activity | 2014,2015,2016 |