Exploiting the protein-protein interaction network to identify common genetic variants associated with complex diseases

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Thanks to recent advances in technology, it is now possible to analyse our genetic material, DNA. The DNA of any two people differs by approximately 4.5 million small genetic variants, which are the likely cause of differing predisposition to important diseases such as cancer, diabetes and heart disorders. It is therefore important to understand which of these numerous variants are responsible for disease and for differing responses to therapies. This cannot be achieved with traditional biology experiments, but requires the use of complex computer algorithms. This project aims to develop such a computer program, which combines biological, mathematical and statistical tools with the vast amount of genetic and biological information available on the internet and the power of modern computers. In particular, the proposed study aims to evaluate the effect of genetic variants on health by studying how they affect the structure of proteins and the interactions between them, thereby causing disease. We will focus on cardiovascular disorders, diabetes, obesity and cancer, which affect millions of people in the UK alone, but the method created will be applicable to any human disease. This computer program will be able to identify potentially damaging DNA variations and will be an invaluable tool for researchers around the world who are working to understand and treat diseases.

Technical Summary

The project aims to:
1) perform in-depth analysis of nsSNPs at protein interfaces;
2) develop an algorithm to prioritise and characterise nsSNPs, by combining data from the interactome, with those from currently available structure and sequence-based tools;
3) implement this algorithm to identify new common nsSNPs associated with complex diseases.

The following hypotheses will be addressed:
1) a structural systems biology method can increase the number of identified deleterious nsSNPs, compared to standard sequence and structure methods;
2) the location of nsSNPs on different interfaces of the same protein can explain protein pleiotropism;
3) a predictive algorithm based on the functional effects of nsSNPs can be developed, maximising the ability of GWAS to identify disease SNPs;
4) the algorithm can help identify new common nsSNPs associated with cardiovascular and metabolic disorders in Indian-Asians and in cancer.

The study comprises four work packages:
1) Analysis of nsSNPs located at interfaces: this will involve characterisation of wild-type residues and nsSNPs by means of structure, sequence, biophysical and structural systems biology methods. A dedicated database will be created to disseminate results;
2) Manual analysis of nsSNPs strongly associated with disease, from the LOLIPOP study;
3) Development of a prediction algorithm and dedicated web server to predict deleterious nsSNPs. The performance of this method will be evaluated using nsSNPs annotated in UniProt, for both training and testing, with a support vector machine and 5-fold cross-validation;
4) Application of the prediction algorithm for the analysis of common nsSNPs identified in the LOLIPOP study.
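
The 5-fold cross-validation named in work package 3 can be illustrated with a minimal sketch in standard-library Python. It is not the project's implementation. The function names, the single per-variant score and the toy labels are all hypothetical. A simple threshold classifier stands in for the support vector machine so that the example has no external dependencies. The folds are stratified, which is an added assumption, so that each fold keeps the same ratio of deleterious to neutral variants.

```python
import random
from statistics import mean


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with a similar class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_threshold(xs, ys):
    """Stand-in for the SVM: cut at the midpoint between the class means."""
    pos = mean(x for x, y in zip(xs, ys) if y == 1)
    neg = mean(x for x, y in zip(xs, ys) if y == 0)
    cut = (pos + neg) / 2
    return lambda x: 1 if (x > cut) == (pos > neg) else 0


def cross_validate(features, labels, train_fn, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, return k accuracies."""
    accuracies = []
    for held_out in stratified_k_fold(labels, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(model(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Toy data (invented): one hypothetical score per variant,
# label 1 = deleterious, label 0 = neutral.
scores = [i / 25 for i in range(10)] + [0.6 + i / 25 for i in range(10)]
labels = [0] * 10 + [1] * 10

accuracies = cross_validate(scores, labels, train_threshold)
print(accuracies, mean(accuracies))
```

In a real evaluation, `train_fn` would wrap the support vector machine, and each variant would be described by a feature vector rather than a single score. The rest of the loop would stay the same.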

The project will generate invaluable information on the genetics of cardiovascular and metabolic disorders. The prediction algorithm will be freely available for application to any disease, thereby helping the medical community to develop clinical algorithms for risk stratification.

Planned Impact

Multigenic disorders, such as cancer and metabolic and cardiovascular diseases, are major contributors to morbidity and mortality. By 2025, over 4 million people in the UK will have been diagnosed with diabetes mellitus alone.
New genetic variants associated with a wide range of multigenic disorders are generated continuously by genome-wide association studies, exome sequencing and whole genome sequencing. The proposed project aims to investigate the role of these genetic variants in the development of, and predisposition to, disease by assessing the genotype-phenotype relationship. This will help to establish new clinical algorithms for risk stratification, improve existing ones, and identify targets for therapy. Patients with a variety of disorders, such as cancer and metabolic and cardiovascular diseases, are potential long-term beneficiaries of the proposed research.
Short term potential beneficiaries of this research are Indian-Asian populations and those of Indian-Asian descent living outside the Indian subcontinent. These populations, which make up over a fifth of the world's inhabitants, have a staggeringly high incidence of diabetes and cardiovascular-related death. Several epidemiological studies have established that Indian-Asians have a 2- to 3-fold higher incidence of diabetes mellitus and hypertension and a 40% increased risk of death from cardiovascular disorders compared to Caucasians. It is estimated that 40-60% of cardiovascular patients worldwide are of Indian-Asian ethnicity. Analysis of nsSNPs associated with metabolic and cardiovascular disorders in Asians is one of the main aims of this project and will help us to identify the reasons behind the increased genetic predisposition in this population and to develop clinical risk stratification algorithms.
In the long term, the health care system can be seen as a potential beneficiary of the results of the proposed research. The cost of diabetes alone to the NHS has been calculated at over £1.5 million per hour, or 8.4% of the annual NHS budget (NHS Information Centre report). The results generated by this project can, in the long term, aid the development of personalised (stratified) medicine, in which therapies are tailored to patients' genetic characteristics. This goal is among those most promoted by the UK Government, with the aim of improving the efficacy of therapies and reducing costs to the NHS.
Despite several recent significant discoveries in modern genetics, a major gap still exists between human genome sequencing and understanding the pathogenesis of human diseases. This is largely due to a poor understanding of the mechanisms by which SNPs affect the phenotype and by which single protein mutations can have consequences for different biological systems (pleiotropism). The proposed project aims to help fill this gap by performing a novel interface nsSNP analysis and by developing a new tool for nsSNP prediction.

Publications

 
Title Cover for EMBO journal 
Description Cover for EMBO journal 
Type Of Art Artwork 
Year Produced 2016 
Impact The cover of the EMBO journal helps to promote our manuscript on the discovery of a novel gene causing pubertal disorders.
URL http://embomolmed.embopress.org/content/8/6.cover-expansion
 
Title R package 
Description We developed an R package called DiseaseCellTypes that contains the "gene set compactness" and the "gene set overexpression" methods that can be used to create cell-type specific interactomes. 
Type Of Material Data analysis technique 
Year Produced 2015 
Provided To Others? Yes  
Impact These methods were used in a study exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types. The results of this research were published in Genome Medicine in 2015. The R package is freely available for download to the scientific community.
URL http://alexjcornish.github.io/DiseaseCellTypes/
 
Title cell-type-specific gene networks 
Description We developed 73 cell-type-specific gene networks using data from protein-protein interaction networks and disease-causing genetic variations. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact These cell-type-specific interactomes were used in a study exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types. The results of this research were published in Genome Medicine in 2015. The cell-type-specific interactomes are freely available for download to the scientific community.
URL http://alexjcornish.github.io/Cell_Type_Interactomes/
 
Title database of pleiotropic proteins and dedicated website 
Description This database collects all data used for the analysis of pleiotropic proteins causing human disease, published in Human Mutation as an original article. Through a dedicated website, users can easily search and download the data. This database is publicly available.
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact This database and its dedicated website make the dataset used for the analysis available to other groups, who can reproduce our results or use our dataset for further analysis. This increases transparency and fosters collaboration.
URL http://www.sbg.bio.ic.ac.uk/pleiotropydb/home/
 
Description Characterization of a novel mutation in patients with endocrine disorders 
Organisation Queen Mary University of London
Department William Harvey Research Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution In silico bioinformatic analysis of a novel genetic mutation and prediction of its consequences on the protein structure and function.
Collaborator Contribution genetic analysis and in vitro studies
Impact one scientific manuscript in preparation.
Start Year 2015
 
Description Identification and characterization of novel genes involved in delayed puberty 
Organisation Queen Mary University of London
Department William Harvey Research Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution I performed the bioinformatic analysis of novel, uncharacterized genes identified through the genetic analysis of patients with an endocrine disorder. The analysis involved modeling the three-dimensional structure of an uncharacterized protein and understanding the genotype-phenotype relationship of its disease-causing mutations.
Collaborator Contribution DNA sequencing and in vitro studies
Impact Two scientific manuscripts under review; two scientific manuscripts in preparation; three abstracts presented at international medical conferences.
Start Year 2015
 
Description prioritization of genetic variants involved in immune disorders 
Organisation Imperial College London
Department National Heart & Lung Institute (NHLI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Bioinformatic analysis of the genetic results obtained from DNA sequencing of patients with immune disorders. The analysis included modeling the three-dimensional structure of target proteins and structural characterization and prioritization of novel genetic variants.
Collaborator Contribution Sequencing of patients' DNA.
Impact I have been made a formal collaborator on a Wellcome Trust grant, which was recently submitted.
Start Year 2016
 
Description Imperial College Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The Imperial Science Festival is an annual event for the lay public, generally attended by more than 10,000 people each year. It was a platform to present my research to audiences ranging from primary school children to adults, and to talk to schoolgirls about a career in STEM disciplines.
Year(s) Of Engagement Activity 2015,2016
 
Description Work experience for school children 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact The Department of Bioinformatics and System Biology provides week-long work experience for small groups of school children. Over the last two years, I have taken this opportunity to engage with 14-17 year-old students and present the results of my research. One or two children expressed an interest in pursuing a degree in biomedical sciences at University.
Year(s) Of Engagement Activity 2014,2015,2016