Rigorous Information-theoretic tools for Comparative Interactomics.

Lead Research Organisation: King's College London
Department Name: Randall Div of Cell and Molecular Biophy


Molecular signals in the living cell can, to a first approximation, be attributed mostly to protein-protein interactions (PPIs) and their complex cross-talk. Since the recent completion of the Human Genome Project, it has become possible to identify and map a large part of the proteins encoded in our genes. However, more details about the molecular interactions involved in signal transduction pathways need to be uncovered before we can truly understand the complex biology of our cellular system. This represents one of the major challenges for the coming years of research in biology and medicine. Molecules signal information through interaction with specific binding partners. The binding induces a conformational change in at least one of the partner molecules, which triggers the next biomolecular step in the signaling cascade. The mechanisms of interaction between proteins are therefore crucial to all biological functions, and the effectiveness of this cross-talk during signal transduction plays a fundamental role in many 'healthy' biological processes and in many diseases (e.g. cancers). Several large-scale experimental studies detecting PPIs in diverse species have been published in recent years, and their data have been deposited in publicly available databases. Current experimental techniques, such as yeast two-hybrid (Y2H) and co-affinity purification combined with mass spectrometry (AP-MS), have, however, been shown to sample subsets of the interaction data space with only very limited overlap.
We have recently developed a theoretically sound and accurate mathematical framework for comparing interactome data (PPI networks, PPIN) and for evaluating, in an unbiased way, their distance in terms of macroscopic topological properties. Preliminary analysis revealed that networks of the same species sampled by the same method are similar, and more similar than networks sampled by the same method but from different species. Networks generated under similar experimental conditions therefore have similar topological features, despite the small overlap of their individual PPIs. To our knowledge this has not yet been shown so clearly and in such an unbiased way. Moreover, upon comparing networks sampled with different methods, we could see very clearly that the data bias induced by the sampling method presently overshadows species-related structural properties. Although methodological biases have been acknowledged in the literature, our ability to quantify their impact by using objective distance measures opens a powerful new window on proteome data and their quality control.
In this project we seek to add a further essential ingredient to the theory: to include in our macroscopic characterizations of networks the statistics of short loops (going beyond degree statistics and degree correlations, on which the earlier work was based). The rationale is that functional modules involving a small number of nodes (typically 3-6) appear to play an important role in the overall transduction mechanism. To derive formulae that improve upon those used in the previous PPIN comparison, we now need to calculate analytically the Shannon entropy for random graphs with constrained loops. This step is theoretically very difficult, and will take up half of the project duration. If a fully exact evaluation is too demanding, we will resort to well-defined and sensible approximations of the loop statistics instead. Numerical simulations will be performed on suitably constructed families of synthetic networks, generated to have properties identical or close to those of realistic PPIN. This step will serve as a control experiment and/or validation of the developed theory. Finally, the theory will be applied to a large collection of PPIN from different species. The methodological approach developed here should aid experimentalists in the design and interpretation of future studies.
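To make the notion of "statistics of short loops" concrete, the following is a minimal sketch of how loops of length 3 and 4 could be counted in a small undirected network. It is an illustration only, not the method used in the project: the function names and the toy protein labels are hypothetical, and loops of length 5 and 6 would need more elaborate counting than shown here.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Count loops of length 3."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each edge once
                total += len(adj[u] & adj[v])
    # every triangle is found once per edge, i.e. three times
    return total // 3


def count_squares(adj):
    """Count loops of length 4 (loops with chords are included)."""
    total = 0
    for a, c in combinations(adj, 2):
        m = len(adj[a] & adj[c])  # common neighbours of a and c
        total += m * (m - 1) // 2  # 4-loops having (a, c) as a diagonal
    # every 4-loop has two diagonals, so it is found twice
    return total // 2


# Hypothetical toy network: four proteins that all interact with each other.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
             ("P2", "P3"), ("P2", "P4"), ("P3", "P4")]
toy_adj = build_adjacency(toy_edges)
print(count_triangles(toy_adj), count_squares(toy_adj))
```

The pairwise common-neighbour approach is chosen for readability; for networks with thousands of proteins a sparse-matrix formulation would probably be preferable.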

Technical Summary

An important question in systems biology is how to quantify optimally and systematically the macroscopic structure of protein-protein interaction networks (PPIN). Answering it will allow us to study more systematically the relation between topology and biological functionality in such signalling networks. We have previously developed a novel information-theoretic framework for generating such measures, and demonstrated their applicability to PPIN. The results showed for the first time that PPI datasets from the same species that share only a small fraction of interactions (and were therefore viewed with suspicion by the experimental community) in fact have remarkably similar macroscopic topological properties. That study, however, is limited in that the topological information on which it is based does not include network loops, in spite of their known relevance in proteomic modules. In this project we seek to develop more informative information-theoretic distance measures for comparing PPIN, to serve as powerful practical tools for characterizing and comparing networks within and between species and for mapping methodological biases. These new measures will additionally take into account the statistics of short loops in PPIN. At a mathematical level this changes the problem from calculating the Shannon entropy for ensembles of effectively tree-like graphs to calculating the Shannon entropy for ensembles of graphs with constrained short loops (either exactly or in controlled approximation), which is highly nontrivial and requires qualitatively different techniques. Once developed, the theoretical framework will be applied to the continuously growing PPIN for different species available in the public domain; the analysis should highlight more subtle differences between the graphs with respect to conserved functional modules. Our results should inform and guide new experiments aimed at reducing the existing biases.
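As a minimal illustration of the quantities involved, the expressions below give a generic Shannon entropy per node for an ensemble of N-node graphs and a symmetrized Kullback-Leibler divergence between two such ensembles. The notation (adjacency matrix A, ensemble probabilities p(A) and q(A)) is an assumption made here for illustration, and the symmetrized divergence is shown only as one standard example of an information-theoretic distance; the summary above does not specify the exact form used in the project.

```latex
S[p] = -\frac{1}{N}\sum_{A} p(A)\,\log p(A),
\qquad
D(p,q) = \frac{1}{2N}\sum_{A}\left[\,p(A)\log\frac{p(A)}{q(A)} + q(A)\log\frac{q(A)}{p(A)}\right]
```

In an ensemble constrained only by degree statistics the graphs are effectively tree-like, whereas constraining the number of short loops changes p(A) and hence the sums to be evaluated.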

Planned Impact

Impact on research - mathematical methodology
Available proteome data are still far from complete and of limited reproducibility. In order to progress in this domain, new mathematical tools based on rigorous formulas are needed for the accurate comparison, evaluation and analysis of these data. Our first direct scientific impact is to prepare the way towards comparative proteomics, by generating these required new and advanced mathematical tools, proving the usefulness of their application to real data, and making them available to the scientific community.

Impact on the promotion of systems biology
Our scientific approach is intrinsically integrative, relying completely on the effective combination of cross-disciplinary expertise in mathematics, bioinformatics, and biology. Our project can impact positively on the awareness in the scientific community of the feasibility, potential and effectiveness of systems biology research consortia. We will actively reinforce this message by presenting our work at conferences explicitly as the result of a successful systems biology team effort, and we will encourage and assist others in forming efficient multi-disciplinary systems biology teams.

Impact on health and well-being
The development of computational tools with a rigorous mathematical foundation, designed and able to compare unambiguously interactome data from different sources, will increase the quality of our analysis of patient data and the precision with which novel drug targets can be predicted, and thereby accelerate the personalized medicine agenda.

Impact on people - teaching and transfer of knowledge
King's College is one of the UK's leading academic institutions, in research and in teaching. The post-doctoral researcher of the project will have to analyze and understand protein-protein interaction data, and at the same time master the developed mathematical tools and code the algorithms in user-friendly programs. Our project thus increases the number of biomedical scientists that can work across discipline boundaries. In fact, the two applicants are the main driving forces behind most multi-disciplinary systems biology initiatives at King's College.

Impact on the UK's competitiveness - profile and networking
The combined dissemination of our novel methodology, via publication in internationally recognized journals, presentation at international conferences, and the creation of accompanying e-tools, together with our activities in promoting their application in medicine, will contribute to raising the UK's profile as a leader in the fields of bioinformatics and translational research.

Outreach activities
Dissemination via publication in journals, presentation at conferences, and the generation of e-tools would be carried out mainly by the postdoc (guided by the applicants). The translational dissemination towards medicine would be done by the applicants, via their existing commitments in other projects; instead of demanding resources, this increases the effectiveness of other biomedical research. The impact via training (the systems biology teaching activities) and the formation of international networks would also be done by the applicants. They are already committed to stimulating systems biology research and the injection of advanced mathematical methods into the interface between applied mathematics and biomedicine; the present application would make it easier for them to continue doing so.


Buffa P (2014) BCR-ABL residues interacting with ponatinib are critical to preserve the tumorigenic potential of the oncoprotein. in FASEB journal : official publication of the Federation of American Societies for Experimental Biology

Fruhwirth GO (2011) How Förster resonance energy transfer imaging improves the understanding of protein interaction networks in cancer biology. in Chemphyschem : a European journal of chemical physics and physical chemistry

Kleinjung J (2014) Design and application of implicit solvent models in biomolecular simulations. in Current opinion in structural biology

Description Interactome comparisons have highlighted conserved modules that might represent common functional cores of ancestral origin. However, recent analyses of protein-protein interaction networks (PPINs) resulted in a debate about the influence of the experimental method on the quality and biological relevance of the interaction data [1,2]. It is crucial to know to what extent discrepancies between networks of different species reflect sampling biases of the respective experimental methods, as opposed to topological features due to biological functionality. This requires new, precise and practical mathematical tools with which to quantify and compare the topological structures of networks macroscopically. To this end we started to study the relationship between structured random graph ensembles and real biological signaling networks, focusing on the number of short loops, which represent complexes in PPINs. In this contribution we present a large-scale investigation of the role of loops of length 3, 4, 5 and 6 in 28 PPINs from different species. By combining a method for graph dynamics with an algorithm for loop counting, we estimated the relative importance of loops in biological networks compared to random graphs. We found that loops are a predominant feature of PPINs, suggesting that the enrichment in their number has a key functional role. We also investigated the abundance of disease-related proteins in short loops. Proteins in short loops share a common function (functional consensus) and are enriched in particular biological processes such as mRNA metabolism, localization and the cell cycle.
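The comparison of loop counts against random graphs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's graph dynamics or loop-counting algorithm: it uses plain double-edge swaps as a degree-preserving null model and counts only loops of length 3, and all function names and parameters (n_null, swaps_per_edge, seed) are hypothetical.

```python
import random


def triangles(edges):
    """Count loops of length 3 in an undirected simple graph (no duplicate edges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # each triangle is found once per edge, i.e. three times
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire by repeated double-edge swaps; every node keeps its degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def triangle_enrichment(edges, n_null=20, swaps_per_edge=10, seed=0):
    """Return (observed triangle count, mean count over shuffled graphs)."""
    rng = random.Random(seed)
    observed = triangles(edges)
    null_counts = [
        triangles(degree_preserving_shuffle(edges, swaps_per_edge * len(edges), rng))
        for _ in range(n_null)
    ]
    return observed, sum(null_counts) / n_null
```

An observed count well above the null mean would indicate loop enrichment relative to graphs with the same degrees; a careful analysis would also need to check that the shuffling has mixed sufficiently and report the spread of the null counts.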

[1] Annibale A et al, Journal of Physics A: Math. Theor. 42:485001 (25 pp), 2009
[2] Fernandes LP et al, PLoS ONE 5:e12083, 2010
[3] Lu HC, Fornili A, Fraternali F. Protein-protein interaction networks studies and importance of 3D structure knowledge. Expert Rev Proteomics. 2013 Dec;10(6):511-20.
[4] Chung SS et al, Bridging topological and functional information in protein interaction networks by short loops profiling. Scientific Reports, under review.
Exploitation Route We are in the process of completing a website dedicated to the project. In particular, we will deposit all the functional annotations of the calculated loops. This will be especially useful for scientists wanting to design targeted siRNA screens.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare

Description Crick Sabbatical attachment 
Organisation Francis Crick Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Our group has worked on assembling and interpreting "systems" data by combining protein-protein network topology features and molecular structural analyses with experimental validations. We have developed methods for the analysis of allosteric networks in protein conformations of the enzyme PKM2 and designed experiments to test our predictions in the laboratory of Dr. Dimitrios Anastasiou.
Collaborator Contribution The laboratory of Dr. Anastasiou has performed a series of biophysical and in-cell experiments to characterise the enzymatic activity of PKM2 and to test our designed mutants, measuring their effect on the allosteric activity of the protein.
Impact Multidisciplinary activity: Bioinformatics, Biophysics, Metabolic Enzymatic characterization, Cancer metabolism, Mass Spectroscopy. Related publication so far: An engineered photoswitchable mammalian pyruvate kinase. Gehrig, S., Macpherson, J. A., Driscoll, P. C., Symon, A., Martin, S. R., MacRae, J. I., Kleinjung, J., Fraternali, F. & Anastasiou, D. 1 Sep 2017. In: The FEBS Journal, 284, 18, p. 2955-2980.
Start Year 2014