Predicting properties of biological networks from noisy and incomplete data

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Networks aim to put interactions and dependencies among different objects (or agents) into a single coherent context. Their analysis has attracted great attention across scientific disciplines because they offer a pictorial representation of complex phenomena and frequently also allow a detailed mathematical analysis of these phenomena. Unfortunately, observed networks are often very different from the true network because we cannot measure all interactions reliably. Moreover, frequently only a small part of the network is considered. Both factors affect our ability to interpret network data reliably, and this is especially true of many biological network datasets. The applicant's group has developed a range of mathematical tools that allow us to study the effects these sources of error have on our analysis and, to some extent, to overcome the limitations they impose. In the proposed research we will adapt these mathematical methods so that they can be applied to biological networks, in particular to protein-interaction network data. This will involve formulating detailed models of the different experimental methods used to obtain protein-interaction data. By simulating the experiment we can study the effects (and causes) of error in detail and use this to gain insights into the reliability of different datasets. With this better understanding of the effects of noise and incompleteness on experimental datasets, we can then try to predict properties of the true (but partially unobserved) network. We will use this to predict the size of the interaction network in different species: it is now known that the number of genes does not correlate well with our understanding of the relative complexity of different organisms (for example, humans have fewer than twice as many genes as the fruit fly).
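
To make the idea of simulating an experiment concrete, the following is a minimal Python sketch under deliberately simple, hypothetical assumptions: the true network is an Erdős-Rényi random graph, only a random fraction of proteins is assayed, and every assayed pair is subject to fixed false-negative and false-positive rates. The names `simulate_observation` and `estimate_total_edges` are illustrative only, and the correction shown is a basic moment estimator, not the project's own methodology.

```python
import random


def simulate_observation(n, edge_prob, sample_frac, fnr, fpr, seed=0):
    """Simulate one noisy, incomplete observation of a random 'true' network.

    Hypothetical observation model (an assumption, for illustration only):
    the true network is an Erdos-Renyi graph G(n, edge_prob); each node is
    assayed independently with probability sample_frac; among assayed nodes,
    each true edge is missed with probability fnr and each non-edge is
    reported as an edge with probability fpr.
    """
    rng = random.Random(seed)
    true_edges = {(i, j) for i in range(n) for j in range(i + 1, n)
                  if rng.random() < edge_prob}
    sampled = [v for v in range(n) if rng.random() < sample_frac]  # ascending
    observed = set()
    for idx, i in enumerate(sampled):
        for j in sampled[idx + 1:]:
            if (i, j) in true_edges:
                if rng.random() >= fnr:  # true edge detected
                    observed.add((i, j))
            elif rng.random() < fpr:  # spurious edge reported
                observed.add((i, j))
    return true_edges, sampled, observed


def estimate_total_edges(n, n_sampled, n_observed, fnr, fpr):
    """Moment estimate of the number of edges in the whole true network.

    Among S sampled pairs containing T true edges,
    E[observed] = T * (1 - fnr) + (S - T) * fpr, so
    T_hat = (observed - S * fpr) / (1 - fnr - fpr).
    Uniform node sampling covers a fraction S / P of all P pairs,
    so T_hat is scaled up by P / S.
    """
    if n_sampled < 2 or fnr + fpr >= 1:
        raise ValueError("need at least two sampled nodes and fnr + fpr < 1")
    pairs_total = n * (n - 1) / 2
    pairs_sampled = n_sampled * (n_sampled - 1) / 2
    t_hat = (n_observed - pairs_sampled * fpr) / (1 - fnr - fpr)
    return t_hat * pairs_total / pairs_sampled


if __name__ == "__main__":
    true_edges, sampled, observed = simulate_observation(
        n=400, edge_prob=0.02, sample_frac=0.5, fnr=0.3, fpr=0.001, seed=1)
    estimate = estimate_total_edges(400, len(sampled), len(observed), 0.3, 0.001)
    print("true edges:", len(true_edges))
    print("observed edges:", len(observed))
    print("corrected estimate:", round(estimate))
```

Because the simulated truth is known, the gap between the raw observed count and the corrected estimate can be inspected directly; with real data the error rates would themselves have to be estimated.
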
The statistical prediction procedures to be developed in the course of the proposed research will allow us to infer the sizes of the interaction networks in different species, and will therefore enable us to see whether the complexity of the network could help to explain the differences in biological complexity between species. Finally, we will study new and more realistic models for protein interaction networks.
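
One classical way to estimate the size of a partially observed set is capture-recapture, which is also the subject of one of the project publications. The simplest two-sample version (the Lincoln-Petersen estimator, with Chapman's small-sample correction) is sketched below, assuming two independent screens in which every true interaction has the same chance of being detected. These assumptions are strong for real protein-interaction screens, the models used in the project may be more elaborate, and the function names are hypothetical.

```python
def lincoln_petersen(screen_a, screen_b):
    """Two-sample capture-recapture estimate N_hat = n1 * n2 / m.

    screen_a, screen_b: sets of interactions (here frozensets of two protein
    names) reported by two screens assumed to be independent, with every
    true interaction equally likely to be detected in each screen.
    """
    m = len(screen_a & screen_b)  # interactions "recaptured" by both screens
    if m == 0:
        raise ValueError("no overlap between screens; estimate is undefined")
    return len(screen_a) * len(screen_b) / m


def chapman(screen_a, screen_b):
    """Chapman's less biased variant, defined even when the overlap is empty."""
    n1, n2 = len(screen_a), len(screen_b)
    m = len(screen_a & screen_b)
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def pairs(*edges):
    """Represent undirected interactions so that (A, B) and (B, A) coincide."""
    return {frozenset(e) for e in edges}


if __name__ == "__main__":
    a = pairs(("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"))
    b = pairs(("B", "A"), ("C", "D"), ("E", "F"))
    print(lincoln_petersen(a, b))  # 4 * 3 / 2 = 6.0
    print(chapman(a, b))
```

Representing each interaction as a `frozenset` makes the overlap between screens independent of which protein was listed first.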

Technical Summary

Biological networks have received great attention in the context of systems biology, but the poor quality of the data and the fact that the networks are far from complete pose severe limitations on their usefulness in their present state. This project will build on recent theoretical developments in random graph theory and statistics to overcome these limitations. Simulation models for the different experimental procedures used to map out protein-protein interactions will allow us to understand the causes of noise and its effect on analyses of network data. With this better understanding we will then be able to generate predictive models that allow us to infer properties of the true network from noisy and incomplete network data. In particular, we will use multi-model inference in likelihood and Bayesian settings for this purpose. We have already demonstrated the usefulness of such approaches in pilot studies. These tools will then be applied to real-world biological networks in order to predict structural and functional properties of the global protein interaction networks in a range of species for which suitable data are available. Finally, we will explore the scope of more realistic network models that allow for changes in the network structure (over time or in response to some stimulus) as well as different interaction strengths.
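
As an illustration of multi-model inference in the likelihood setting only, the sketch below fits two hypothetical candidate degree-distribution models (Poisson and geometric) by maximum likelihood, converts their AIC values into Akaike weights, and model-averages a derived quantity, here the predicted fraction of nodes with degree zero. The candidate models, the use of AIC and the function names are assumptions for illustration; a Bayesian counterpart would replace Akaike weights with posterior model probabilities.

```python
import math


def fit_poisson(degrees):
    """Maximum-likelihood Poisson fit; requires a positive mean degree."""
    lam = sum(degrees) / len(degrees)
    ll = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in degrees)
    return ll, 1, math.exp(-lam)  # (log-likelihood, parameters, P(degree = 0))


def fit_geometric(degrees):
    """Maximum-likelihood geometric fit on {0, 1, 2, ...}: P(k) = p (1 - p)^k."""
    p = 1.0 / (1.0 + sum(degrees) / len(degrees))
    ll = sum(k * math.log(1.0 - p) + math.log(p) for k in degrees)
    return ll, 1, p  # P(degree = 0) = p


def akaike_weights(fits):
    """Convert AIC = 2k - 2 log L into normalised Akaike weights."""
    aics = [2 * k - 2 * ll for ll, k, _ in fits]
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(fits, weights):
    """Weight each model's prediction by its Akaike weight."""
    return sum(w * pred for (_, _, pred), w in zip(fits, weights))


if __name__ == "__main__":
    # Made-up, heavy-tailed degree sequence, for illustration only.
    degrees = [0] * 50 + [1] * 20 + [2] * 10 + [5] * 5 + [20] * 3 + [50]
    fits = [fit_poisson(degrees), fit_geometric(degrees)]
    weights = akaike_weights(fits)
    print("Akaike weights (Poisson, geometric):", weights)
    print("model-averaged P(degree = 0):", model_average(fits, weights))
```

Averaging over models, rather than committing to the single best fit, lets the uncertainty about which model is correct carry through to the predicted property.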

Publications

Kelly W (2008) Protein-protein interactions: from global to local analyses. in Current opinion in biotechnology

Kelly WP (2012) Assessing coverage of protein interaction data using capture-recapture models. in Bulletin of mathematical biology

Kelly WP (2012) The degree distribution of networks: statistical model selection. in Methods in molecular biology (Clifton, N.J.)

Stumpf MP (2010) Incomplete and noisy network data as a percolation process. in Journal of the Royal Society, Interface

Stumpf MP (2008) Estimating the size of the human interactome. in Proceedings of the National Academy of Sciences of the United States of America

Thorne T (2007) Generating confidence intervals on biological networks. in BMC bioinformatics

Description: We have shown that it is possible to reconstruct network properties reliably from very noisy and incomplete data.
Exploitation Route: We are using this in collaboration with financial regulators and risk consultancies.
Sectors: Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Energy; Government, Democracy and Justice; Transport

Description: We have engaged with banking regulators and are informing them about how graph theory and network analysis can help their mission.
First Year of Impact: 2010
Sector: Agriculture, Food and Drink; Financial Services, and Management Consultancy; Healthcare; Government, Democracy and Justice; Other
Impact Types: Economic; Policy & public services