Inference-based Modelling in Population and Systems Biology

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Increasing amounts of biological data describing how biological systems change over time are being generated and collected. In systems biology, for instance, it is now normal practice to screen the interactions among a large number of molecules using automated techniques. To interpret such data we rely more and more on mathematical models. Such models summarise the way we think biological systems work. Often we do not know with certainty how a biological system works or which mechanisms operate, and many different models could describe it. To find out which model is best, or which mechanism is most likely, one needs to collect data and compare the output of the models with the data. We propose to develop techniques to carry out such analyses in order to select models and draw conclusions about biological systems. We will use concepts from the theory of dynamical systems and from statistical inference, combine them in novel ways, and develop them for the analysis of biological systems in ecology and systems biology. We will then apply these techniques to different biological questions. The mathematical models and the tools needed to do this are very similar in population biology and in systems biology, and we have therefore selected a mixture of applications from both fields. The art of comparing different mathematical models in describing data from biological systems and processes is thus of utmost importance for the future development of the modern life and biomedical sciences. This problem has been studied by many others before, but the present study introduces a novel element to this field. A model normally consists of two parts: a mathematical structure, which specifies which parts of a system interact; and a set of quantities, called the model parameters, which specify how strongly the various parts interact (e.g. kinetic rate constants).
The model structure is often 'guessed' or hypothesised, and these hypotheses are tested by performing experiments; the model parameters are often inferred from experimental data, but some model parameters can be very hard to estimate. While it had previously been thought that not being able to estimate the parameters with certainty makes the analysis of biological processes difficult, if not impossible, recent research, including research done by the three groups that propose this project, has shown that substantial progress can be made even without knowing them precisely. This is because (i) if a parameter is hard to estimate, it is often because it has little impact on how the system works, and (ii) by integrating over all possible values of the parameters that are not known with certainty one can get a very good understanding of how the system works. Even when such approaches do not yield definitive answers as to how biological systems work, they can help us to design better experiments or point to the data that would be most informative to collect. The statistical tools that will be developed during the course of this project will be applied to datasets from a diverse range of biological systems. Together with experimental research collaborators we will explore how well these novel techniques work and what new insights they provide. The biological systems that we will study are: plankton in freshwater lakes; mechanisms by which bacteria cope with their environment; two different sets of interacting molecules which transmit signals through cells; energy production during infection of barley by powdery mildew; and the ecosystem of algae, midges and fish in a lake in Iceland. These different biological systems will help us to fine-tune the statistical techniques, suggest how to make the best use of biological data, and thus improve our understanding of how nature works.

Technical Summary

Modelling of biological systems is complicated by (i) uncertainty as to which mathematical model offers the best description of the process or system under investigation, and (ii) a lack of suitable and reliable parameter estimates for those models that exist in the literature. In this research project we will develop tools for analysing biological systems, chosen from ongoing research in population biology and systems biology, in cases where data are sparse, the model structure is uncertain, and parameters are largely unknown. The approaches we propose to develop will address problems related to model selection, parameter estimation, and qualitative and sensitivity analysis. The three complementary lines of research that make up this project will enable us to use qualitative and quantitative aspects of biological systems in order to compare the explanatory and predictive power of different mathematical models of biological systems and processes. Several recent studies have shown that only a subset of model parameters can be inferred precisely from finite amounts of data. For this reason it is crucial to use inferential procedures that provide confidence measures rather than merely point estimates. From an experimentalist's point of view this information is important, as parameters with large confidence intervals will also be hard to estimate experimentally. Working in a Bayesian setting will allow us to formally integrate over unknown parameters in order to choose which model best explains the data. We will develop these methods in the context of a set of exemplar projects. This will allow us to tailor the methodology to real-world problems in population and systems biology. A comprehensive analysis of these models, using simulated data in addition to real data where available, will allow us to test our approaches on a large scale. More importantly, however, it will allow us to study notions of sloppiness in more detail.
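
The idea of integrating over unknown parameters to compare models can be illustrated with a minimal sketch. It is not the project's own method or software: the two toy models, the uniform prior on a single rate `k`, the known noise level `sigma`, and all function names are assumptions made for illustration. The marginal likelihood of each model is estimated by averaging the likelihood over draws from the prior, and the two estimates are compared through a Bayes factor.

```python
import math
import random


def log_likelihood(model, k, times, data, sigma):
    """Gaussian log-likelihood of the data given the model output at rate k."""
    total = 0.0
    for t, y in zip(times, data):
        resid = y - model(k, t)
        total += -0.5 * (resid / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return total


def log_evidence(model, times, data, sigma, n_samples=20000, k_max=2.0, seed=0):
    """Monte Carlo estimate of log p(data | model).

    The likelihood is averaged over draws of k from a Uniform(0, k_max) prior
    (an assumed prior), using the log-sum-exp trick for numerical stability.
    """
    rng = random.Random(seed)
    logs = [
        log_likelihood(model, rng.uniform(0.0, k_max), times, data, sigma)
        for _ in range(n_samples)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs) / n_samples)


def exponential_decay(k, t):
    """Toy model A (hypothetical): first-order decay."""
    return math.exp(-k * t)


def linear_decay(k, t):
    """Toy model B (hypothetical): constant-rate decay."""
    return 1.0 - k * t


# Synthetic data generated from model A with k = 0.8 (illustrative values only).
data_rng = random.Random(1)
times = [0.5 * i for i in range(9)]
sigma = 0.05
data = [exponential_decay(0.8, t) + data_rng.gauss(0.0, sigma) for t in times]

log_ev_exp = log_evidence(exponential_decay, times, data, sigma)
log_ev_lin = log_evidence(linear_decay, times, data, sigma)

# Log Bayes factor in favour of model A, and its posterior probability
# assuming both models are equally likely a priori.
log_bayes_factor = log_ev_exp - log_ev_lin
posterior_prob_exp = 1.0 / (1.0 + math.exp(-log_bayes_factor))
```

Averaging over prior draws is the simplest possible estimator and becomes unreliable when there are many parameters or the posterior is very narrow relative to the prior; it is used here only because it makes the "integrate over the unknown parameter" step explicit.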

Publications

Barnes C (2011) Bayesian design strategies for synthetic biology. in Interface Focus

Barnes CP (2011) Bayesian design of synthetic biological systems. in Proceedings of the National Academy of Sciences of the United States of America

Filippi S (2013) On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. in Statistical applications in genetics and molecular biology

Johnson R (2015) SYSBIONS: nested sampling for systems biology. in Bioinformatics (Oxford, England)

Kirk P (2013) Model selection in systems and synthetic biology. in Current opinion in biotechnology

Liepe J (2013) Maximizing the information content of experiments in systems biology. in PLoS computational biology

Petrovskii S (2011) Variation in individual walking behavior creates the impression of a Levy flight. in Proceedings of the National Academy of Sciences of the United States of America

Silk D (2013) Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. in Statistical applications in genetics and molecular biology

Silk D (2014) Model selection in systems biology depends on experimental design. in PLoS computational biology

Stumpf MP (2014) Approximate Bayesian inference for complex ecosystems. in F1000prime reports

Zhou Y (2011) GPU accelerated biochemical network simulation. in Bioinformatics (Oxford, England)

Žurauskiene J (2014) Bayesian non-parametric approaches to reconstructing oscillatory systems and the Nyquist limit. in Physica A: Statistical Mechanics and its Applications

 
Description We were able to develop a new, powerful statistical framework for use in systems and synthetic biology.
Exploitation Route Many people are using our software packages in computational biology (>500 citations)

The software is widely used beyond the life science community and is also attracting attention from the commercial sector, in particular for modelling the behaviour of complex societies (e.g. agent-based modelling of individuals in cities, airports, etc.) and for simulating infrastructure projects prior to implementation.
Sectors Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Government, Democracy and Justice; Pharmaceuticals and Medical Biotechnology; Security and Diplomacy; Transport

 
Description Development of software and analysis of different signal transduction and gene regulation systems.
First Year Of Impact 2011
Sector Digital/Communication/Information Technologies (including Software); Healthcare; Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title ABC SysBio 
Description Parameter Inference and Systems Biology 
Type Of Technology Software 
Year Produced 2009 
Impact 350+ citations and widespread use 
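
As a rough illustration of the kind of parameter inference this entry refers to, the following is a minimal sketch of a generic approximate Bayesian computation (ABC) rejection sampler. It does not use or reproduce the ABC SysBio interface: the toy decay simulator, the uniform prior, the Euclidean distance, the tolerance `epsilon`, and every function name are assumptions chosen for illustration.

```python
import math
import random


def simulate(k, times, sigma, rng):
    """Hypothetical simulator: noisy first-order decay with rate k."""
    return [math.exp(-k * t) + rng.gauss(0.0, sigma) for t in times]


def distance(a, b):
    """Euclidean distance between two trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_rejection(observed, times, sigma, epsilon, n_accept, rng,
                  k_max=2.0, max_tries=200000):
    """Draw k from a Uniform(0, k_max) prior and keep it when the simulated
    trajectory lies within epsilon of the observed one."""
    accepted = []
    tries = 0
    while len(accepted) < n_accept and tries < max_tries:
        tries += 1
        k = rng.uniform(0.0, k_max)
        if distance(simulate(k, times, sigma, rng), observed) < epsilon:
            accepted.append(k)
    return accepted


rng = random.Random(42)
times = [0.5 * i for i in range(9)]
sigma = 0.05
true_k = 0.8  # value used to generate the synthetic "observed" data
observed = simulate(true_k, times, sigma, rng)

# Accepted values form an approximate sample from the posterior of k.
posterior_sample = abc_rejection(observed, times, sigma,
                                 epsilon=0.3, n_accept=200, rng=rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

The spread of `posterior_sample`, not just its mean, is the useful output: a wide spread signals a parameter that the data constrain poorly. Sequential Monte Carlo variants of ABC, as discussed in the publications listed above, shrink the tolerance over successive populations instead of using one fixed `epsilon`.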
 
Title InformationMeasures.jl 
Description A Julia package to infer gene regulatory networks using information theoretical approaches. 
Type Of Technology Software 
Year Produced 2016 
Impact This is a very fast (up to 500 times faster than current R packages) and accurate means of applying bi- and multi-variate information theoretical measures. 
URL https://github.com/Tchanders/InformationMeasures.jl
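
To indicate what a bivariate information-theoretic measure looks like in practice, here is a minimal sketch of a plug-in mutual information estimate on uniformly binned data. It is written in Python for illustration and is not the InformationMeasures.jl interface or algorithm: the equal-width binning, the number of bins, the synthetic "expression" vectors, and the function names are all assumptions.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0.0:
        return [0 for _ in values]
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in (maximum likelihood) estimate of I(X; Y) in bits."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[i] * py[j]))
    return mi


rng = random.Random(0)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_dependent = [xi + rng.gauss(0.0, 0.3) for xi in x]   # depends on x
y_independent = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unrelated to x

mi_dependent = mutual_information(x, y_dependent)
mi_independent = mutual_information(x, y_independent)
```

In a network-inference setting, a score of this kind would be computed for every pair of genes and the highest-scoring pairs treated as candidate regulatory links. The plug-in estimator is biased upwards for small samples, which is why `mi_independent` is small but not exactly zero.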