Inference-based Modelling in Population and Systems Biology

Lead Research Organisation: Royal Holloway, University of London
Department Name: Biological Sciences


Increasing amounts of biological data are being generated and collected that describe how biological systems change over time. In systems biology, for instance, it is now normal practice to screen the interactions among a large number of molecules using automated techniques. To interpret such data we rely more and more on mathematical models. Such models summarise the way we think biological systems work. Often we do not know with certainty how a biological system works or which mechanisms operate, and there are often many different models that could describe it. To find out which model is best, or which mechanism is most likely, one needs to collect data and compare the output of the models with the data. We propose to develop techniques to carry out such an analysis, so that we can select models and draw conclusions about biological systems. We will use concepts from the theory of dynamical systems and from statistical inference, combine them in novel ways, and develop them for the analysis of biological systems in ecology and systems biology. We will then apply these techniques to different biological questions. The mathematical models and the tools needed to do this are very similar in population biology and in systems biology, and we have therefore selected a mixture of applications from population biology and systems biology. The art of comparing how well different mathematical models describe data from biological systems and processes is thus of utmost importance for the future development of the modern life and biomedical sciences. This problem has been studied and practised before by many others, but the present study introduces a novel element to this field. A model normally consists of two parts: first, a mathematical structure, which specifies which parts of a system interact; and second, a set of quantities that specify how strongly the various parts interact (the model parameters, e.g. kinetic rate constants).
The model structure is often 'guessed' or hypothesised, and these hypotheses are tested by performing experiments; the model parameters are often inferred from experimental data, but some of them can be very hard to estimate. While it was previously thought that being unable to estimate the parameters with certainty makes the analysis of biological processes difficult, if not impossible, recent research, including research by the three groups proposing this project, has shown that substantial progress can be made even without knowing the parameters precisely. This is because (i) if a parameter is hard to estimate, it is often because it has little impact on how the system works, and (ii) by integrating over all possible values of the parameters that are not known with certainty, one can still get a very good understanding of how the system works. Even when such approaches do not yield definitive answers as to how biological systems work, they can help us to design better experiments or point to the data that would be most informative to collect. The statistical tools developed during this project will be applied to datasets from a diverse range of biological systems. Together with experimental research collaborators we will explore how well these novel techniques work and what new insights they provide. The biological systems that we will study are: plankton in freshwater lakes; mechanisms by which bacteria cope with their environment; two different sets of interacting molecules that transmit signals through cells; energy production during infection of barley by powdery mildew; and the ecosystem of algae, midges and fish in a lake in Iceland. These different biological systems will help us to fine-tune the statistical techniques, suggest how to make the best use of biological data, and thus improve our understanding of how nature works.
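The idea of integrating over parameters that are not known with certainty can be illustrated with a minimal sketch. It assumes Gaussian observations with a known standard deviation of 1 and compares two hypothetical models: M0, in which the mean is fixed at 0, and M1, in which the mean mu is unknown and given a uniform prior on [-5, 5]. The evidence for M1 is obtained by integrating the likelihood over mu (here with a simple trapezoid rule), and the log Bayes factor then compares the two models. The data, the prior and all function names are illustrative assumptions, not the project's own models.

```python
import math


def log_likelihood(data, mu, sigma=1.0):
    """Gaussian log-likelihood of the observations for a given mean mu."""
    return sum(
        -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
        for y in data
    )


def log_evidence_free_mean(data, lo=-5.0, hi=5.0, n_grid=2001):
    """log p(data | M1): integrate likelihood x uniform prior over mu on [lo, hi].

    Uses the trapezoid rule, combined with log-sum-exp to avoid numerical underflow.
    """
    step = (hi - lo) / (n_grid - 1)
    log_prior = -math.log(hi - lo)
    terms = []
    for i in range(n_grid):
        mu = lo + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        terms.append(math.log(weight * step) + log_prior + log_likelihood(data, mu))
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def log_bayes_factor(data):
    """log[p(data | M1) / p(data | M0)]; positive values favour the free-mean model M1."""
    log_evidence_fixed_mean = log_likelihood(data, 0.0)  # M0 has no free parameters
    return log_evidence_free_mean(data) - log_evidence_fixed_mean


near_zero = [0.1, -0.2, 0.05, 0.0, -0.1]  # invented data scattered around 0
near_two = [2.1, 1.8, 2.2, 1.9, 2.0]      # invented data centred on 2
print(log_bayes_factor(near_zero))
print(log_bayes_factor(near_two))
```

Because M1 spreads its prior over many values of mu, integrating over that parameter imposes an automatic penalty for the extra flexibility. For data scattered around 0 the log Bayes factor is negative, so the simpler model is preferred. For data centred near 2 it is strongly positive.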

Technical Summary

Modelling of biological systems is complicated by (i) uncertainty as to which mathematical model offers the best description of the process or system under investigation, and (ii) a lack of suitable and reliable parameter estimates for the models that exist in the literature. In this research project we will develop tools to analyse biological systems, chosen from ongoing research in population biology and systems biology, in cases where data are sparse, the model structure is uncertain, and parameters are largely unknown. The approaches we propose to develop will address problems of model selection, parameter estimation, and qualitative and sensitivity analysis. The three complementary lines of research that make up this project will enable us to use qualitative and quantitative aspects of biological systems to compare the explanatory and predictive power of different mathematical models of biological processes. Several recent studies have shown that only a subset of model parameters can be inferred precisely from finite amounts of data. For this reason it is crucial to use inferential procedures that provide confidence measures rather than merely point estimates. This information also matters from an experimentalist's point of view, because parameters with large confidence intervals will also be hard to estimate experimentally. Working in a Bayesian framework will allow us to formally integrate over unknown parameters in order to determine which model best explains the data. We will develop these methods in the context of a set of exemplar projects, which will allow us to tailor the methodology to real-world problems in population and systems biology. A comprehensive analysis of these models, using simulated data in addition to real data where available, will allow us to test our approaches on a large scale. More importantly, however, it will allow us to study in more detail the notion of sloppiness, i.e. the observation that a model's behaviour is often insensitive to changes in many combinations of its parameters.
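To make sloppiness concrete, the sketch below uses a hypothetical model y(t) = exp(-k1 t) + exp(-k2 t) observed at ten time points with independent Gaussian errors. It computes the Fisher information matrix J^T J / sigma^2, where J holds the sensitivities of y to k1 and k2, together with its eigenvalues. A large ratio between the largest and smallest eigenvalue means that one combination of parameters is tightly constrained by the data while another is barely constrained at all. The model, rates, time points and noise level are assumptions chosen purely for illustration.

```python
import math


def sensitivities(t, k1, k2):
    """Partial derivatives of y(t) = exp(-k1*t) + exp(-k2*t) with respect to k1 and k2."""
    return -t * math.exp(-k1 * t), -t * math.exp(-k2 * t)


def fisher_matrix(times, k1, k2, sigma=1.0):
    """Entries (a, b, c) of the symmetric matrix [[a, c], [c, b]] = J^T J / sigma^2."""
    a = b = c = 0.0
    for t in times:
        d1, d2 = sensitivities(t, k1, k2)
        a += d1 * d1
        b += d2 * d2
        c += d1 * d2
    s2 = sigma ** 2
    return a / s2, b / s2, c / s2


def eigen_2x2(a, b, c):
    """Eigenvalues (smallest first) of [[a, c], [c, b]] and the unit eigenvector of the smallest."""
    mean = 0.5 * (a + b)
    spread = math.hypot(0.5 * (a - b), c)
    lam_min, lam_max = mean - spread, mean + spread
    if c != 0.0:
        vx, vy = c, lam_min - a  # solves (a - lam_min) * vx + c * vy = 0
    else:
        vx, vy = (1.0, 0.0) if a <= b else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam_min, lam_max, (vx / norm, vy / norm)


times = [0.5 * i for i in range(1, 11)]  # hypothetical sampling times 0.5, 1.0, ..., 5.0
lam_min, lam_max, sloppy_direction = eigen_2x2(*fisher_matrix(times, k1=1.0, k2=1.1))
print(lam_max / lam_min, sloppy_direction)
```

With nearly equal rates the eigenvalue ratio is in the hundreds. To a linear approximation, the uncertainty along the sloppy direction is therefore roughly the square root of that ratio larger than along the stiff direction. The sloppy eigenvector has components of opposite sign: it raises one rate while lowering the other, which leaves the summed curve almost unchanged.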


Description To understand and interpret data, scientists use models. Models can be simple or highly complex, but all are simplified representations of reality. Models are therefore necessarily imperfect, but should capture the important features of the reality we study, much like a cartoon captures the typical features of the situation or person pictured. The models we used offer cartoons of biological processes.
The work we did was based on the idea that it ought to be possible to develop a methodology (i) to choose from a set of candidate models which one offers the best description of the biological process, and (ii) to use the models chosen to infer how such systems work, and what mechanisms could create the process.

To choose the best model, the model results are compared with the data from the real process, and it is quantified how well each model describes the data. By comparing the various models, the one that offers the best description can be chosen, and if the models include different mechanisms, the most likely mechanism can be inferred. In terms of the cartoon analogy, by comparing various cartoons with the person they should depict, one can decide which cartoon is best.
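One common way to quantify how well a model describes the data, while penalising extra parameters, is the Akaike information criterion (AIC). The sketch below compares two hypothetical candidate models, a constant and a straight line. Both are fitted by least squares and scored with the Gaussian form of the AIC, and the model with the lower score is preferred. The choice of AIC, the Gaussian-error assumption and the example data are illustrative assumptions; the source does not state which criterion the project used.

```python
import math


def fit_constant(t, y):
    """Least-squares fit of y = c; returns predictions and the number of mean parameters."""
    c = sum(y) / len(y)
    return [c] * len(y), 1


def fit_linear(t, y):
    """Closed-form least-squares fit of y = a + b*t."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return [a + b * ti for ti in t], 2


def aic(y, predictions, k):
    """Gaussian AIC up to an additive constant: n*ln(SSE/n) + 2*(k + 1), the +1 counting the error variance."""
    n = len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    return n * math.log(sse / n) + 2 * (k + 1)


def best_model(t, y):
    """Name of the candidate with the lowest AIC, plus all scores."""
    scores = {}
    for name, fit in (("constant", fit_constant), ("linear", fit_linear)):
        predictions, k = fit(t, y)
        scores[name] = aic(y, predictions, k)
    return min(scores, key=scores.get), scores


t = list(range(10))
noise = [0.1 * (-1) ** i for i in range(10)]  # invented, alternating measurement error
trend = [2.0 + 0.5 * ti + e for ti, e in zip(t, noise)]
flat = [3.0 + e for e in noise]
print(best_model(t, trend)[0], best_model(t, flat)[0])
```

The penalty term stops the more flexible model from always winning. On the flat data the straight line fits slightly better, but not by enough to pay for its extra parameter.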

We applied this idea to the movement of animals. It has been observed that many animals move in a peculiar pattern, mostly moving very little but occasionally moving a long way, a pattern called a Lévy flight. What is still much debated is what causes this pattern. Whereas some argue that it is built into animal behaviour, others have argued that it comes about because some individuals move a lot and others move very little. This question matters, for instance, for designing nature reserves, predicting pest outbreaks and understanding the spread of diseases.

We used video recordings of aphids and mussels to find out how they move. These animals normally move for a while and then sit still, and we recorded how long each bout of movement lasted. We developed a methodology for the analysis of these data: we constructed two different models, one for each of the explanations outlined above, and developed a statistical test to decide which model is best. We discovered that neither species moved according to a Lévy flight; instead, they wander about randomly, much as inanimate molecules do. However, some individuals tend to wander much more than others. For the mussels, for instance, we discovered that they either sit still, move a little (e.g. by moving their shells), or really move some distance. Our methodology allowed us to discern between the two different mechanisms. In this way we were able not only to contribute to this debate but also to offer a methodology for telling the two modes of movement apart. Understanding individual variation is crucial for interpreting the collective movement patterns of animals. This research will open the way to a better understanding of animal search and behaviour, and of how they have evolved.
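The second explanation holds that a Lévy-like pattern can arise from individuals that each move randomly but differ in how much they move. This can be illustrated with a short simulation. The simulation assumes, hypothetically, that each individual's move durations are exponentially distributed with its own mean, a common assumption for simple random movement. For an exponential distribution the coefficient of variation (standard deviation divided by mean) is 1, so each individual's data look ordinary. The pooled data, however, are far more variable and heavier-tailed than any single exponential. The means and sample sizes are invented for illustration, and this is not the statistical test developed in the project.

```python
import random
import statistics


def simulate_move_durations(mean_durations, n_per_individual, seed=1):
    """One list of exponentially distributed move durations per individual (hypothetical model)."""
    rng = random.Random(seed)
    return [
        [rng.expovariate(1.0 / mean) for _ in range(n_per_individual)]
        for mean in mean_durations
    ]


def coefficient_of_variation(xs):
    """Standard deviation divided by mean; equal to 1 for an exponential distribution."""
    return statistics.pstdev(xs) / statistics.mean(xs)


# three hypothetical individuals whose typical move durations differ by orders of magnitude
individuals = simulate_move_durations([0.1, 1.0, 10.0], n_per_individual=2000)
pooled = [d for moves in individuals for d in moves]

per_individual_cv = [coefficient_of_variation(moves) for moves in individuals]
pooled_cv = coefficient_of_variation(pooled)
print(per_individual_cv, pooled_cv)
```

Each individual's coefficient of variation comes out close to 1. The pooled value is well above 1 (about 2 in expectation for these means). Treating the pooled sample as if it came from a single animal would therefore suggest a heavy-tailed, Lévy-like distribution that no individual actually has.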

Another biological process we investigated was the dynamics of white blood cells in the human body. How the number of white blood cells is regulated in the body is not entirely understood. One way to find out is to compare how these cells are regulated in healthy people and in people with disease. From the immunologists we worked with, we received data on how white blood cells are replaced in the human body, for healthy people and for people with HIV. Analysing these data can reveal how fast white blood cells are produced, but it is difficult to extract this information precisely. We developed a methodology to do this.
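A minimal sketch can show why turnover rates are hard to pin down. It rests on strongly simplifying, hypothetical assumptions. Cells are in steady state and are replaced at a constant rate p per day, so the fraction of labelled cells after t days of labelling is 1 - exp(-p t). Measurements carry Gaussian errors with a known standard deviation. The rate is fitted by a grid search. An approximate 95% confidence interval is then taken as all values of p whose chi-square lies within 3.84 of the minimum (a likelihood-ratio interval). The model, labelling protocol, noise level and data are all assumptions and do not represent the methodology developed in the project.

```python
import math


def labelled_fraction(t, p):
    """Hypothetical steady-state model: fraction of cells labelled after t days at turnover rate p per day."""
    return 1.0 - math.exp(-p * t)


def chi_square(times, observed, p, sigma):
    """Sum of squared standardised residuals (Gaussian errors with known sigma)."""
    return sum(((y - labelled_fraction(t, p)) / sigma) ** 2 for t, y in zip(times, observed))


def estimate_turnover(times, observed, sigma, p_grid):
    """Best-fitting p on the grid and an approximate 95% likelihood-ratio interval (delta chi^2 <= 3.84).

    Assumes the chi-square has a single minimum, so the accepted values form one interval.
    """
    chi2 = [chi_square(times, observed, p, sigma) for p in p_grid]
    best = min(range(len(p_grid)), key=chi2.__getitem__)
    accepted = [p for p, c in zip(p_grid, chi2) if c - chi2[best] <= 3.84]
    return p_grid[best], min(accepted), max(accepted)


def synthetic_data(times, p_true, noise=0.01):
    """Invented observations: model values plus alternating +/- noise, for illustration only."""
    return [labelled_fraction(t, p_true) + noise * (-1) ** i for i, t in enumerate(times)]


p_grid = [i * 1e-4 for i in range(1, 1001)]  # candidate rates 0.0001 ... 0.1 per day
long_times = [7, 14, 21, 28, 35, 42, 49, 56]  # hypothetical labelling period of eight weeks
short_times = [1, 2, 3, 4]                    # hypothetical labelling period of four days
long_fit = estimate_turnover(long_times, synthetic_data(long_times, 0.02), 0.02, p_grid)
short_fit = estimate_turnover(short_times, synthetic_data(short_times, 0.02), 0.02, p_grid)
print(long_fit, short_fit)
```

The noise level is the same in both cases, yet the short labelling period, with fewer and earlier measurements, yields an interval several times wider. Early on, the labelled fraction grows almost linearly and carries little information about p. The design of the labelling experiment therefore strongly affects how precisely the rate can be estimated.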

The work in this project has allowed us to develop statistical methodologies that can be used by biologists. We have done this for the examples above and, inspired by this project, are currently trying to develop statistical methodologies for other areas of biology.
Exploitation Route The methodology we have developed can be, and has been, used by others. It could be used for the analysis of movement data of animals and humans.
Sectors Other

Description The methodology we have developed has been used by others to analyse movement data of animals. One application has been the analysis of grazers in a game reserve. Building on the results of this project, we are currently researching how best to describe the dynamics of bacterial populations, with a view to applying the results in the food industry.
First Year Of Impact 2013
Sector Agriculture, Food and Drink, Other