Bayesian evidence analysis tools for systems biology

Lead Research Organisation: University of Exeter
Department Name: Engineering Computer Science and Maths

Abstract

The study of biological systems, from cells to organisms and populations, is becoming increasingly quantitative. Even at the level of a single cell, molecular biologists and geneticists are able to measure amounts of molecules such as proteins and RNAs, and to begin to unravel the connections between molecules that make up the pathways and processes that keep the cell functioning. Our knowledge of the interaction of molecules and genes comes from many sources. These range from studies of the three-dimensional structure of proteins, from which their function can be inferred, through to in vitro and in vivo studies that show how genes, and the molecules that switch them on and off, interact in the test tube and in key model organisms such as the single-celled yeast or the higher plant Arabidopsis thaliana. The way that molecular systems are described is changing from the traditional diagrammatic sketch of likely interactions to a set of mathematical equations linking the rate of change of one molecule with the amounts of others. When the number of molecules is small, a set of stochastic reactions becomes a more accurate representation than a set of ordinary differential equations. In both cases, finding the best fit between a mathematical model and data from the laboratory becomes a major problem. A second important issue concerns the justification for decisions made in modelling a biological system. We might like to say that only one model describes the data, but this is not possible for any complex system. Instead, we can hope to show that one model fits the data better than another, and this is the aim of the research proposed here. We shall apply a probabilistic approach that can optimise the fit of models to data and quantitatively compare the extent to which they fit the data. This will provide useful information to the bench biologists and the systems biologists with whom they collaborate to further our knowledge of the cell.

Technical Summary

This project will address the problems of optimising and comparing stochastic systems biology models by applying the nested sampling algorithm (Skilling, 2006), which computes the Bayesian evidence. These functions will be delivered to users by incorporating them in a new version of the popular stochastic simulation tool Dizzy (Ramsey et al., 2005). The nested sampling algorithms will also be released as R and Matlab packages. By comparing the total evidence in favour of each alternative model of a biological system (measured in decibans or in bits), systems biologists will be able to evaluate alternative modelling decisions, and to compare alternative stochastic models, in the light of the experimental data. This will be achieved by integrating over all plausible parameter values to estimate the Bayesian evidence. The tool will also provide the modeller with an analysis of the samples from the posterior distribution of parameter values that nested sampling generates. Multiple modes in the distribution of a parameter, and correlations between parameters, will be automatically identified by regression and clustering: these are of great interest to systems modellers and will generate novel insights into the biological models and data under investigation. Algorithms for intelligently managing the optimisation procedure for the user will be provided, including methods to terminate the run when the most informative samples have been located, and methods to detect when the user has selected inappropriate settings for the optimiser. These features will assist the uptake of the new tools. The new tools will be immediately useful to Dizzy users, who will be able to optimise models to fit experimental data, and compare models, with minimal configuration of the optimisation algorithm. R and Matlab users will be able to run the optimisation in conjunction with the simulators and other modules provided by those environments.
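
As a rough illustration of how nested sampling accumulates the evidence, and of how a difference in log evidence converts to bits or decibans, the following Python sketch works through a toy one-parameter problem. It is not the project's implementation. The function names (`nested_sampling`, `evidence_units`, `toy_log_likelihood`, `toy_sample_prior`), the uniform prior on [0, 10], the Gaussian-shaped likelihood and the naive rejection step used to draw replacement points are all assumptions chosen for brevity.

```python
import math
import random


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def nested_sampling(log_likelihood, sample_prior, n_live=100, n_iter=600, seed=0):
    """Estimate the log evidence log Z with a basic nested sampling loop.

    Prior volume is assumed to shrink deterministically as X_i = exp(-i / n_live).
    Replacement points are drawn by rejection from the prior, which is simple
    but only practical for toy problems.
    """
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(p) for p in live]

    log_z = -math.inf
    # Log width of the first shell, X_0 - X_1 = 1 - exp(-1 / n_live).
    log_width = math.log(1.0 - math.exp(-1.0 / n_live))

    for _ in range(n_iter):
        worst = min(range(n_live), key=live_logl.__getitem__)
        l_star = live_logl[worst]
        log_z = log_add(log_z, log_width + l_star)

        # Replace the worst point with a prior draw of higher likelihood.
        while True:
            candidate = sample_prior(rng)
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > l_star:
                break
        live[worst] = candidate
        live_logl[worst] = candidate_logl
        log_width -= 1.0 / n_live  # each shell is exp(-1 / n_live) narrower

    # Add the remaining live points, each carrying X_final / n_live of prior mass.
    log_x_final = -n_iter / n_live
    for logl in live_logl:
        log_z = log_add(log_z, log_x_final - math.log(n_live) + logl)
    return log_z


def evidence_units(log_z_a, log_z_b):
    """Convert ln(Z_a / Z_b) into (bits, decibans) in favour of model A."""
    log_bayes_factor = log_z_a - log_z_b
    bits = log_bayes_factor / math.log(2.0)
    decibans = 10.0 * log_bayes_factor / math.log(10.0)
    return bits, decibans


def toy_sample_prior(rng):
    """Hypothetical prior: one parameter, uniform on [0, 10]."""
    return rng.uniform(0.0, 10.0)


def toy_log_likelihood(theta):
    """Hypothetical likelihood: unnormalised Gaussian centred on 5, sd 0.5."""
    return -((theta - 5.0) ** 2) / (2.0 * 0.5 ** 2)


if __name__ == "__main__":
    estimate = nested_sampling(toy_log_likelihood, toy_sample_prior)
    exact = math.log(0.5 * math.sqrt(2.0 * math.pi) / 10.0)
    print("estimated log Z:", estimate, "analytic log Z:", exact)
    print("bits, decibans for a Bayes factor of 2:", evidence_units(math.log(2.0), 0.0))
```

For this toy problem the evidence can be written down analytically, so the estimate can be checked directly. In a realistic systems biology model the likelihood would come from comparing simulated trajectories with data, and the rejection step would be replaced by a more efficient constrained exploration.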

Planned Impact

This project is in the strategic research priority area of systems approaches to biological research. A novel computational tool will be developed for systems biology modelling. Synthetic biology is also a research priority and the tool will be immediately applicable to models of synthetic systems.

Who will benefit?

We have identified the immediate beneficiaries of the software to be produced. These are the large number of systems biologists and synthetic biologists studying and working in the UK. We identify two user groups within this community: those primarily interested in the model and the underlying biology, and those who (additionally) require the sophisticated mathematical and statistical packages available for R and Matlab to investigate properties of models. The work of the immediate beneficiaries will impact on several key targets identified by the research council. The improvement of food production (Crop science) can benefit from systems models, e.g. by increasing our understanding of the circadian clock, as will be studied here. Similarly, the development of genetically modified or synthetic organisms for tackling pollution or generating energy (Bioenergy) can also benefit from mathematical models of the system and its parts. Research in these areas is readily exploited in the biotechnology, agriculture and pharmaceutical sectors of the economy.

How will they benefit?

Systems biologists who are more concerned with a particular model, and its fit to the available wet lab data, will benefit from the new version of the Dizzy simulation tool that this project will produce. This tool will provide an easy-to-use interface that allows models to be optimised and the Bayesian evidence computed. The optimal parameter values and their standard deviations (estimated from the posterior distribution), along with the evidence value (in bits or decibans), will aid their research. Those systems and synthetic biologists who are concerned with the properties of models, or the modelling process, will have access to the algorithms at the programmatic level, and to the code itself. Those who have benefited indirectly, i.e. through the earlier use of systems modelling in the development of a modified organism or the identification of a drug target, will be able to rely on the mathematical analysis of the evidence that supports the use of the model. The tool will contribute to the base of evidence upon which decisions can be made.

What will be done to ensure they benefit?

The software to be developed will be made available on an open source basis. We shall publish the scientific results as widely as possible, and in open access journals.

Publications

 
Description We have published novel results on the application of nested sampling, a statistical approach to computing the Bayesian evidence Z, to the inference of model parameters and the estimation of log Z in an exemplar model of circadian rhythms. We have shown a ten-fold difference in the value of the coefficient of variation between degradation and transcription parameters, highlighting the utility of this summary statistic for discriminating between highly constrained and less well constrained parameters. We have further demonstrated that the estimates of posterior parameter densities (as summarised by parameter means and standard deviations) are influenced predominantly by the length of the time series, becoming more narrowly constrained as the number of circadian cycles considered increases. Sampling the data more frequently, however, does not significantly reduce the uncertainty remaining in the parameter values.

Novel algorithms for calculating the likelihood of a model, and a characterisation of the performance of the nested sampling algorithm, are also reported. The methods we have developed considerably improve the computational efficiency of the likelihood calculation and of the exploratory step within nested sampling.
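
The coefficient of variation mentioned above can be computed from weighted posterior samples along the following lines. This Python sketch is illustrative only: the function name `weighted_summary`, the form of the inputs (one list of parameter values and one list of unnormalised log posterior weights) and the example numbers are assumptions, not outputs or code from the project.

```python
import math


def weighted_summary(values, log_weights):
    """Return (mean, standard deviation, coefficient of variation).

    `values` holds samples of a single parameter and `log_weights` their
    unnormalised log posterior weights, as a nested sampling run might supply.
    """
    if len(values) != len(log_weights) or not values:
        raise ValueError("values and log_weights must be non-empty and equal in length")

    # Subtract the largest log weight before exponentiating to avoid underflow.
    largest = max(log_weights)
    weights = [math.exp(lw - largest) for lw in log_weights]
    total = sum(weights)

    mean = sum(w * v for w, v in zip(weights, values)) / total
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    std = math.sqrt(variance)

    if mean == 0.0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return mean, std, std / abs(mean)


if __name__ == "__main__":
    # Made-up samples: the first parameter is tightly constrained relative
    # to its mean, the second much less so.
    tight = weighted_summary([9.0, 11.0], [0.0, 0.0])
    loose = weighted_summary([1.0, 3.0], [0.0, 0.0])
    print("tight parameter (mean, sd, CV):", tight)
    print("loose parameter (mean, sd, CV):", loose)
```

Because the coefficient of variation divides the standard deviation by the mean, it allows parameters on very different scales, such as degradation and transcription rates, to be compared for how tightly the data constrain them.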
Exploitation Route Our findings can be put to use through the open source software we have developed.
Sectors Digital/Communication/Information Technologies (including Software), Environment, Pharmaceuticals and Medical Biotechnology

 
Title Dizzy Beats Java tool: http://sourceforge.net/projects/bayesevidence/ 
Description Software package that provides the model simulators from the original Dizzy tool and adds an optimiser and the nested sampling algorithm in an easy-to-use application.
Type Of Technology Software 
Year Produced 2011 
Impact No actual Impacts realised to date 
URL http://sourceforge.net/projects/bayesevidence
 
Description Project website: 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Public website giving overview of project and outputs.

No actual impacts realised to date
Year(s) Of Engagement Activity 2013