Bayesian evidence analysis tools for systems biology

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

The study of biological systems, from cells to organisms and populations, is becoming increasingly quantitative. Even at the level of a single cell, molecular biologists and geneticists are able to measure amounts of molecules such as proteins and RNAs, and to begin to unravel the connections between molecules that make up the pathways and processes that keep the cell functioning. Our knowledge of the interaction of molecules and genes comes from many sources. These range from studies of the three-dimensional structure of proteins, from which their function can be inferred, to in vitro and in vivo studies that show how genes, and the molecules that switch them on and off, interact in the test tube and in a key single-celled organism such as yeast, or a higher plant such as Arabidopsis thaliana. The way that molecular systems are described is changing from the traditional diagrammatic sketch of likely interactions to a set of mathematical equations linking the rate of change of one molecule with the amounts of others. When the number of molecules is small, a set of stochastic reactions becomes a more accurate representation than a set of ordinary differential equations. But in both cases, finding the best fit between a mathematical model and data from the laboratory becomes a major problem. A second important issue concerns the justification for decisions made in modelling a biological system. We might like to say that only one model describes the data, but this is not possible for any complex system. Instead, we can hope to show that one model fits the data better than another, and this is the aim of the research proposed here. We shall apply a probabilistic approach that can optimise the fit of models to data, and quantitatively compare the extent to which they fit the data. This will provide useful information to the bench biologists and the systems biologists with whom they collaborate to further our knowledge of the cell.

Technical Summary

This project will address the problems of optimising and comparing stochastic systems biology models by applying the nested sampling algorithm (Skilling, 2006), which computes the Bayesian evidence. These functions will be delivered to users by incorporating them in a new version of the popular stochastic simulation tool Dizzy (Ramsey et al, 2005). The nested sampling algorithms will also be released as R and Matlab packages. By comparing the total evidence in favour of each alternative model of a biological system (measured in decibans or in bits), systems biologists will be able to evaluate alternative modelling decisions, and to compare alternative stochastic models, in the light of the experimental data. This will be achieved by integrating over all plausible parameter values to estimate the Bayesian evidence. The tool will also provide the modeller with an analysis of the samples from the posterior distribution of parameter values that nested sampling generates. Multiple modes in the distribution of a parameter, and correlations between parameters, will be automatically identified by regression and clustering: these are of great interest to systems modellers and will generate novel insights into the biological models and data under investigation. Algorithms for intelligently managing the optimisation procedure for the user will be provided, including methods to terminate the run when the most informative samples have been located, and methods to detect when the user has selected inappropriate values for the optimiser. These features will assist the uptake of the new tools. The new tools will be immediately useful to Dizzy users, who will be able to optimise models to fit experimental data, and compare models, with minimal configuration of the optimisation algorithm. R and Matlab users will be able to run the optimisation in conjunction with the simulators and other modules provided by those environments.
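As an illustration of the approach summarised above, the following is a minimal sketch of nested sampling on a toy problem. It is not the Dizzy or R implementation. The Gaussian likelihood, the uniform prior on [-5, 5], the rejection-sampling replacement step, the stopping tolerance and all function names are assumptions chosen for illustration. The sketch estimates the log evidence and then summarises the weighted posterior samples by their mean and standard deviation.

```python
import math
import random


def log_likelihood(theta, mu=1.0, sigma=0.5):
    """Toy Gaussian log-likelihood in one parameter (illustrative assumption)."""
    return -0.5 * ((theta - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def sample_prior(rng, lo=-5.0, hi=5.0):
    """Draw from a uniform prior on [lo, hi] (illustrative assumption)."""
    return rng.uniform(lo, hi)


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(n_live=100, tol=1e-2, max_iter=5000, seed=0):
    """Return (log evidence, list of (theta, log weight)) for the toy model."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]
    log_z = -math.inf
    log_x_prev = 0.0  # log of enclosed prior mass, starting at X = 1
    samples = []
    for i in range(1, max_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        log_x = -i / n_live  # expected shrinkage: X_i = exp(-i / n_live)
        # Log width of the prior-mass shell, log(X_{i-1} - X_i).
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        samples.append((live[worst], logl_min + log_w))
        log_z = log_add(log_z, logl_min + log_w)
        # Replace the worst point with a prior draw of higher likelihood.
        # Plain rejection sampling: simple, but slow for realistic models.
        while True:
            cand = sample_prior(rng)
            cand_logl = log_likelihood(cand)
            if cand_logl > logl_min:
                break
        live[worst], live_logl[worst] = cand, cand_logl
        log_x_prev = log_x
        # Stop once the live points can add at most a fraction tol to Z.
        if max(live_logl) + log_x < math.log(tol) + log_z:
            break
    # The remaining live points share the remaining prior mass equally.
    log_w_live = log_x_prev - math.log(n_live)
    for t, ll in zip(live, live_logl):
        samples.append((t, ll + log_w_live))
        log_z = log_add(log_z, ll + log_w_live)
    return log_z, samples


def posterior_mean_sd(samples, log_z):
    """Weighted posterior mean and standard deviation of the parameter."""
    weights = [math.exp(lw - log_z) for _, lw in samples]
    mean = sum(w * t for (t, _), w in zip(samples, weights))
    var = sum(w * (t - mean) ** 2 for (t, _), w in zip(samples, weights))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    log_z, samples = nested_sampling()
    print("log evidence:", log_z)
    print("posterior mean, sd:", posterior_mean_sd(samples, log_z))
```

For this toy model the evidence is known analytically: the Gaussian integrates to almost exactly 1 over the prior range and the prior density is 0.1, so the log evidence should be close to log(0.1). The stopping rule is one simple way to end a run once the remaining live points can no longer change the estimate appreciably. A stochastic model would need its own likelihood and a more efficient constrained sampler.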

Planned Impact

This project is in the strategic research priority area of systems approaches to biological research. A novel computational tool will be developed for systems biology modelling. Synthetic biology is also a research priority and the tool will be immediately applicable to models of synthetic systems.
Who will benefit? We have identified the immediate beneficiaries of the software to be produced. These are the large number of systems biologists and synthetic biologists studying and working in the UK. We identify two user groups within this community: those primarily interested in the model and the underlying biology, and those who (additionally) require the sophisticated mathematical and statistical packages available for R and Matlab to investigate properties of models. The work of the immediate beneficiaries will have an impact on several key targets identified by the research council. The improvement of food production (Crop science) can benefit from systems models, e.g. by increasing our understanding of the circadian clock, as will be studied here. Similarly, the development of genetically modified or synthetic organisms for tackling pollution or generating energy (Bioenergy) can also benefit from mathematical models of the system and its parts. Research in these areas is readily exploited in the biotechnology, agriculture and pharmaceutical sectors of the economy.
How will they benefit? Systems biologists who are more concerned with a particular model, and its fit to the available wet lab data, will benefit from the new version of the Dizzy simulation tool that this project will produce. This tool will provide an easy-to-use interface that allows models to be optimised and the Bayesian evidence computed. The optimal parameter values and their standard deviations (estimated from the posterior distribution), along with the evidence value (in bits or decibans), will aid their research. Those systems and synthetic biologists who are concerned with the properties of models, or the modelling process, will have access to the algorithms at the programmatic level, and to the code itself. Those who benefit indirectly, i.e. through the earlier use of systems modelling in the development of a modified organism or the identification of a drug target, will be able to rely on the mathematical analysis of the evidence that supports the use of the model. The tool will contribute to the base of evidence upon which decisions can be made.
What will be done to ensure they benefit? The software to be developed will be made available on an open source basis. We shall publish the scientific results as widely as possible, and in open access journals.
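The evidence units mentioned above follow from the logarithm of the ratio of two evidences: base 2 gives bits, and ten times the base 10 logarithm gives decibans. The short sketch below converts a difference in natural-log evidence into both units. The function name and the example values are hypothetical and are not taken from the project software.

```python
import math


def evidence_ratio_units(log_z_a, log_z_b):
    """Express the evidence for model A over model B in bits and decibans.

    Both inputs are natural-log evidences; the function name is hypothetical.
    """
    delta = log_z_a - log_z_b  # natural log of the evidence ratio Z_A / Z_B
    return {
        "bits": delta / math.log(2),
        "decibans": 10.0 * delta / math.log(10),
    }


if __name__ == "__main__":
    # Made-up example: model A has ten times the evidence of model B.
    print(evidence_ratio_units(math.log(10.0), 0.0))
```

A positive value favours the first model and a negative value favours the second. Evidence estimates carry sampling uncertainty, so small differences between models should be read with caution.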

Publications

Kilpatrick AM (2013) MCOIN: a novel heuristic for determining transcription factor binding site motif width. in Algorithms for molecular biology : AMB

Kilpatrick AM (2014) Stochastic EM-based TFBS motif discovery with MITSU. in Bioinformatics (Oxford, England)

Aitken S (2015) Dizzy-Beats: a Bayesian evidence analysis tool for systems biology. in Bioinformatics (Oxford, England)

 
Description We have developed and evaluated a computational statistical technique for comparing alternative mathematical models through their fit to data. Mathematical modelling is important in many areas, including systems biology and systems medicine. However, it is often not possible to write down a single model: alternative mechanisms and models need to be compared, and this generates a highly complex problem in its own right. We implemented a number of algorithms in the R programming language that make the nested sampling technique practical in many applications, and generated new insights into a mathematical model of circadian rhythms. The resulting papers and algorithms have been made publicly available.
Exploitation Route The technique has been adopted more widely since we started this research, and is being developed further by the authors.
Sectors Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology

URL http://www.aiai.ed.ac.uk/project/bayesevidence/
 
Description This project has contributed an important method for the comparison of mathematical models through their fit to data. The technique has been recognised by others in the field, and the authors continue to develop and apply the method, for example, to characterise genome-wide data (Aitken et al (2015) Transcriptional Dynamics Reveal Critical Roles for Non-coding RNAs in the Immediate-Early Response. in PLOS Comp Biol).
Sector Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic

 
Title Dizzy-Beats 
Description Dizzy-Beats is a tool to infer the parameters of systems biology models (i.e. to compute their mean and standard deviation) and compare models using the nested sampling algorithm. The approach is readily applicable to any probabilistic modelling technique where a likelihood function can be defined. 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact The technique has been used to gain insight into a model of circadian rhythms and is under continuous development. 
URL http://www.aiai.ed.ac.uk/project/bayesevidence/tools.html