Computational inference of biopathway dynamics and structures

Lead Research Organisation: University of Glasgow
Department Name: School of Mathematics & Statistics

Abstract

The mathematical modelling of regulatory interactions and signalling processes in living cells is a growing research area, aiming to elucidate the molecular mechanisms that give rise to complex biological phenomena. Examples include circadian clock models describing how plants predict the length of night and adjust their metabolism to prevent carbon starvation before dawn, or carcinogenesis models aiming to explain how aberrant cellular signalling pathways lead to tumour growth and metastasis. Ambitious current approaches in systems biology aim to develop mechanistic models of the relevant cellular networks, using methods from chemical kinetics and control theory. However, due to the large number of chemical kinetic parameters, inference remains an extremely challenging problem, restricting current applications to small a priori identified model pathways. The objective of the proposed research project is to advance the current state of the art of computational/statistical inference in mechanistic models, with a particular focus on applications in systems biology. To this end, we aim to explore and hone a portfolio of methods combining ideas based on (i) gradient matching, (ii) modularization, (iii) parallel tempering, and (iv) focus statistics. The idea of gradient matching (i) is to avoid a computationally expensive explicit solution of the differential equations and instead infer kinetic parameters that give the best agreement between the gradients predicted from the differential equations and those obtained from the tangent to the interpolant of the data. The aim of modularization (ii) is to decompose a complex system into a collection of simpler weakly coupled subsystems for which inference is less challenging, and then reduce inconsistencies between the subsystems in an iterative manner. 
Parallel tempering (iii) proceeds by carrying out inference on a set of several increasingly smoothed (tempered) versions of the mismatch function in parallel, as a way to avoid suboptimal local optima. As an alternative to smoothing, we aim to adopt ideas from approximate Bayesian computation (iv) and replace the data in parallel chains by sets of focus statistics to extract relevant patterns from the data. This mimics heuristic procedures that are currently carried out by biological modellers, who aim to find chemical kinetic parameters that match certain signatures in the data, like phase shifts, frequency variations, amplitude alterations, etc. All approaches, in isolation and in combination, aim to reduce the computational complexity at a high level of accuracy, thereby enabling an application of inference in mechanistic models to larger and more complex systems. For regulatory networks whose structure is a priori unknown, we precede the above procedures with novel structure learning algorithms from machine learning, aiming for fast search through network topology space based on abstract models of molecular interactions. For both structure learning with machine learning methods and kinetic parameter inference for mechanistic models we will exploit modern PC clusters for parallel processing. The results of our research will be implemented in a user-friendly software toolbox that will be integrated into GLAMA, a widely used systems biology and polyomic data analysis software package (http://www.brc.dcs.gla.ac.uk/systems/glama/), for maximal impact in the biological end-user community.
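The parallel tempering idea (iii) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the project. The function `energy` is a toy one-dimensional double well standing in for the mismatch between data and model predictions. Each chain uses a random-walk Metropolis sampler, and the temperature ladder is hand-picked. All chains start in the inferior local optimum near +2. The cold chain cannot cross the barrier on its own, but swaps between adjacent temperatures let it reach the global optimum near -2.

```python
import numpy as np


def energy(theta):
    # Hypothetical toy mismatch function (an assumption, not a biopathway model):
    # a double well with a local minimum near +2 and the global minimum near -2.
    return (theta**2 - 4.0) ** 2 + theta


def parallel_tempering(energy, temps, n_iter=5000, step=0.5, theta0=2.0, seed=0):
    """Run one Metropolis chain per temperature and swap states between neighbours."""
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps, dtype=float)      # temps[0] is the coldest chain
    thetas = np.full(len(temps), theta0)        # every chain starts in the local minimum
    energies = np.array([energy(th) for th in thetas])
    best_theta, best_e = thetas[0], energies[0]
    for _ in range(n_iter):
        # Random-walk Metropolis update within each tempered chain,
        # targeting exp(-energy / T): a higher T flattens the landscape.
        for k, T in enumerate(temps):
            prop = thetas[k] + step * rng.normal()
            e_prop = energy(prop)
            if np.log(rng.uniform()) < -(e_prop - energies[k]) / T:
                thetas[k], energies[k] = prop, e_prop
        # Propose exchanging the states of a random pair of adjacent temperatures.
        k = rng.integers(len(temps) - 1)
        log_acc = (energies[k] - energies[k + 1]) * (1.0 / temps[k] - 1.0 / temps[k + 1])
        if np.log(rng.uniform()) < log_acc:
            thetas[[k, k + 1]] = thetas[[k + 1, k]]
            energies[[k, k + 1]] = energies[[k + 1, k]]
        # Keep the best state visited by the coldest chain.
        if energies[0] < best_e:
            best_theta, best_e = thetas[0], energies[0]
    return float(best_theta), float(best_e)
```

For example, `parallel_tempering(energy, [0.1, 0.5, 2.0, 8.0, 30.0])` returns the best state visited by the coldest chain. In a real application, the toy `energy` would be replaced by a mismatch function over the kinetic parameters.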

Planned Impact

THE IMMEDIATE TARGET GROUP who will benefit from the proposed research are quantitative biologists working in a variety of disciplines, including (1) pathway medicine, (2) human biomonitoring, (3) molecular plant biology, and (4) ecology.
1) Researchers in PATHWAY MEDICINE will benefit from a deeper understanding of pathogenesis at the molecular level. In particular, the research carried out in the project will provide more thorough insight into the consequences of dysregulation of the pro-inflammatory Interleukin-6 biopathway, which is a potential causal factor in the development of cardiovascular diseases and diabetes.
2) HUMAN BIOMONITORING provides an important source of information on routes of intake of drugs or potentially toxic chemicals in the human body over a wide range of exposure conditions. The mathematical modelling relies on physiologically-based pharmacokinetic or toxicokinetic differential equation models that describe the transfer and perfusion of chemicals between different organs or lumped tissues. The improvement of parameter inference achieved with our work will enable exposure pathways to be identified more accurately. Our work will therefore contribute to improved toxicokinetic screening and risk assessment, to the benefit of the patient community.
3) A challenging problem in MOLECULAR PLANT BIOLOGY is to understand the interplay between internal time keeping (circadian regulation) and metabolism in plants. In the last few years, substantial progress has been made in mathematically modelling the central processes of circadian regulation at the molecular level. By enabling faster parameter inference and model selection for larger systems, the proposed project will provide the tools for elucidating the detailed structure of the molecular regulatory networks and signalling pathways that enable the interplay between circadian and metabolic processes, to the immediate benefit of the molecular plant biology community. The long-term impact will be a better understanding and control of biomass production in plants, with the ultimate objective of improving food security and biofuel production.
4) ECOLOGICAL SYSTEMS consist of complex interactions among species and their environment, the understanding of which has implications for predicting environmental responses to perturbations such as invading species and climate change. However, revealing these interactions is not straightforward, nor are the interactions necessarily stable across space and time. By improving the methodology to scale up reliable parameter inference in mechanistic models of diverse species interactions to larger and more complex systems, the proposed project will help ecologists to predict critical regime shifts, i.e. the rapid transition from one stable community structure to another, often ecologically inferior, state. Understanding such regime shifts is particularly critical to assess the impact of climate change on biodiversity, and our work will therefore contribute to the development of more reliable decision support systems on which policy makers can base their risk assessment and adoption of mitigating measures.
A SECOND TARGET GROUP are statisticians, informaticians and quantitative scientists, as our project contributes to the development of improved inference tools for computational statistics and machine learning. This includes nonparametric Bayesian modelling, the modularisation of complex systems, parallel tempering and annealing, emulation versus calibration, mixing of Markov chains, and approximate computation of Bayes factors.
While the primary focus of our project is systems biology, we note that mathematical modelling with differential equations is highly relevant to several other research areas, including chemical engineering, seismology, and epidemiology. The proposed project will therefore make an important methodological contribution that a variety of other scientific disciplines will profit from.

Publications

 
Title ShinyKGode: Gradient Matching and ODE regularisation 
Description This is a tutorial video demonstrating the benefits of ODE regularisation when inferring parameters using gradient matching in the shinyKGode package, described in Joe Wandy, Mu Niu, Diana Giurghita, Rónán Daly, Simon Rogers and Dirk Husmeier, "ShinyKGode: an Interactive Application for ODE Parameter Inference Using Gradient Matching", Bioinformatics, https://doi.org/10.1093/bioinformatics/bty089
Type Of Art Film/Video/Animation 
Year Produced 2018 
Impact The video has just been released, so it's too early to report on impact. 
URL https://www.youtube.com/watch?time_continue=10&v=o5F-UGHH7P4
 
Title shinyKGode: Time Warping 
Description This is a tutorial video demonstrating how time warping can be used to improve parameter estimation in coupled ordinary differential equations with the shinyKGode package, described in Joe Wandy, Mu Niu, Diana Giurghita, Rónán Daly, Simon Rogers and Dirk Husmeier, "ShinyKGode: an Interactive Application for ODE Parameter Inference Using Gradient Matching", Bioinformatics, https://doi.org/10.1093/bioinformatics/bty089
Type Of Art Film/Video/Animation 
Year Produced 2018 
Impact This video has just been released, so it's too early to report on impact.
URL https://www.youtube.com/watch?time_continue=2&v=Wn8nuFwtTJY
 
Description Many traditionally qualitative scientific disciplines have become more quantitative. This paradigm shift is most dramatically witnessed in the life sciences and in health care, and it is based on the methodological framework of differential equations (DEs). DEs describe how the components of a complex system interact and evolve in time. Examples are the time-varying blood pressures and flow rates in different locations of the pulmonary blood vessel network, the time-varying strains and stresses of the pumping heart, or the gene expression patterns in a malignant tumour. DEs provide a mathematical framework to describe these processes. However, the equations typically depend on various kinetic parameters that cannot be directly measured. The challenge, hence, is to learn the parameters indirectly from measurements of the global system, like MRI scans of the heart for cardiovascular medical diagnostics, or gene expression profiling for molecular pathway identification. In principle, this can be achieved with an iterative procedure that systematically minimises the mismatch between measurements and mathematical model predictions. However, the latter require repeated numerical solutions of the DEs, which is computationally onerous. In our research we have developed a method that bypasses the explicit numerical solution of the DEs, and in this way the computational complexity can be reduced by several orders of magnitude without adversely affecting the accuracy of inference. The idea is to fit a smooth interpolant to the time series of observations or measurements, and then obtain the slopes of the tangent to this interpolant, i.e. its derivatives, at different time points.
The differential equations also predict these derivatives, so we can use an iterative optimisation algorithm to search for the parameters that give the best agreement between the two sets of derivatives. Our project is still in progress, and a much more detailed report will be provided at the end of the funding period, when our work has been completed.
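The two steps just described can be sketched for the Lotka-Volterra predator-prey model. This model is chosen here as an assumption because it is linear in its parameters, so the gradient-matching step reduces to ordinary least squares. The sketch uses a SciPy cubic smoothing spline as the interpolant, in place of the kernel- and Gaussian-process-based interpolants used in the project. The parameter values, the initial conditions and the helper names `simulate` and `gradient_matching_lv` are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import UnivariateSpline


def lotka_volterra(t, z, alpha, beta, gamma, delta):
    # Prey x and predator y; theta = (alpha, beta, gamma, delta).
    x, y = z
    return [alpha * x - beta * x * y, -gamma * y + delta * x * y]


def simulate(theta, z0=(5.0, 3.0), t_max=4.0, n=81, noise_sd=0.0, seed=0):
    """Hypothetical data generator: ODE solution at n time points plus Gaussian noise."""
    t = np.linspace(0.0, t_max, n)
    sol = solve_ivp(lotka_volterra, (0.0, t_max), z0, t_eval=t,
                    args=tuple(theta), rtol=1e-8, atol=1e-10)
    rng = np.random.default_rng(seed)
    return t, sol.y + noise_sd * rng.normal(size=sol.y.shape)


def gradient_matching_lv(t, data, noise_sd=0.0):
    # Step 1: fit a smooth interpolant to each species (s = 0 interpolates exactly).
    s = len(t) * noise_sd**2
    splines = [UnivariateSpline(t, d, k=3, s=s) for d in data]
    x, y = (sp(t) for sp in splines)
    # Derivatives of the tangent to the interpolant; no ODE solver is called here.
    dx, dy = (sp.derivative()(t) for sp in splines)
    # Step 2: choose parameters so the ODE right-hand side matches these slopes.
    # Lotka-Volterra is linear in theta, so least squares gives a closed form.
    (alpha, beta), *_ = np.linalg.lstsq(np.column_stack([x, -x * y]), dx, rcond=None)
    (gamma, delta), *_ = np.linalg.lstsq(np.column_stack([-y, x * y]), dy, rcond=None)
    return np.array([alpha, beta, gamma, delta])
```

With noise-free data, `gradient_matching_lv(*simulate([2.0, 1.0, 4.0, 1.0]))` should return values close to (2, 1, 4, 1) without solving the ODEs inside the estimation step. With noisy data, the smoothing level `s` becomes critical, because a spline that overfits the noise yields unreliable derivatives.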

We found that a practical problem in pursuing this idea is that the interpolant may overfit the training data, particularly when the time series are sparse. In that case, the derivatives obtained from the interpolant are not representative of the underlying dynamical system, leading to poor estimates of the ODE parameters.

We have successfully developed two regularisation methods to counteract this problem. The first method uses feedback from the ODEs to regularise the interpolant. This means that interpolants that merely fit the data well but are not in the solution space of the ODEs are suppressed. The second approach uses time warping to homogenise the functional length scale, i.e. the time interval within which meaningful variations of the interpolant happen.
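The time-warping idea can be sketched under simplifying assumptions. The warping tau = log(1 + t/c) is fixed by hand rather than estimated from the data, and the interpolant is a SciPy cubic spline rather than the kernel-based interpolant used in the project. The test signal sin(10 log(1 + t)) oscillates quickly at early times and slowly later on, but in warped time it becomes a plain sinusoid with a constant length scale. The derivative with respect to the original time is then recovered with the chain rule, dx/dt = (dx/dtau)(dtau/dt).

```python
import numpy as np
from scipy.interpolate import CubicSpline


def warp(t, c=1.0):
    # Hypothetical fixed, monotone warping tau = log(1 + t / c) (an assumption).
    return np.log1p(t / c)


def dwarp_dt(t, c=1.0):
    # Derivative of the warping: d/dt log(1 + t / c) = 1 / (c + t).
    return 1.0 / (c + t)


def warped_derivative(t_obs, x_obs, t_eval, c=1.0):
    """Fit the interpolant in warped time and return dx/dt at t_eval."""
    spline = CubicSpline(warp(t_obs, c), x_obs)
    # Chain rule: dx/dt = dx/dtau * dtau/dt (spline(..., 1) is the first derivative).
    return spline(warp(t_eval, c), 1) * dwarp_dt(t_eval, c)
```

An interpolant with a single global length scale, such as one based on a stationary Gaussian kernel, benefits most from this homogenisation, since it cannot otherwise adapt to both the fast and the slow phases of the signal.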

We have reported our methodological findings in 13 research publications, and we have implemented the main methodological tools in two CRAN packages. We have produced two tutorial videos that explain both the concepts behind the two methods mentioned above (ODE regularisation and time warping) and the corresponding software implementations:
https://www.youtube.com/watch?time_continue=2&v=Wn8nuFwtTJY
https://www.youtube.com/watch?time_continue=10&v=o5F-UGHH7P4
Exploitation Route The methods we have developed have the potential to make statistical inference feasible in mathematical models of complex systems, e.g. in ecology, health care, and the life sciences. To improve the uptake of our methods in these disciplines, we have developed a user-friendly graphical user interface (GUI) for our software. The software is available on CRAN (https://cran.r-project.org/package=shinyKGode) and is described in the following paper: https://doi.org/10.1093/bioinformatics/bty089. To promote the use of both our methods and the software, we have developed a set of video tutorials:
https://www.youtube.com/watch?time_continue=2&v=Wn8nuFwtTJY
https://www.youtube.com/watch?time_continue=10&v=o5F-UGHH7P4
https://joewandy.github.io/shinyKGode/
Sectors Environment, Healthcare, Pharmaceuticals and Medical Biotechnology

URL https://joewandy.github.io/shinyKGode/
 
Description The research project was about a methodological concept at the interface of machine learning, computational statistics and systems biology. The project has led to 14 peer-reviewed publications in the scientific literature, including top machine learning conferences (such as AISTATS and ICML), statistics journals (such as "Statistics and Computing" and "Computational Statistics"), and the biomedical engineering literature (such as "Biomedical Engineering Online" and "Frontiers in Bioengineering and Biotechnology"). In addition to several contributed conference presentations, the research outputs were disseminated in six invited keynote talks at national and international workshops. Given the methodological nature of the project, its impact is indirect: it provides novel underpinning methodologies that benefit the applied sciences. To maximise dissemination and uptake of our methods, we have implemented our novel algorithms in a publicly available software package (KGode), which can be downloaded from CRAN. In addition, we have developed a graphical user interface (ShinyKGode) to make the software maximally user-friendly, and we have produced several YouTube videos with tutorials on how to use the methods and the software. As a consequence, there has been wide uptake of our new methodological tools, with over 20,000 downloads of our CRAN package.
First Year Of Impact 2019
Sector Other
 
Description Royal Society of Edinburgh sabbatical research grant
Amount £62,800 (GBP)
Funding ID 62335 
Organisation Royal Society of Edinburgh (RSE) 
Sector Charity/Non Profit
Country United Kingdom
Start 07/2019 
End 06/2020
 
Title Approximate Bayesian inference in semi-mechanistic models 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Approximate parameter inference in systems biology using gradient matching: a comparative evaluation 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Controversy in mechanistic modelling with Gaussian processes 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE) 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Fast Parameter Inference in Nonlinear Dynamical Systems using Iterative Gradient Matching 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Hierarchical Bayesian Regression (HBR) and Analytic Gradient Calculation (GCGP) 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Inference in nonlinear differential equations 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Inference in nonlinear differential equations 
Description R code which provides an implementation of the method for parameter estimation in ODE systems for biopathway models, as described in the paper "Inference in nonlinear differential equations" by M. Niu, M. Filippone, D. Husmeier, & S. Rogers, published in the proceedings of the 30th International Workshop on Statistical Modelling (IWSM), Linz, Austria, 06-10 Jul 2015, pp. 187-190: http://eprints.gla.ac.uk/108342/ 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Too early to say, as the software has just been published. 
URL http://dx.doi.org/10.5525/gla.researchdata.284
 
Title KGode: Kernel Based Gradient Matching for Parameter Inference in Ordinary Differential Equations 
Description This is a CRAN package implementing the kernel ridge regression and gradient matching algorithm proposed in Niu et al. (2016) and the time warping algorithm proposed in Niu et al. (2017) for parameter inference in ordinary differential equations (ODEs). Four schemes are provided for improving parameter estimation in ODEs by regularization via a feedback mechanism from the ODEs and time warping. Detailed documentation is available at . 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact The software has just been released, so it's too early to report on impact. 
URL https://cran.r-project.org/web/packages/KGode/index.html
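KGode is an R package, and its interface is not reproduced here. The following Python sketch illustrates only the kernel ridge regression step. It fits a Gaussian (RBF) kernel interpolant and evaluates the interpolant's time derivative analytically by differentiating the kernel; this derivative is the quantity that gradient matching compares with the ODE right-hand side. The length scale `ell`, the ridge penalty `lam` and the function names are assumptions chosen for illustration.

```python
import numpy as np


def rbf(a, b, ell):
    # Gaussian kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 ell^2)).
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * ell**2))


def fit_krr(t, x, ell=0.5, lam=1e-3):
    """Kernel ridge regression: dual coefficients (K + lam * I)^{-1} x."""
    K = rbf(t, t, ell)
    return np.linalg.solve(K + lam * np.eye(len(t)), x)


def krr_predict(t_new, t, coef, ell=0.5):
    """Return the interpolant and its analytic time derivative at t_new."""
    Kx = rbf(t_new, t, ell)
    f = Kx @ coef
    # d/ds exp(-(s - t_j)^2 / (2 ell^2)) = -(s - t_j) / ell^2 * k(s, t_j)
    dK = -(t_new[:, None] - t[None, :]) / ell**2 * Kx
    return f, dK @ coef
```

Because the derivative is available in closed form, no finite differencing of the fitted curve is needed. In practice, `ell` and `lam` control the trade-off between fitting the data and producing smooth, representative derivatives.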
 
Title Mechanistic Modelling with Gaussian Processes 
Description Software to compare different methods of parameter inference in mechanistic models approximated by Gaussian processes, as described in the paper: Benn Macdonald, Catherine Higham and Dirk Husmeier, "Controversy in mechanistic modelling with Gaussian processes", Proceedings of the 32nd International Conference on Machine Learning, Journal of Machine Learning Research Workshop and Conference Proceedings, Volume 37, pp. 1539-1547, 2015, http://jmlr.org/proceedings/papers/v37/macdonald15.html
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact As described in the following video lecture, given in Lille (France), July 2015: http://videolectures.net/icml2015_macdonald_husmeier_mechanistic_modelling/ 
URL http://dx.doi.org/10.5525/gla.researchdata.283
 
Title Parameter inference in systems biology 
Description R code by Benn Macdonald for parameter inference in differential equations with Gaussian processes and gradient matching, as described in the following paper: Benn Macdonald, Mu Niu, Simon Rogers, Maurizio Filippone and Dirk Husmeier, "Approximate parameter inference in systems biology using gradient matching: a comparative evaluation", Biomedical Engineering Online (in print), http://eprints.gla.ac.uk/113366/
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Too early to say, as the code has just been published. 
URL http://dx.doi.org/10.5525/gla.researchdata.288
 
Title ShinyKGode 
Description Many processes in science and engineering can be described by dynamical systems based on nonlinear ordinary differential equations (ODEs). Often the ODE parameters are unknown and not directly measurable. Since nonlinear ODEs typically have no closed-form solution, standard iterative inference procedures require a computationally expensive numerical integration of the ODEs every time the parameters are adapted, which in practice restricts statistical inference to very small systems. To overcome this computational bottleneck, approximate methods based on gradient matching have recently gained much attention. In the ShinyKGode package, we have developed an easy-to-use Shiny application to perform parameter inference in ODEs using gradient matching. The application, called shinyKGode, is built upon the KGode package, which implements the kernel ridge regression and gradient matching algorithm proposed in Niu et al. (2016) and the warping algorithm proposed in Niu et al. (2017) for parameter inference in differential equations. Advanced features such as ODE regularisation and warping can also easily be used for inference in the shinyKGode application. shinyKGode has built-in support for the following three models described in Niu et al. (2017):
- Lotka-Volterra, which describes the dynamics of ecological systems with predator-prey interactions.
- FitzHugh-Nagumo, which is a two-dimensional dynamical system used for modelling spike generation in axons.
- The biopathway model described in Vyshemirsky and Girolami (2008), which models the interactions of five protein isoforms.
In addition, shinyKGode accepts user-defined models specified in SBML v2 format, which can be loaded directly into the application.
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact This software has just been released, so it's too early to report on impact.
URL https://cran.r-project.org/package=shinyKGode
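As a concrete example of one of the built-in models listed above, here is a sketch of the FitzHugh-Nagumo system in a widely used form:

dV/dt = c (V - V^3/3 + R)
dR/dt = -(V - a + b R) / c

Whether shinyKGode uses exactly this parameterisation is an assumption. The parameter values (a, b, c) = (0.2, 0.2, 3), the initial state (V, R) = (-1, 1) and the helper name `simulate_fhn` are illustrative. The simulated trajectories are the kind of time series on which gradient-matching inference would be run.

```python
import numpy as np
from scipy.integrate import solve_ivp


def fitzhugh_nagumo(t, z, a, b, c):
    # V: voltage-like variable; R: recovery variable (assumed standard form).
    V, R = z
    dV = c * (V - V**3 / 3.0 + R)
    dR = -(V - a + b * R) / c
    return [dV, dR]


def simulate_fhn(params=(0.2, 0.2, 3.0), z0=(-1.0, 1.0), t_max=20.0, n=401):
    """Hypothetical helper: solve the ODEs on a regular time grid."""
    t = np.linspace(0.0, t_max, n)
    sol = solve_ivp(fitzhugh_nagumo, (0.0, t_max), z0, t_eval=t,
                    args=params, rtol=1e-8, atol=1e-10)
    return sol.t, sol.y
```

With these illustrative values the system settles into spiking oscillations in which V swings between roughly -2 and 2, the signature that makes the model a standard benchmark for parameter inference.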
 
Title Unbiased LInear System SolvEr (ULISSE) 
Description This distribution contains code for scalable stochastic gradient-based inference for Gaussian processes, as described in the paper: M. Filippone and R. Engler. Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE). Journal of Machine Learning Research - Workshop and Conference Proceedings, Volume 37, pp. 1015-1024, 2015 http://jmlr.org/proceedings/papers/v37/filippone15.html 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact As described in the following video lecture, presented in Lille (France), July 2015: http://videolectures.net/icml2015_filippone_unbiased_linear_system/ 
URL http://dx.doi.org/10.5525/gla.researchdata.279
 
Description Special session organised at the International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2016) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The intended purpose of the session was a wider dissemination of both the general research topic of the grant and the specific outcomes of our research project, in particular to postgraduate students and the international research community. The conference where the event was held (CIBB) draws an interdisciplinary audience from Statistics, Informatics, Mathematics and the Life Sciences. For that reason, organising a special session provided an excellent opportunity to communicate the relevance of our research project, and the wider research area in which it falls, beyond the boundaries of our own research discipline.
Year(s) Of Engagement Activity 2016
URL http://www.cs.stir.ac.uk/events/cibb2016/specialsessions.html#sp5