Next generation approaches to connect models and quantitative data

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

Simple mathematical models have been remarkably successful in helping us understand key processes in biology. Traditionally, the utility of models has been to test biological hypotheses by encoding extremely simple descriptions of the biology in a mathematical framework. Mathematical analysis and computer simulation are then used to test whether qualitative predictions of the model match experimental observations.

However, biology has advanced to the stage where experimental researchers can generate stunning images of cells and tissues at a level of resolution previously only dreamt of. Being able to visualise, for example, the dynamics of individual mRNAs and proteins over time, means that we can now generate extremely sophisticated hypotheses for how large gene regulatory networks or cells and tissues function. As a result, the mathematical models we develop to test biological hypotheses are quickly growing in size and complexity. In particular, the so-called agent-based models have become a popular tool in the modern life sciences. These allow the modeller to, for example, follow the fates and interactions of individual cells and, at the same time, include the effects of gene regulation and signalling.

For these agent-based models to be truly useful, for them to direct experimental efforts or even, eventually, replace the need for some experiments, we need to calibrate them using quantitative data. This simply stated need, however, poses a formidable set of challenges for the modelling community: (i) the models have many parameters that must be estimated; (ii) the data are complex and of multiple different types, and rarely, if ever, are all the relevant cells or proteins measured or tracked; (iii) the data are obscured by noise that is both intrinsic to the measured processes and introduced during the experiments.

The proposed research will generate new mathematical and computational tools to overcome these challenges. It will enable scientists in the modern life and biomedical sciences to calibrate models, then select the most appropriate model(s), and hence distinguish between competing biological hypotheses. To make sure they are relevant for biology, these new tools will be developed whilst investigating key biological questions. To ensure that the tools are available for re-use and extension by other researchers in the field, all of our computational codes and resources will be made freely available.

Technical Summary

Using mathematical models to assist in the design and interpretation of biological experiments is becoming increasingly important in biomedical and life sciences research; yet fundamental questions remain unresolved about how best to integrate experimental data within mathematical modelling frameworks to provide useful predictions. Novel mathematical, statistical and computational tools are needed to provide a standardised pipeline that enables experimental data to be used effectively in the development of models, and in model parameterisation and selection.

One key challenge in using mathematical modelling to interpret biological experiments is the question of how to integrate multiplex, multi-scale quantitative data generated in experimental laboratories to improve our understanding of a specific biological question. A standard protocol that covers the design of experiments targeted towards parameterising models, the validation of specific model hypotheses, and the inference of underlying mechanisms from quantitative data is lacking. A significant reason for this is that, for the kinds of models that are required to interrogate phenomena in the modern life sciences, the calibration of models using quantitative data poses a formidable set of challenges. The models generally contain many parameters, and it is hard to obtain relevant data covering all the aspects of interest or importance to describe the system dynamics. In addition, the data that are collected usually have multiple, generally poorly characterised, sources of noise and uncertainty. Conventional statistical approaches either reach their limits or fail for such complex and, increasingly, high-dimensional problems. Here we seek to address precisely this point and develop a complementary suite of approaches that will enable scientists in the modern life and biomedical sciences to estimate model parameters and perform model selection for complex, multi-scale, and agent-based models.

Planned Impact

Economy: The use and analysis of data carry both social and economic costs. First, ineffective use of data generated at the expense of public funding is a waste of resources. Secondly, whenever animals are involved in research - as is routinely the case in immunology, developmental and stem cell biology, and physiology - we have to ensure that the 3R principles (replacement, reduction, refinement) are adhered to. The methodologies developed as part of this project will provide a direct means to mitigate these issues, by ensuring that experiments are designed to collect the appropriate data to answer specific questions.

Society: In terms of healthcare, we increasingly rely on diverse sets of data and their integration in order to make or plan concrete interventions in the life of patients or, in public health, make regulations that affect large parts of the population. It is essential to the decision- and policy-making processes that we understand how to integrate and interpret these diverse data sets using mathematical and statistical models and techniques. In addition, in the medium-to-long term, for personalised medicine to become a reality, we must understand how to efficiently and accurately integrate and interpret patient-specific, multiplex, quantitative data using theoretical approaches. The proposed research will bring the UK research community further towards a unified pipeline for interfacing mathematical models with quantitative data.

Knowledge: It is now almost the norm, particularly in high profile journals, for publications from modern life sciences research groups to include a model that integrates biological hypotheses and validates them using experimental data. Rarely, however, are these models properly calibrated using quantitative data. A key reason for this is that conventional statistical approaches often reach their limits, or fail, for the complex and high-dimensional problems posed in attempting to calibrate the (increasingly) large and complex models now in routine use. The scientific advances that will be made as part of the proposed project will provide the relevant tools and techniques to overcome these issues. To ensure maximum impact, all computational algorithms and code for the technologies generated during this project will be made freely available for re-use and extension by the research community.

People: The next generation of researchers working at the interface of theoretical and experimental life sciences will require new skills: they must be able to calibrate and interrogate complex models using multiplex, quantitative data in order to generate new insights and predictions. To this end, this project will train two postdoctoral research associates in developing and applying computational statistics approaches to estimate model parameters and perform model selection for complex, multi-scale, and agent-based models in the life and biomedical sciences.

Publications

 
Description The motivation behind my research proposal was the explosion in quantitative data now being routinely collected in the life and biomedical sciences. To exploit these data, we need to develop new mathematical and statistical theory, methods and algorithms to simulate models of biological processes, accurately infer their parameter values from multiplex quantitative data, and test the validity of models encoding different biological hypotheses. The main contributions of the project thus far are:
* the development of a new methodology that exploits hierarchies of models of different levels of complexity (fidelity) to apply likelihood-free Bayesian approaches to complicated models of the type now routinely used in the life and biomedical sciences;
* the development of Pakman, a modular, efficient and portable software tool for running parallel approximate Bayesian computation algorithms;
* the application of the new methodologies to elucidate the mechanistic impacts of electric fields on single-cell motility.
Exploitation Route All methods, algorithms, software and data can be used and extended by others.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology

URL http://www.iamruthbaker.com
 
Description Banff International Research Station (BIRS) 5-day workshop
Amount $100,000 (CAD)
Organisation Banff Centre 
Sector Academic/University
Country Canada
Start 11/2018 
End 11/2018
 
Title Multifidelity approaches to approximate Bayesian computation 
Description Development of a method that can employ models at different levels of complexity in parameter inference using approximate Bayesian computation. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Too early for notable impact. 
URL https://epubs.siam.org/doi/10.1137/18M1229742
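As an illustration of how models at different levels of complexity might be combined in approximate Bayesian computation, the following is a minimal sketch and not the published implementation. It assumes an early accept/reject scheme: every parameter draw is first tested with a cheap low-fidelity simulation, the expensive high-fidelity simulation is run only with some continuation probability, and the sample weight is corrected so that its expectation equals the high-fidelity accept/reject decision. The two toy simulators, the uniform prior, and the names `simulate_low`, `simulate_high`, `eta_accept` and `eta_reject` are all hypothetical.

```python
import random


def simulate_low(theta, rng):
    """Hypothetical cheap model: returns its output and the noise it used."""
    noise = rng.gauss(0.0, 0.3)
    return theta + noise, noise


def simulate_high(theta, noise, rng):
    """Hypothetical expensive model, coupled to the cheap one through shared noise."""
    return theta + noise + rng.gauss(0.0, 0.1)


def multifidelity_abc(y_obs, n_samples, epsilon, eta_accept, eta_reject, rng):
    """Return weighted samples and the number of high-fidelity runs used."""
    samples = []
    hi_runs = 0
    for _ in range(n_samples):
        theta = rng.uniform(0.0, 4.0)  # assumed uniform prior on [0, 4]
        y_lo, noise = simulate_low(theta, rng)
        w_lo = 1.0 if abs(y_lo - y_obs) <= epsilon else 0.0
        # Continuation probability depends on the low-fidelity decision.
        eta = eta_accept if w_lo == 1.0 else eta_reject
        weight = w_lo
        if rng.random() < eta:
            hi_runs += 1
            y_hi = simulate_high(theta, noise, rng)
            w_hi = 1.0 if abs(y_hi - y_obs) <= epsilon else 0.0
            # Correction term: the expected weight equals w_hi.
            weight = w_lo + (w_hi - w_lo) / eta
        samples.append((theta, weight))
    return samples, hi_runs


def weighted_mean(samples):
    """Self-normalised estimate of the posterior mean (weights may be negative)."""
    total = sum(w for _, w in samples)
    return sum(t * w for t, w in samples) / total


rng = random.Random(1)
samples, hi_runs = multifidelity_abc(
    y_obs=2.0, n_samples=20000, epsilon=0.3,
    eta_accept=0.5, eta_reject=0.2, rng=rng,
)
estimate = weighted_mean(samples)
print(f"posterior mean estimate: {estimate:.3f}")
print(f"high-fidelity runs: {hi_runs} of {len(samples)}")
```

In this sketch the saving comes from running the expensive model for only a fraction of the draws; the price is that individual weights can be negative or larger than one, so the continuation probabilities trade simulation cost against estimator variance.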
 
Title Software for efficient, scalable parameter inference 
Description Pakman: a modular, efficient and portable tool for approximate Bayesian inference. Pakman is a parallel ABC manager that is designed to be modular at the systems-level, as opposed to the application-level. Furthermore, Pakman is also designed to be portable and efficient. Pakman is written in C++11 (The C++ Standards Committee, 2011), and relies on the Message Passing Interface (MPI) library, standard MPI-3.1 (Message Passing Interface Forum, 2015), for parallelisation. We chose C++11 because of its native support for MPI and POSIX system calls, high-level programming language features, and efficiency. Moreover, we chose MPI as the platform for parallelisation because it is a well-established standard for distributed computing that has been implemented on a wide variety of systems, ranging from multi-core machines to large computational clusters. In summary, Pakman was made for performing likelihood-free inference when model simulations are computationally expensive. The lack of an analytical likelihood requires the application of ABC methods, and the computational cost of individual simulations merits a parallel approach to decrease the time to solution. Moreover, in order to be as modular as possible, models are specified as black-box programs. The target audience consists of researchers who want to parameterise a computationally demanding stochastic model based on experimental data.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Too early for notable impact. 
URL https://doi.org/10.21105/joss.01716
 
Title Supporting Material (Data) --- Quantifying the impact of electric fields on single-cell motility 
Description Supporting Material (Data) --- Quantifying the impact of electric fields on single-cell motility 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Publication - Quantifying the impact of electric fields on single-cell motility 
URL https://zenodo.org/record/4749429#.YipeWi-l2n8
 
Description Collaboration with Ilan Davis 
Organisation University of Oxford
Department Department of Biochemistry
Country United Kingdom 
Sector Academic/University 
PI Contribution Collaboration aimed at combining mechanistic mathematical models with statistical inference techniques and experimental data to establish the mechanisms by which mRNA localisation is made robust.
Collaborator Contribution Design of the project, supervision of research, writing up publication.
Impact Multidisciplinary collaboration - mathematics, biochemistry and imaging. J. U. Harrison, R. M. Parton, I. Davis and R. E. Baker (2019). Testing models of mRNA localization reveals robustness regulated by reducing transport between cells. Biophys. J. 117(11):2154-2165.
Start Year 2017
 
Description Collaboration with Professor Matthew Simpson, Queensland University of Technology 
Organisation Queensland University of Technology (QUT)
Country Australia 
Sector Academic/University 
PI Contribution Collaborative projects include:
* mechanistic models to explore the effects of cell-cell interactions in cell invasion;
* the use of coarse-grained models in parameter inference;
* model selection for reaction-diffusion problems in biology;
* experimental design for optimal parameter inference.
Collaborator Contribution My team has been involved in planning, carrying out and writing up each of the research projects described.
Impact This collaboration is theoretical, but includes the use of experimental data. The list below indicates relevant publications; a large number of other publications have also resulted from this collaboration (nearly 50 in total). A complete list can be found at https://www.iamruthbaker.com/publications/.
* M. J. Simpson, R. E. Baker, S. T. Vittadello and O. M. Maclaren (2020). Parameter identifiability analysis for spatiotemporal models of cell invasion. To appear in J. Roy. Soc. Interface.
* O. M. Matsiaka, R. E. Baker and M. J. Simpson (2019). Continuum descriptions of spatial spreading for heterogeneous cell populations: theory and experiment. J. Theor. Biol. 482:109997.
* O. M. Matsiaka, R. E. Baker, E. Shah and M. J. Simpson (2019). Mechanistic and experimental models of cell migration reveal the importance of intercellular interactions in cell invasion. Biomed. Phys. Eng. Express 5(4):045009.
* D. J. Warne, R. E. Baker and M. J. Simpson (2019). Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art. J. Roy. Soc. Interface 16.
* A. Parker, M. J. Simpson and R. E. Baker (2018). The impact of experimental design choices on parameter inference for models of growing cell colonies. Roy. Soc. Open Sci. 5:8.
* D. J. Warne, R. E. Baker and M. J. Simpson (2018). Multi-level rejection sampling for approximate Bayesian computation. Comput. Stat. Data Anal. 124:71-86.
* D. J. Warne, R. E. Baker and M. J. Simpson (2017). Optimal quantification of contact inhibition in cell populations. Biophys. J. 113(9):1920-1924.
Start Year 2010
 
Description Control of collective cell motility using electric fields 
Organisation Princeton University
Country United States 
Sector Academic/University 
PI Contribution Developed models for collective cell motility under the influence of electric fields.
Collaborator Contribution Shared experimental data and biological insights and expertise.
Impact N/A
Start Year 2018
 
Description Quantifying electrotaxis in single cells 
Organisation University of California, Davis
Country United States 
Sector Academic/University 
PI Contribution Developed a novel model of single cell electrotaxis and used synthetic Bayes methods to quantify contributions of different electrotactic effects to observed motion.
Collaborator Contribution Provided experimental expertise, experimental data and biological expertise.
Impact T. P. Prescott, K. Zhu, M. Zhao and R. E. Baker (2021). Quantifying the impact of electric fields on single-cell motility. bioRxiv
Start Year 2019
 
Title Pakman: a modular, efficient and portable tool for approximate Bayesian inference 
Description Pakman is a software tool for parallel approximate Bayesian computation (ABC) algorithms. Its modular framework is based on user executables, which means that problem-specific tasks, like model simulations, are performed by black box executables supplied to Pakman by the user. Pakman parallelises the execution of simulations using MPI, a portable standard for distributed computing, and was designed to be lightweight so that a minimal amount of overhead goes into parallelisation. The problems that will benefit the most from Pakman are those where model simulations take a relatively long time, on the order of seconds or more. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact N/A 
URL https://joss.theoj.org/papers/10.21105/joss.01716
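To make the idea of ABC with a black-box simulator concrete, the following is a minimal serial sketch of ABC rejection sampling. It does not use Pakman or MPI and does not reproduce Pakman's interface; the toy Gaussian simulator, the uniform prior, the sample-mean summary statistic and all function names are assumptions chosen for illustration. In a parallel setting, the independent calls to `simulate` are the part that would be distributed across processes.

```python
import random


def simulate(theta, n_obs, rng):
    """Hypothetical black-box stochastic model: n_obs draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n_obs)]


def summary(data):
    """Summary statistic used to compare simulated and observed data (sample mean)."""
    return sum(data) / len(data)


def abc_rejection(observed, n_samples, epsilon, rng):
    """Draw parameters from the prior and keep those whose simulated
    summary lies within epsilon of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-5.0, 5.0)  # assumed uniform prior on [-5, 5]
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # synthetic "experimental" data
posterior = abc_rejection(observed, n_samples=200, epsilon=0.2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"approximate posterior mean: {posterior_mean:.3f}")
```

Because each proposal is simulated independently of the others, the loop parallelises naturally, which is why the benefit grows when a single simulation takes seconds or more.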
 
Description Colloquium (Warwick) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Applied Mathematics Colloquium at University of Warwick.
Year(s) Of Engagement Activity 2019
 
Description Mathematical and Statistical Challenges in Bridging Model Development, Parameter Identification and Model Selection in the Biological Sciences (18w5144) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Organised the meeting "Mathematical and Statistical Challenges in Bridging Model Development, Parameter Identification and Model Selection in the Biological Sciences (18w5144)" at Banff International Research Station. Attended by 40 participants.
Year(s) Of Engagement Activity 2018
URL https://www.birs.ca/events/2018/5-day-workshops/18w5144
 
Description Minisymposium talk (BayesComp2020) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Minisymposium talk at BayesComp2020.
Year(s) Of Engagement Activity 2020
 
Description Plenary talk (Vigo) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Plenary Talk, New Vistas for Computational Systems and Synthetic Biology workshop in Vigo, Spain.
Year(s) Of Engagement Activity 2019
 
Description Plenary talk ANZIAM 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Plenary talk at ANZIAM 2019 meeting in Nelson, New Zealand.
Year(s) Of Engagement Activity 2019
 
Description Plenary talk and seminar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Plenary talk and seminar at Gothenburg University.
Year(s) Of Engagement Activity 2018
 
Description SMB Minisymposium talk 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Minisymposium Talk, Society for Mathematical Biology Annual Meeting, Montreal.
Year(s) Of Engagement Activity 2019
 
Description Seminar (Southampton) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Applied Mathematics Seminar, Department of Mathematics, University of Southampton.
Year(s) Of Engagement Activity 2019
 
Description Seminar talk at University of Manchester 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Applied Mathematics Seminar
Year(s) Of Engagement Activity 2018
 
Description Talk - ACEMS 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Talk at Queensland University of Technology
Year(s) Of Engagement Activity 2021
 
Description Talk - IBS 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Seminar at IBS KAIST
Year(s) Of Engagement Activity 2021
 
Description Talk - IST 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Seminar at IST Austria
Year(s) Of Engagement Activity 2021
 
Description Talk - World Statistics Congress 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Talk at World Statistics Congress
Year(s) Of Engagement Activity 2021