Next generation approaches to connect models and quantitative data
Lead Research Organisation:
Imperial College London
Department Name: Life Sciences
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
Using mathematical models to assist in the design and interpretation of biological experiments is becoming increasingly important in biomedical and life sciences research; yet fundamental questions remain unresolved about how best to integrate experimental data within mathematical modelling frameworks to provide useful predictions. Novel mathematical, statistical and computational tools are needed to provide a standardised pipeline that enables experimental data to be used effectively in the development of models, and in model parameterisation and selection.
One key challenge in using mathematical modelling to interpret biological experiments is the question of how to integrate multiplex, multi-scale quantitative data generated in experimental laboratories to improve our understanding of a specific biological question. A standard protocol is lacking: one that includes the design of experiments targeted towards parameterising models, validating specific model hypotheses, and inferring underlying mechanisms from quantitative data. A significant reason for this is that, for the kinds of models required to interrogate phenomena in the modern life sciences, calibration against quantitative data poses a formidable set of challenges. The models generally contain many parameters, and it is hard to obtain relevant data covering all the aspects of interest or importance to describe the system dynamics. In addition, the data collected usually have multiple, generally poorly characterised, sources of noise and uncertainty. Conventional statistical approaches either reach their limits or fail for such complex and, increasingly, high-dimensional problems. Here we seek to address precisely this point and develop a complementary suite of approaches that will enable scientists in the modern life and biomedical sciences to estimate model parameters and perform model selection for complex, multi-scale, and agent-based models.
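The likelihood-free calibration this proposal targets can be illustrated with a minimal Approximate Bayesian Computation (ABC) rejection sampler. This is only a sketch of the general technique, not the project's actual methodology: the Bernoulli "simulator", the uniform prior, and the tolerance below are hypothetical stand-ins for an expensive stochastic model of a biological system.

```python
import random
import statistics

def simulate(theta, n=50, rng=random):
    """Toy stochastic 'model': frequency of successes in n Bernoulli(theta)
    trials. Stands in for an expensive simulator whose likelihood is
    intractable."""
    return sum(rng.random() < theta for _ in range(n)) / n

def abc_rejection(observed, prior_sample, distance, eps, n_draws=10000, seed=0):
    """Basic ABC rejection sampling: keep prior draws whose simulated
    summary statistic falls within eps of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a parameter from the prior
        x = simulate(theta, rng=rng)       # simulate data under that parameter
        if distance(x, observed) <= eps:   # accept if close to the observation
            accepted.append(theta)
    return accepted

# Try to recover theta from an observed success frequency of 0.3
posterior = abc_rejection(
    observed=0.3,
    prior_sample=lambda rng: rng.random(),   # Uniform(0, 1) prior
    distance=lambda x, y: abs(x - y),
    eps=0.05,
)
print(statistics.mean(posterior))
```

The accepted draws form an approximate posterior sample concentrated near 0.3; tightening `eps` sharpens the approximation at the cost of more rejected simulations, which is precisely why faster simulation and smarter proposal schemes matter for the complex models discussed above.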
Planned Impact
Economy: The use and analysis of data carry both social and economic costs. First, ineffective use of data generated at the expense of public funding is a waste of resources. Second, whenever animals are involved in research - as is routinely the case in immunology, developmental and stem cell biology, and in physiology - we have to ensure that the 3R principles (replacement, reduction, refinement) are adhered to. The methodologies developed as part of this project will provide a direct means to mitigate these issues, by ensuring that experiments are designed to collect the appropriate data to answer specific questions.
Society: In terms of healthcare, we increasingly rely on diverse sets of data and their integration in order to make or plan concrete interventions in the life of patients or, in public health, make regulations that affect large parts of the population. It is essential to the decision- and policy-making process that we understand how to integrate and interpret these diverse data sets using mathematical and statistical models and techniques. In addition, in the medium-to-long term, making personalised medicine a reality requires us to understand how to efficiently and accurately integrate and interpret patient-specific multiplex quantitative data using theoretical approaches. The proposed research will bring the UK research community further towards a unified pipeline for interfacing mathematical models with quantitative data.
Knowledge: It is now almost the norm, particularly in high profile journals, for publications from modern life sciences research groups to include a model that integrates biological hypotheses and validates them against experimental data. Rarely, however, are these models properly calibrated using quantitative data. A key reason for this is that conventional statistics approaches often reach their limits, or fail, for the complex and high-dimensional problems posed in attempting to calibrate the increasingly large and complex models now in routine use. The scientific advances that will be made as part of the proposed project will provide the relevant tools and techniques to overcome these issues, and all computational algorithms and code will be made freely available for re-use and extension by the research community.
People: The next generation of researchers working at the interface of theoretical and experimental life sciences will require new skills: the ability to calibrate and interrogate complex models using multiplex quantitative data in order to generate new insights and predictions. To this end, this project will train two postdoctoral research associates in developing and applying computational statistics approaches to estimate model parameters and perform model selection for complex, multi-scale and agent-based models in the life and biomedical sciences.
People
Michael Stumpf (Principal Investigator)
Publications
Anderson DF (2020) Time-dependent product-form Poisson distributions for reaction networks with higher order complexes. Journal of Mathematical Biology.
Dony L (2019) Parametric and non-parametric gradient matching for network inference: a comparison. BMC Bioinformatics.
Ham L (2020) Exactly solvable models of stochastic gene expression. The Journal of Chemical Physics.
Schnörr D (2021) Learning system parameters from Turing patterns.
Schnörr D (2023) Learning system parameters from Turing patterns. Machine Learning.
Description: We have developed faster stochastic simulation approaches that allow us to perform inference for complex stochastic systems much more efficiently than was previously possible.
Exploitation Route: We are releasing software that allows others to use these methods. Our software package gpABC is now published and used by the community.
Sectors: Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Healthcare
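The "faster stochastic simulation" referred to in this finding builds on exact simulation schemes such as the Gillespie direct method. As an illustrative sketch only (the birth-death model and rate constants below are hypothetical examples, not the project's actual systems or accelerated algorithms), a minimal implementation looks like this:

```python
import random

def gillespie_birth_death(k_on, k_off, t_end, x0=0, seed=1):
    """Gillespie direct method for a birth-death process: constant
    production at rate k_on, first-order degradation at rate k_off * x.
    Returns the molecule count at time t_end."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    while True:
        a_birth = k_on           # propensity of the birth reaction
        a_death = k_off * x      # propensity of the death reaction
        a_total = a_birth + a_death
        # waiting time to the next reaction is exponentially distributed
        t += rng.expovariate(a_total)
        if t > t_end:
            return x
        # pick which reaction fires, with probability proportional to its propensity
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1

# The stationary distribution of this process is Poisson with mean k_on / k_off = 10,
# so the sample mean over many runs should be close to 10.
samples = [gillespie_birth_death(10.0, 1.0, 20.0, seed=s) for s in range(500)]
print(sum(samples) / len(samples))
```

For inference, many such trajectories must be generated per candidate parameter set, which is why the speed of the simulator dominates the cost of calibrating complex stochastic models.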
Description: Our approaches are being used in the 2019 Google Summer of Code program.
First Year of Impact: 2019
Sector: Digital/Communication/Information Technologies (including Software)
Impact Types: Economic
Description: This is a collaboration with mathematicians at the University of Oxford.
Organisation: University of Oxford
Department: Oxford Centre for Collaborative Applied Mathematics (OCCAM)
Country: United Kingdom
Sector: Academic/University
PI Contribution: We are only beginning this collaboration.
Collaborator Contribution: We are in regular contact and initiating further grant applications.
Impact: None so far.
Start Year: 2016
Title: ABACUS
Description: Julia package for Approximate Bayesian Computation (ABC) inference
Type of Technology: Software
Year Produced: 2018
Impact: This is a Julia package which implements Approximate Bayesian Computation.