Agent-based model calibration using likelihood-free inference
Lead Research Organisation: University of Bristol
Department Name: Mathematics
Abstract
It is often possible to design simulators that can model complex phenomena but are poorly suited to standard statistical methods. This arises when running the simulator is straightforward but the corresponding likelihood function is intractable, often because a large number of latent variables would need to be marginalised out to obtain it. One key example is agent-based models, a flexible class of models in which the behaviours of individual agents are specified and the agents then interact, often producing complex emergent behaviour.
Recently, many simulation-based inference methods have been developed that use simulations to approximate a function of interest, for example the likelihood, the posterior, or the likelihood-to-evidence ratio. Often the raw simulator output is too high-dimensional to be used directly, so inference is instead performed using a set of summary statistics. In most models of interest, the simulator will be somewhat misspecified, meaning that it does not perfectly replicate the true underlying data-generating process for any set of parameters. This causes problems when approximating the function of interest. For example, when approximating the likelihood, the observed data (or summary statistics) may fall far in the tails of the likelihood, where density estimation may be unreliable, particularly for flexible methods such as normalising flows (Papamakarios et al., 2021). The difficulty of handling misspecification in a principled manner has hindered the application of simulators and simulation-based methods.
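As an illustration of inference through summary statistics, the following minimal sketch runs rejection approximate Bayesian computation on a toy agent-based simulator. Everything in it is an assumption made for illustration and none of it comes from the project: the simulator, the choice of summaries, the Uniform(0, 1) prior, the tolerance `eps`, and the names `simulate`, `summarise` and `rejection_abc`.

```python
import math
import random


def simulate(theta, rng, n_agents=50, n_steps=10):
    """Toy agent-based simulator (hypothetical): agents switch on irreversibly.

    Each inactive agent activates with probability theta * (0.1 + fraction active),
    so the individual agent trajectories act as latent variables.
    Returns the fraction of active agents after each step.
    """
    active = [False] * n_agents
    fractions = []
    for _ in range(n_steps):
        frac = sum(active) / n_agents
        p = min(1.0, theta * (0.1 + frac))
        # Active agents stay active; inactive ones activate with probability p.
        active = [a or rng.random() < p for a in active]
        fractions.append(sum(active) / n_agents)
    return fractions


def summarise(fractions):
    """Reduce the raw output to two summaries: final and time-averaged fraction."""
    return (fractions[-1], sum(fractions) / len(fractions))


def rejection_abc(s_obs, rng, n_draws=2000, eps=0.05):
    """Keep prior draws whose simulated summaries lie within eps of s_obs."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # assumed Uniform(0, 1) prior
        s_sim = summarise(simulate(theta, rng))
        if math.dist(s_sim, s_obs) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
s_obs = summarise(simulate(0.6, rng))  # stand-in for observed data
posterior_draws = rejection_abc(s_obs, rng)
print(len(posterior_draws), "accepted draws")
```

Under misspecification, no value of `theta` may bring the simulated summaries within `eps` of the observed ones, so few or no draws are accepted. This is a simple analogue of the observed data falling in the tails of an approximated likelihood.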
The overall aim of the project is to contribute to simulation-based methods and investigate ways to improve their robustness to model misspecification. The project will initially focus on two areas. The first is to investigate the use of gradient boosting for obtaining conditional density estimates (e.g. the likelihood; Thomas et al., 2018). Gradient boosting is a method for constructing ensemble models from a set of simple base learning algorithms. A particular advantage of a boosting framework is that variable selection and model fitting can be performed jointly. In the context of likelihood approximation, this corresponds to automatically selecting the simulator parameters used to predict the parameters of the conditional distribution. The second area is to develop methods for learning summary statistics that are informative and robust to misspecification. One option could be to use variational autoencoders, which have been widely used in machine learning to learn low-dimensional representations of datasets. To improve robustness to misspecification, ideas from generalised Bayesian posteriors (Schmon et al., 2020) or semi-modular inference (Carmona and Nicholls, 2020) could be applied to adapt the objective function.
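The joint variable selection and model fitting described above can be illustrated with a heavily simplified sketch of component-wise boosting. It is not the project's method or that of the cited papers. It assumes a Gaussian conditional density for a scalar summary, boosts only the mean using one no-intercept linear base learner per simulator parameter, and stops after a fixed number of iterations. The synthetic data and the names `componentwise_boost` and `log_density` are hypothetical.

```python
import math
import random


def componentwise_boost(X, y, n_iter=20, lr=0.1):
    """Component-wise L2 boosting: each step updates only the best single column."""
    n, d = len(X), len(X[0])
    intercept = sum(y) / n
    coefs = [0.0] * d
    resid = [yi - intercept for yi in y]
    cols = [[row[j] for row in X] for j in range(d)]
    col_ss = [sum(v * v for v in col) for col in cols]
    for _ in range(n_iter):
        # Least-squares slope of the current residuals on each column separately.
        betas = [sum(v * r for v, r in zip(cols[j], resid)) / col_ss[j]
                 for j in range(d)]
        # Select the column giving the largest drop in squared error.
        best = max(range(d), key=lambda j: betas[j] ** 2 * col_ss[j])
        step = lr * betas[best]
        coefs[best] += step
        resid = [r - step * v for r, v in zip(resid, cols[best])]
    # Residual standard deviation serves as the (constant) Gaussian scale.
    sigma = math.sqrt(sum(r * r for r in resid) / n)
    return intercept, coefs, sigma


def log_density(s, theta, model):
    """Gaussian approximation to log p(s | theta) under the fitted model."""
    intercept, coefs, sigma = model
    mu = intercept + sum(c * t for c, t in zip(coefs, theta))
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((s - mu) / sigma) ** 2


# Synthetic training set: only the first of three parameters affects the summary.
rng = random.Random(0)
thetas = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(500)]
summaries = [2.0 * t[0] + rng.gauss(0.0, 0.3) for t in thetas]
model = componentwise_boost(thetas, summaries)
print("coefficients:", [round(c, 2) for c in model[1]])
```

Because each iteration updates a single coefficient, parameters that are never selected before stopping keep a coefficient of exactly zero, which is the sense in which selection and fitting happen together. A fuller treatment would also boost the scale and choose the stopping iteration by validation.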
Planned Impact
The COMPASS Centre for Doctoral Training will have the following impact.
Doctoral Students Impact.
I1. Recruit and train over 55 students and provide them with a broad and comprehensive education in contemporary Computational Statistics & Data Science, leading to the award of a PhD. The training environment will be built around a set of multilevel cohorts: a variety of group sizes, activities within and across year cohorts, and work within and across disciplinary boundaries with internal and external partners, where statistics and computation are the common focus while remaining sensitive to disciplinary needs. Our novel doctoral training environment will have a powerful impact on students, opening their eyes not only to a range of modern technical benefits and opportunities, but also to the power of team-working with people from a range of backgrounds to solve the most important problems of the day. They will learn to apply their skills to achieve impact by working collaboratively with internal and external partners, for example via our Rapid Response Teams, Policy Workshops & Statistical Clinics.
I2. As well as advanced training in computational statistics and data science, our students will benefit from exposure to, and training in, important cognate topics such as ethics, responsible innovation, equality, diversity and inclusion, policy, effective communication and dissemination, enterprise, impact and consultancy skills. It is vital for our students to understand that their training will enable them to have a powerful impact on the wider world. For example, the AI algorithms they develop should not be discriminatory, statistical methodologies should be reproducible, and statistical results should be communicated accurately and comprehensibly to the general public and policymakers.
I3. The students will gain experience via collaborations with academic partners within the University in cognate disciplines, and a wide range of external industrial & government partners. The students will be impacted by the structured training programmes of the UK Academy of Postgraduate Training in Statistics, the Bristol Doctoral College, the Jean Golding Institute, the Alan Turing Institute and the Heilbronn Institute for Mathematical Sciences, which will be integrated into our programme.
I4. Having received excellent training, the students will go on to have a powerful impact on the world in their future careers, spreading excellence.
Impact on our Partners & ourselves.
I5. Direct impacts will be achieved by students engaging with, and working on projects with, our academic partners, with discipline-specific problems arising in engineering, education, medicine, economics, earth sciences, life sciences and geographical sciences, and our external partners Adarga, the Atomic Weapons Establishment, CheckRisk, EDF, GCHQ, GSK, the Office for National Statistics, Sciex, Shell UK, Trainline and the UK Space Agency. The students will demonstrate a wide range of innovation with these partners, will attract engagement from new partners, and often provide attractive future employment matches for students and partners alike.
Wider Societal Impact
I6. COMPASS will greatly benefit the UK by providing over 55 highly trained PhD graduates in an area that is known to be suffering from extreme shortages in the national people pipeline. COMPASS CDT graduates will be equipped for jobs in sectors of high economic value and national priority, including data science, analytics, pharmaceuticals, security, energy, communications, government, and indeed all research labs that deal with data. Through their training, they will enable these organisations to make well-informed and statistically principled decisions that will allow them to maximise their international competitiveness and contribution to societal well-being. COMPASS will also impact positively on the wider student community, both now and sustainably into the future.
People | ORCID iD |
---|---|
Matteo Fasiolo (Primary Supervisor) | |
Daniel Ward (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/S023569/1 | | | 31/03/2019 | 29/09/2027 | |
2438224 | Studentship | EP/S023569/1 | 30/09/2020 | 20/01/2025 | Daniel Ward |