# Bayesian inference with application to air quality monitoring

Lead Research Organisation:
Imperial College London

Department Name: Dept of Mathematics

### Abstract

This project aims to develop new methodology for performing statistical inference in environmental modelling applications. These applications require a large number of sensors that collect data frequently and are distributed over a large region of space. This motivates the use of a space-time varying stochastic dynamical model, defined in continuous time via a (linear or non-linear) stochastic partial differential equation, to model quantities such as air quality, pollution levels, and temperature. We are naturally interested in fitting this model to real data and, in addition, in improving the statistical inference through a carefully chosen frequency for collecting observations, optimal sensor placement, and automatic calibration of sensor biases. From a statistical perspective, these problems can be formulated within a Bayesian framework that combines posterior inference with optimal design.

Performing Bayesian inference or optimal design for the chosen statistical model may be intractable, in which case simulation-based numerical methods will be necessary. We aim to consider computational methods that are principled but computationally intensive. Given the additional challenges posed by the high dimensionality of the data and the model, algorithms intended for use in practice must be designed with close attention to the statistical model at hand. In particular, popular methods such as (Recursive) Maximum Likelihood, Markov chain Monte Carlo, and Sequential Monte Carlo will need to be carefully adapted and extended for this purpose.

People | ORCID iD |
---|---|
Nikolaos Kantas (Primary Supervisor) | |
Louis Sharrock (Student) | |

### Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/R512540/1 | | | 01/10/2017 | 31/03/2022 | |
1925152 | Studentship | EP/R512540/1 | 01/10/2017 | 30/09/2021 | Louis Sharrock |

Description | In this work, we have developed new methodology for performing statistical inference in environmental modelling applications. In particular, we consider the use of stochastic, dynamical, space-time varying models in such applications, and study two important mathematical problems relating to their practical implementation. The first problem of interest is "parameter estimation". A suitable model for the environmental quantity of interest is often known only up to a set of unknown "model parameters", which must be estimated from the available data. We propose and implement a novel parameter estimation algorithm for one such model, derived from the physically motivated stochastic "advection-diffusion equation", and demonstrate its efficacy via numerical simulations. We also show that, under certain technical assumptions, this algorithm can be implemented at a significantly reduced computational cost. Importantly, our algorithm can be implemented "online": that is, the parameter estimates are updated sequentially as soon as new observations become available, without revisiting past data. The second problem of interest is "optimal sensor placement". The environmental quantity of interest is typically measured using a large number of sensors, which collect data frequently and are distributed over a large region of space. In this context, it is natural to ask whether the statistical inference in the chosen model can be improved by varying properties of the underlying sensor network, such as the sensor placement, possibly subject to pre-determined constraints. We propose and implement an optimal sensor placement algorithm for the model discussed above, and again demonstrate its efficacy via extensive numerical simulations. This algorithm can also be implemented "online".
Finally, we consider the two problems together: is it possible to perform online parameter estimation and online optimal sensor placement simultaneously? This question, despite its clear practical relevance, has not previously been addressed in the statistical literature. We demonstrate that it is indeed possible. In particular, we propose a novel solution to this problem, based on a stochastic gradient descent scheme in continuous time for both the parameter estimates and the sensor locations. Furthermore, we provide a rigorous mathematical proof that, for an extensive class of models and under certain technical assumptions, our algorithm converges to the "true values". We also implement this algorithm for several models, including the model discussed previously, and demonstrate its efficacy via numerical simulations. |
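The idea of running stochastic gradient updates in continuous time for both a model parameter and a sensor location can be illustrated on a deliberately tiny example. The sketch below is not the project's algorithm, which targets the stochastic advection-diffusion equation. Everything in it is an assumption chosen for illustration. It uses a hypothetical scalar linear-Gaussian model dX = -θX dt + σ dW, observed through one sensor with a made-up response h(o) = exp(-(o - c)²/2) that peaks at a source location c. The parameter θ is learned by stochastic gradient ascent on the log-likelihood, computed online from a Kalman-Bucy filter and its derivative ("tangent") filter. The sensor location o is moved by gradient descent on the filter variance P, which is one possible placement cost but not necessarily the one used in the project. The sensor step size is taken to decay faster than the parameter step size. All continuous-time equations are discretised with a simple Euler step of size dt.

```python
import math
import random


def sensor_gain(o, c=0.0):
    # Hypothetical sensor response h(o): largest when the sensor sits at the source c.
    return math.exp(-0.5 * (o - c) ** 2)


def sensor_gain_deriv(o, c=0.0):
    # Derivative dh/do of the hypothetical response above.
    return -(o - c) * sensor_gain(o, c)


def joint_online_sgd(theta_true=1.0, sigma=1.0, eps=0.2, c=0.0,
                     theta0=0.3, o0=-1.0, T=1000.0, dt=0.01, seed=0):
    """Toy joint online parameter estimation and sensor placement (illustrative only).

    Hypothetical model: dX = -theta X dt + sigma dW,  dY = h(o) X dt + eps dV.
    theta: stochastic gradient ascent on the log-likelihood (online).
    o:     gradient descent on the Kalman-Bucy filter variance P.
    """
    rng = random.Random(seed)
    sq, e2 = math.sqrt(dt), eps ** 2
    x = 0.0                    # simulated true signal
    theta, o = theta0, o0      # online estimates
    m, P = 0.0, 0.5            # Kalman-Bucy filter mean and variance
    m_th, P_th = 0.0, 0.0      # tangent filter: dm/dtheta and dP/dtheta
    P_o = 0.0                  # dP/do: sensitivity of the filter variance to o
    for k in range(int(T / dt)):
        t = k * dt
        h, dh = sensor_gain(o, c), sensor_gain_deriv(o, c)
        # Observation increment from the current true state, then advance the state.
        dY = h * x * dt + eps * sq * rng.gauss(0.0, 1.0)
        x += -theta_true * x * dt + sigma * sq * rng.gauss(0.0, 1.0)
        innov = dY - h * m * dt  # innovation increment

        # Decreasing step sizes; beta / gamma -> 0, so o moves on the slower timescale.
        gamma = 0.2 / (1.0 + t) ** 0.5
        beta = 1.0 / (1.0 + t) ** 0.6

        # theta ascends the log-likelihood increment; o descends the filter variance.
        theta_new = theta + gamma * (h * m_th / e2) * innov
        o_new = o - beta * P_o * dt

        # Filter, tangent filter and dP/do, all evaluated at the current theta and o.
        m_new = m - theta * m * dt + (h * P / e2) * innov
        P_new = P + (-2 * theta * P + sigma ** 2 - h ** 2 * P ** 2 / e2) * dt
        m_th_new = (m_th + (-m - theta * m_th) * dt
                    + (h * P_th / e2) * innov - (h ** 2 * P / e2) * m_th * dt)
        P_th_new = P_th + (-2 * P - 2 * theta * P_th - 2 * h ** 2 * P * P_th / e2) * dt
        P_o_new = P_o + (-2 * theta * P_o - 2 * h * dh * P ** 2 / e2
                         - 2 * h ** 2 * P * P_o / e2) * dt

        theta, o = theta_new, o_new
        m, P, m_th, P_th, P_o = m_new, P_new, m_th_new, P_th_new, P_o_new
    return theta, o


if __name__ == "__main__":
    theta_hat, o_hat = joint_online_sgd()
    print(f"theta estimate: {theta_hat:.3f}, sensor location: {o_hat:.3f}")
```

With the default settings, the parameter estimate should drift towards the true value θ = 1 and the sensor towards the source c = 0, although the exact numbers depend on the random seed. Each observation increment is used once and then discarded, which is what makes both updates "online" in the sense described above.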

Exploitation Route | There are a number of ways in which the current outcomes of this funding may be taken forward. Thus far, all numerical results have been obtained using so-called "linear models". It is of both academic and non-academic interest to extend and adapt the statistical methodology developed in this work to "non-linear" models, which are more widely applicable in practice. Furthermore, all numerical results have been obtained using simulated data. It is therefore hoped that, going forward, the proposed methodology can be applied to real data (e.g. large-scale air pollution monitoring). Both of these extensions are expected to form part of the work undertaken during the remainder of this award. |

Sectors | Environment, Other |

Description | CliMathParis 2019: Big Data, Data Assimilation and Uncertainty Quantification (IHP, Paris). |

Amount | €850 (EUR) |

Organisation | Henri Poincaré Institute |

Sector | Academic/University |

Country | France |

Start | 11/2019 |

End | 11/2019 |