Improving UK Air Quality Forecasts During Heatwaves through Advanced Statistical and Machine Learning Methods

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics

Abstract

The UK Met Office, amongst others, uses numerical process models to forecast ground-level air pollution. These forecasts inform public-health warnings. There is concern that in certain situations, e.g. during heatwaves, when air pollution can rise to unusually high levels, these forecasts may not be as accurate as would be desirable. This project aims to develop statistical methodology that uses information from both observational data and process-based models to improve forecasts and hence support more accurate health warnings.

The methodology will be taken primarily from existing methods in extreme value analysis and time-series modelling. Many of these methods, particularly those for multivariate data (which is what we would be using), are non-trivial to implement, and there is no general consensus on the 'best' model or approach. We would like to use these models to (i) assess the ability of the process-based forecasting models to predict extreme events and (ii) using the findings from part (i), develop statistical forecasting methods that improve on the process-based ones. Part (ii) could either take a down-scaling type approach, i.e. adjusting for consistent biases or other errors in the process-based model output, or build a completely new statistical forecasting model from scratch. The former would certainly require the use of

(a) multivariate extreme value analysis (to compare the forecasting model output with observations); a paper that illustrates the broad concept, if not the actual modelling details we would use, is available at
https://onlinelibrary.wiley.com/doi/full/10.1002/env.2143
(b) regression and/or spatial techniques (for the same reasons as given below).
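
As an illustration of how forecast output might be compared with observations in the extremes, the sketch below estimates the conditional exceedance probability P(observation above its u-quantile | forecast above its u-quantile), an empirical version of the tail dependence measure often written chi(u). This is only a minimal sketch under stated assumptions, not the project's method: the function names, the rank-based transformation and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def rank_scores(values):
    """Map each value to a rank-based score in (0, 1): rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = rank / (n + 1)
    return scores


def chi_hat(forecast, observed, u):
    """Estimate P(observed score > u | forecast score > u) for paired series."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must be paired")
    f = rank_scores(forecast)
    o = rank_scores(observed)
    n_forecast = sum(1 for fi in f if fi > u)
    if n_forecast == 0:
        raise ValueError("no forecast values above the chosen level u")
    n_joint = sum(1 for fi, oi in zip(f, o) if fi > u and oi > u)
    return n_joint / n_forecast


# Synthetic stand-in data (an assumption): the "forecast" is the
# "observation" plus independent noise, so the two are positively dependent.
rng = random.Random(0)
obs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
fcst = [x + rng.gauss(0.0, 0.5) for x in obs]

for level in (0.90, 0.95, 0.99):
    print(level, round(chi_hat(fcst, obs, level), 3))
```

Values near 1 at high levels would suggest that the forecast tends to flag the same extreme episodes as the observations, while values near 1 - u would suggest little agreement beyond chance. A full analysis would also need uncertainty estimates, which this sketch omits.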

For the latter we would use
(a) univariate extreme value models incorporating trends and seasonality (i.e. non-stationarity, handled through regression modelling);
(b) forecasting methodology, e.g. dynamic linear models, potentially combined with extreme value analysis;
(c) spatial modelling of extremes to enable pooling of information, give more realistic uncertainty estimates and confidence bounds, and ensure spatially consistent forecasts. Spatial modelling could either take a hierarchical modelling approach - see section 4 of
https://projecteuclid.org/download/pdfview_1/euclid.ss/1340110864
or be based on recent developments in the modelling of spatial extremes through a combination of geostatistical and multivariate extreme value models as detailed here:
https://www.lancaster.ac.uk/~wadswojl/CSE-paper.pdf;
(d) multivariate modelling, _if_ we get to the point of trying to forecast extreme episodes that involve multiple pollutants.
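
To make the dynamic linear model idea in (b) concrete, the sketch below runs the Kalman filter for the simplest such model, a local level model: y_t = mu_t + v_t with v_t ~ N(0, V), and mu_t = mu_{t-1} + w_t with w_t ~ N(0, W). It returns the one-step-ahead forecast mean and variance at each time. This is an illustrative assumption rather than the model the project would use: the function name, the variances and the toy series are hypothetical, and a realistic pollutant model would add covariates, seasonality and a treatment of the extremes.

```python
def local_level_filter(y, obs_var, state_var, m0=0.0, c0=1e6):
    """Kalman filter for a local level DLM.

    y         : sequence of observations
    obs_var   : observation variance V (assumed known here)
    state_var : state evolution variance W (assumed known here)
    m0, c0    : prior mean and variance of the level (vague by default)

    Returns (forecasts, (m, c)), where forecasts[t] = (mean, variance) of the
    one-step-ahead forecast of y[t], and (m, c) is the final filtered
    mean and variance of the level.
    """
    m, c = m0, c0
    forecasts = []
    for obs in y:
        # Prediction step: the level evolves as a random walk.
        a = m
        r = c + state_var
        # One-step-ahead forecast distribution of the observation.
        f = a
        q = r + obs_var
        forecasts.append((f, q))
        # Update step: move the level towards the new observation.
        k = r / q  # adaptive (Kalman) gain, always in (0, 1)
        m = a + k * (obs - f)
        c = r * (1.0 - k)
    return forecasts, (m, c)


# Toy daily series (hypothetical values, for illustration only).
series = [41.0, 43.5, 47.0, 55.0, 62.5, 60.0, 58.5]
fc, (level, level_var) = local_level_filter(series, obs_var=9.0, state_var=4.0)
for (mean, var), obs in zip(fc[1:], series[1:]):
    print(f"forecast {mean:6.2f} (sd {var ** 0.5:4.2f})  observed {obs:6.2f}")
```

In practice V and W would be estimated rather than fixed, and the Gaussian forecast distribution is exactly the part that a combination with extreme value analysis would aim to improve in the upper tail.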

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513076/1                                   01/10/2018  30/09/2023
2388219            Studentship   EP/R513076/1  01/07/2020  30/09/2023  Kyle Jex