Robust, Scalable Sequential Monte Carlo with Application To Urban Air Quality
Lead Research Organisation:
University of Warwick
Department Name: Statistics
Abstract
This project is driven by two substantial considerations.
Methods for conducting inference, i.e. estimating the parameters of an indirectly observed system, in large complex systems are urgently needed. Existing technology does not generally scale well to the very large data sets which arise in many modern data-rich contexts. Most of the recent developments in computational statistics which aim at improving the scalability of existing algorithms have focused on data which has very particular forms and in particular can be viewed as very large numbers of replicates of measurements which are independent of one another. Such methods are not suitable for data sets which have strong spatial and temporal structures as, for example, many data sets obtained in urban analytic settings do. This project aims to develop a suite of methodological tools for conducting inference in models of this sort in a computationally efficient way, by exploiting the structure of the models in order to provide simultaneously efficient computational tools and good estimation. Furthermore, leveraging recent developments in the field of robust statistics, these methods will be adapted to deal with settings in which the modelling is imperfect and the data generating process is not exactly characterized by the mathematical model. This robustness is essential to obtain good performance in real, complex scenarios.
Air quality monitoring is a tremendously important and tremendously challenging area. Diverse sensor networks exist on different scales and provide measurements with quite different characteristics to one another. Fusing this information as observations become available is a large scale statistical inference problem. Indeed, problems of this type motivate the methodological development of this project and will serve as an extensive test-bed for the developed methodology. An extended application of those methods to air quality monitoring in the Greater London area with the support of the Greater London Authority provides the second major aspect of this proposal.
Planned Impact
The impact of this proposal has the potential to be deep and far-reaching. The most immediate downstream beneficiaries might be expected to be the general public in major urban centres, particularly London, via the applied component of the project, which aims to improve air quality monitoring and forecasting in the Greater London area. In order to facilitate this impact, the project will initially provide a full software implementation of the methodology, including models appropriate to this setting, and then, in close collaboration with the data scientists of the Greater London Authority, attempt to develop and deploy this methodology in real monitoring scenarios. The software will be made freely available in order to facilitate broader uptake of the work both within the air quality context and much more broadly.
Those scientists and policy-makers who work in the field of urban science will also benefit directly from the work which we seek to develop. The provision of software and a dissemination workshop focussed upon the work and the associated software is intended to maximise this impact. Professional statisticians more broadly also stand to benefit from the software and methodological research. In order to ensure that this broad range of beneficiaries becomes aware of the work and its potential, publication in a range of venues is envisaged, from fundamental statistical publications through to domain-specific conferences.
A workshop will be used to disseminate findings and software to interested parties, broadly determined, with a strong focus on facilitating interactions between different types of stakeholders. In particular, we will seek to interact extensively with a broad cross section of the urban analytics community via this workshop, as well as through publication in appropriate domain-specific venues.
The proposed research has the potential to ultimately inform public policy and drive decisions which will affect us all. Engaging the public broadly with fundamental research of this type is profoundly important. We consequently aim at significant engagement with the public via outreach opportunities mediated via the experienced team at the Turing and, also, the opportunities which will be afforded by the status of Coventry as the UK City of Culture in 2021.
Publications

Angeli L
(2021)
Limit theorems for cloning algorithms
in Stochastic Processes and their Applications

Boustati A.
(2020)
Generalised Bayesian filtering via sequential Monte Carlo
in Advances in Neural Information Processing Systems

Brown S
(2021)
Simple conditions for convergence of sequential Monte Carlo genealogies with applications
in Electronic Journal of Probability

Brown S
(2023)
Weak convergence of non-neutral genealogies to Kingman's coalescent
in Stochastic Processes and their Applications

Chan R
(2021)
Divide-and-Conquer Fusion
Description | A major underlying goal of this research was to develop and characterize methods for conducting statistical inference in large-scale settings in which it is necessary to distribute computation and in settings in which robustness to misspecification of models is essential. There was substantial success in this area:
* we obtained a comprehensive theoretical characterization of "divide-and-conquer sequential Monte Carlo", a framework which allows for inference in this setting but which had eluded deep formal analysis prior to this grant;
* we showed that this type of approach can be combined with "generalised Bayesian inference" in the context of online inference (and inspired follow-up work by others extending the applicability of this approach to general inference problems);
* an approach to using divide-and-conquer methods of this type for online inference in very high-dimensional time series has been developed and tested;
* it became clear that the approach itself provided a framework for extending a different approach to distributed inference, in which disparate estimates obtained for common quantities using different data sets are combined (known as Monte Carlo Fusion), to distributed settings.
These were the major methodological goals of this grant. In addition, during the course of the grant we identified a different approach which could be used for a class of problems and which is particularly amenable to implementation on modern hardware. This approach has been dubbed "particle gradient descent" and is still under active development and theoretical investigation. |
Exploitation Route | The main contribution of this grant has been the development and deep understanding of a class of computational methods amenable to large scale inference. Such problems arise across a very wide range of areas and the potential for their use is very substantial; it is likely that these uses will be mediated by academics in the domains of application but the possible downstream impact (by making feasible the solution of problems which might otherwise have been intractable, for example) is significant. |
Sectors | Digital/Communication/Information Technologies (including Software) Environment |
Description | Engineering a Reduction in Air Pollution |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
URL | https://nepc.raeng.org.uk/media/h0hpcdan/nepc-air-pollution-report.pdf |
Description | RAE Roundtable on Engineering Solutions to Reduce Air Pollution |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | This round table focussed on the opportunities for intervention in the transport system, considering the engineering design and technology innovations available to minimise air pollution at source, monitor levels, and reduce exposure. We hope the discussion will span current interventions and those on the near horizon and identify emerging opportunities and gaps. The round table was chaired by the Chief Medical Officer, Professor Chris Whitty FMedSci, with the explicit aim of informing "the content and structure of the 2022 Chief Medical Officer's Annual Report". |
Year(s) Of Engagement Activity | 2022 |