Bayesian Nonparametric Methods for Aggregated and Multivariate Outputs

Lead Research Organisation: Imperial College London
Department Name: Dept of Mathematics

Abstract

This project investigates two types of problems that arise when labelled data are scarce and expensive to obtain, a situation common to many problems in the environmental and social sciences. We aim to develop novel methods for these settings using flexible proxy models that encode prior beliefs and provide interpretable uncertainty quantification. The project falls within the EPSRC Mathematical Sciences research area and is partly funded by, and carried out in collaboration with, Cervest Limited, an artificial intelligence start-up focusing on Earth Science AI, and Imperial College London. This collaboration between industry and academia gives our research access to a wide array of Earth observation datasets, and gives the industrial partner access to novel methodologies for its own work.

The first part, on aggregated outputs, addresses the situation where we can only observe, or must average, quantities over large groups of individuals or geographical areas. An important application where this type of problem occurs is computing the average treatment effect of a pharmaceutical or policy intervention. When labelled data are scarce, the problem becomes even harder. For instance, how do we model crop yields across a large geographical region when we only know the yield for the region as a whole?
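
The aggregation structure in this setting can be expressed directly in a Gaussian model. The following sketch is a minimal illustration, not the method we will develop: it places a GP prior on pixel-level yields over a hypothetical one-dimensional grid of pixels and conditions on a single observed region-level average; the kernel, noise level, and observed value are all assumed for illustration.

```python
import numpy as np

# Hypothetical 1-D "pixels" covering a region; in practice each pixel would carry
# spatial covariates (e.g. remote-sensing features).
x = np.linspace(0.0, 1.0, 50)[:, None]

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel encoding a prior belief in smooth yield maps."""
    return variance * np.exp(-0.5 * (a - b.T) ** 2 / lengthscale ** 2)

K = rbf_kernel(x, x)                             # prior covariance over pixel-level yields
A = np.full((1, x.shape[0]), 1.0 / x.shape[0])   # aggregation: region yield = mean of pixels
sigma2 = 0.01                                    # assumed noise on the aggregated observation
y_region = np.array([2.3])                       # the only label: one region-level average

# Aggregation is linear, so the aggregate and the pixel-level field are jointly
# Gaussian and the pixel-level posterior is available in closed form.
S = A @ K @ A.T + sigma2 * np.eye(1)             # covariance of the aggregated observation
gain = K @ A.T @ np.linalg.inv(S)
post_mean = gain @ y_region                      # posterior mean yield for every pixel
post_cov = K - gain @ A @ K                      # posterior covariance (pixel-wise uncertainty)

print(post_mean[:5])
print(np.sqrt(np.diag(post_cov))[:5])            # predictive standard deviations per pixel
```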

The second part of the project involves modelling multiple quantities, such as precipitation and temperature, jointly in a way that exploits their inter-dependence. Again, when labelled data are scarce, modelling several quantities together allows additional signal to be extracted.
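
A standard way to encode such inter-dependence is a multi-output GP. The sketch below is a minimal illustration using the intrinsic coregionalisation model (ICM), in which a coregionalisation matrix couples the outputs; the matrix entries and kernel parameters are assumptions chosen for illustration, not estimates for precipitation and temperature.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)[:, None]

def rbf_kernel(a, b, lengthscale=0.15):
    return np.exp(-0.5 * (a - b.T) ** 2 / lengthscale ** 2)

Kx = rbf_kernel(x, x)          # shared covariance over the input space

# Coregionalisation matrix: how strongly the two outputs (e.g. precipitation-like
# and temperature-like quantities) co-vary; the 0.8 coupling is an assumption.
B = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# ICM prior: the covariance of the stacked outputs is the Kronecker product of B and Kx.
K_multi = np.kron(B, Kx)

# A joint draw yields two curves that share structure; with scarce labels,
# observations of one output help constrain the other through B.
sample = rng.multivariate_normal(np.zeros(K_multi.shape[0]),
                                 K_multi + 1e-8 * np.eye(K_multi.shape[0]))
output_1, output_2 = np.split(sample, 2)
print(np.corrcoef(output_1, output_2)[0, 1])     # typically strongly positive
```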

To capture complex interactions between covariates and outputs, nonparametric methods, which effectively assume infinitely many model parameters, provide a flexible way of encoding prior beliefs. Gaussian processes (GPs) are a prime example, and there is a rich literature on using GPs in label-scarce, feature-rich settings (Law et al., 2018; Hamelijnck et al., 2019). GPs encode prior beliefs through normal distributions and provide uncertainty quantification, which is highly desirable in the applications we consider. Recently, tree-based models (Chipman et al., 2010; Lakshminarayanan et al., 2016), which partition the data into subgroups of individuals or subregions, have attracted interest in the machine learning community, yielding results highly competitive with GPs. Like GPs, tree-based models are flexible nonparametric models that provide uncertainty quantification, but the properties of tree-based priors have yet to be fully exploited for more complex applications. We will work on developing novel nonparametric methodologies to address the aims of this project.
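
To make the uncertainty quantification discussed above concrete, the sketch below fits a GP (via scikit-learn's GaussianProcessRegressor) and a random forest, used here only as a simple stand-in for the Bayesian tree ensembles of Chipman et al. (2010) and Lakshminarayanan et al. (2016), to the same small synthetic data set and reads off a measure of predictive spread from each; the data and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# Small, noisy synthetic data set standing in for a label-scarce problem.
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(6.0 * X[:, 0]) + 0.1 * rng.standard_normal(20)
X_test = np.linspace(0.0, 1.0, 5)[:, None]

# GP regression: the RBF kernel encodes a prior belief in smooth functions and the
# posterior returns a predictive standard deviation directly.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2) + WhiteKernel(1e-2))
gp.fit(X, y)
gp_mean, gp_std = gp.predict(X_test, return_std=True)

# Tree ensemble: the spread across trees is a crude, non-Bayesian proxy for the
# posterior uncertainty that BART or Mondrian forests would provide.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
per_tree = np.stack([tree.predict(X_test) for tree in rf.estimators_])
rf_mean, rf_std = per_tree.mean(axis=0), per_tree.std(axis=0)

print("GP  :", np.round(gp_mean, 2), np.round(gp_std, 2))
print("Tree:", np.round(rf_mean, 2), np.round(rf_std, 2))
```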

We will first develop novel nonparametric modelling approaches for applications involving aggregated quantities of interest and outputs. We will then develop flexible models for multiple outputs, with broad applications in the environmental sciences in mind.

References:
Chipman, H.A., George, E.I. and McCulloch, R.E., 2010. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), pp.266-298.

Hamelijnck, O., Damoulas, T., Wang, K. and Girolami, M., 2019. Multi-resolution multi-task Gaussian processes. In Advances in Neural Information Processing Systems (pp. 14025-14035).

Lakshminarayanan, B., Roy, D.M. and Teh, Y.W., 2016. Mondrian forests for large-scale regression when uncertainty matters. In Artificial Intelligence and Statistics (pp. 1478-1487).

Law, H.C., Sejdinovic, D., Cameron, E., Lucas, T., Flaxman, S., Battle, K. and Fukumizu, K., 2018. Variational learning on aggregate outputs with Gaussian processes. In Advances in Neural Information Processing Systems (pp. 6081-6091).


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023151/1      |              |              | 01/04/2019 | 30/09/2027 |
2283505           | Studentship  | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Harrison Zhu