Managing Uncertainty in Complex Models: a step change in understanding how models perform

Lead Research Organisation: University of Sheffield
Department Name: Probability and Statistics

Abstract

This project concerns uncertainties in the predictions made by models. A model is a description of a real process, expressed in mathematical equations. Usually a computer is used to solve these equations and produce the model's predictions; we think of these as the outputs of the model. The model also has inputs of various kinds: the numbers to be put into the equations. For example, a model to forecast the weather is based on very complex equations describing the movement of the air at various altitudes, the formation of clouds, and so on. The numbers to be put into the model include the current state of the atmosphere (such as the temperature of the air at different locations and altitudes), physical constants used in the equations, and so on. Any model is an imperfect representation of reality, and its predictions are imperfect. The predictions can be wrong because the equations are wrong, because the equations have the wrong numbers in them, or because the computer program solves them inaccurately. In practice, all of these imperfections are present to some degree. As a result, we may expect the true real-world value corresponding to the model output to be close to the model prediction, but there is uncertainty about its precise value.

The objective of this project is to develop tools to manage the uncertainty in model outputs. One of the hardest tasks is simply to quantify the uncertainty: just how close to the output do we expect the true value to be? To answer this we must first quantify the uncertainty in the model inputs and the model structure. Then we must determine how these uncertainties feed through into uncertainty about the model output. The latter task can be very difficult. The usual approach is a method known as Monte Carlo, in which we take randomly sampled values of the inputs and run the model for each set of sampled inputs. The collection of outputs obtained from these runs is a random sample that tells us how much uncertainty there is in the model output. However, this method typically requires a great many model runs, and if the model takes minutes, hours or even days to run, it is impractical.

Methods have been developed to allow much more efficient determination of uncertainty in model outputs. These methods are quite well understood theoretically and have been used in a number of serious applications. Nevertheless, they are not yet ready for routine use. Further work is needed to identify robust and reliable ways to implement them in a wide range of modelling situations, to tackle the large numbers of inputs, outputs and data sources that arise in many models, and to construct suitable links between the model outputs and the real-world variables that they are designed to predict. We seek funding for this work, which will turn the theory into usable tools, thereby providing a basic technology for the developers and users of models.

The same underlying approach, known as Gaussian process emulation, can be adapted to address many other problems associated with complex models. These include sensitivity analysis (how much of the output uncertainty can be attributed to each source of input uncertainty?), calibration (using observations of the real-world phenomenon to reduce uncertainty about the inputs, and hence about the outputs), and the dynamic version of calibration known as data assimilation. The Bayesian approach handles all these tasks in a unified framework that addresses all sources of uncertainty.
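As an informal illustration of the emulation idea (a minimal sketch, not the project's toolkit), the Python code below fits a Gaussian process emulator to a small number of runs of a toy simulator, then propagates input uncertainty through the cheap emulator by Monte Carlo rather than through the expensive model itself. The simulator function, kernel settings and input distribution are all invented for the example, and for brevity only the emulator's posterior mean is propagated, ignoring the emulator's own uncertainty.

import numpy as np

def simulator(x):
    # Stand-in for an expensive model; a real simulator might take
    # hours per run (this toy function is an assumption of the sketch).
    return np.sin(3.0 * x) + 0.5 * x ** 2

def sq_exp(a, b, length=0.3, variance=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)

# Step 1: a small, affordable design of training runs.
x_train = np.linspace(0.0, 1.0, 15)
y_train = simulator(x_train)

# Step 2: condition the Gaussian process on those runs.
K = sq_exp(x_train, x_train) + 1e-8 * np.eye(x_train.size)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

def emulate(x_new):
    # Posterior mean and variance of the emulator at new inputs.
    k = sq_exp(x_train, x_new)
    mean = k.T @ alpha
    v = np.linalg.solve(L, k)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1.0 here
    return mean, np.maximum(var, 0.0)

# Step 3: Monte Carlo on the cheap emulator instead of the simulator.
# The input distribution N(0.5, 0.1^2) is an assumed example.
x_mc = rng.normal(0.5, 0.1, size=50_000)
mc_mean, _ = emulate(x_mc)
print(f"output mean = {mc_mean.mean():.3f}, output sd = {mc_mean.std():.3f}")

The point of the sketch is the economy: 15 simulator runs replace the tens of thousands that direct Monte Carlo would require, with the emulator's posterior variance indicating where further runs would be most informative.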
This project will produce a toolkit for these techniques and others, specified in a standard format (Unified Modelling Language), so that others can use the methods. We will also produce some substantial case studies to exemplify the techniques.

Publications

Bastos L (2009) Diagnostics for Gaussian Process Emulators in Technometrics

Bates R (2014) Optimal design for smooth supersaturated models in Journal of Statistical Planning and Inference

Bates R (2013) Smooth supersaturated models in Journal of Statistical Computation and Simulation

Boukouvalas A (2014) An Efficient Screening Method for Computer Experiments in Technometrics

Boukouvalas A (2014) Optimal design for correlated processes with input-dependent noise in Computational Statistics & Data Analysis

Bower R (2010) Galaxy formation: a Bayesian uncertainty analysis in Bayesian Analysis

Goldstein M (2009) Reified Bayesian modelling and inference for physical systems in Journal of Statistical Planning and Inference