Managing Uncertainty in Complex Models: a step change in understanding how models perform

Lead Research Organisation: University of Sheffield
Department Name: Probability and Statistics

Abstract

This project concerns uncertainties in the predictions made by models. A model is a description of a real process, expressed in mathematical equations. Usually a computer is used to solve these equations and produce the model's predictions; we think of these as the outputs of the model. The model also has inputs of various kinds: the numbers to be put into the equations. For example, a model to forecast the weather is based on very complex equations describing the movement of the air at various altitudes, the formation of clouds, and so on. The numbers to be put into the model include the current state of the atmosphere (such as the temperature of the air at different locations and altitudes), physical constants used in the equations, and so on. Any model is an imperfect representation of reality, and its predictions are imperfect. The predictions can be wrong because the equations are wrong, because the equations have the wrong numbers in them, or because the computer program solves them inaccurately. In practice, all of these imperfections are present to some degree. As a result, we may expect the true real-world value corresponding to the model output to be close to the model prediction, but there is uncertainty about its precise value.

The objective of this project is to develop tools to manage the uncertainty in model outputs. One of the hardest tasks is simply to quantify the uncertainty: just how close to the output do we expect the true value to be? To answer this we must first quantify the uncertainty in the model inputs and the model structure. Then we must determine how these uncertainties feed through into uncertainty about the model output. The latter task can be very difficult. The usual approach is a method known as Monte Carlo, in which we take randomly sampled values of the inputs and run the model for each set of sampled inputs. The collection of outputs obtained from these runs is a random sample that tells us how much uncertainty there is in the model output. However, this method typically requires a great many model runs, and if the model takes minutes, hours or even days to run, it is impractical.

Methods have been developed to allow much more efficient determination of uncertainty in model outputs. These methods are quite well understood theoretically and have been used in a number of serious applications. Nevertheless, they are not yet ready for routine use. Further work is needed to identify robust and reliable ways to implement them in a wide range of modelling situations, to tackle the large numbers of inputs, outputs and data sources that arise in many models, and to construct suitable links between the model outputs and the real-world variables that they are designed to predict. We seek funding for this work, which will turn the theory into usable tools, thereby providing a basic technology for the developers and users of models.

The same underlying approach, known as Gaussian process emulation, can be adapted to address many other problems associated with complex models. These include sensitivity analysis (how much of the output uncertainty can be attributed to each source of input uncertainty?), calibration (using observations of the real-world phenomenon to reduce uncertainty about the inputs, and hence about the outputs), and the dynamic version of calibration known as data assimilation. The Bayesian approach handles all these tasks in a unified framework that addresses all sources of uncertainty.
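As an informal illustration of the emulation idea (a minimal sketch, not the project's toolkit), the Python code below fits a Gaussian process emulator to a small number of runs of a toy simulator, then propagates input uncertainty through the cheap emulator by Monte Carlo rather than through the expensive model itself. The simulator function, kernel settings and input distribution are all invented for the example, and for brevity only the emulator's posterior mean is propagated, ignoring the emulator's own uncertainty.

import numpy as np

def simulator(x):
    # Stand-in for an expensive model; a real simulator might take
    # hours per run (this toy function is an assumption of the sketch).
    return np.sin(3.0 * x) + 0.5 * x ** 2

def sq_exp(a, b, length=0.3, variance=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)

# Step 1: a small, affordable design of training runs.
x_train = np.linspace(0.0, 1.0, 15)
y_train = simulator(x_train)

# Step 2: condition the Gaussian process on those runs.
K = sq_exp(x_train, x_train) + 1e-8 * np.eye(x_train.size)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

def emulate(x_new):
    # Posterior mean and variance of the emulator at new inputs.
    k = sq_exp(x_train, x_new)
    mean = k.T @ alpha
    v = np.linalg.solve(L, k)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1.0 here
    return mean, np.maximum(var, 0.0)

# Step 3: Monte Carlo on the cheap emulator instead of the simulator.
# The input distribution N(0.5, 0.1^2) is an assumed example.
x_mc = rng.normal(0.5, 0.1, size=50_000)
mc_mean, _ = emulate(x_mc)
print(f"output mean = {mc_mean.mean():.3f}, output sd = {mc_mean.std():.3f}")

The point of the sketch is the economy: 15 simulator runs replace the tens of thousands that direct Monte Carlo would require, with the emulator's posterior variance indicating where further runs would be most informative.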
This project will produce a toolkit for these techniques and others, specified in a standard format (Unified Modelling Language), so that others can use the methods. We will also produce some substantial case studies to exemplify the techniques.

Publications

Bastos L (2009) Diagnostics for Gaussian Process Emulators in Technometrics

Bates R (2014) Optimal design for smooth supersaturated models in Journal of Statistical Planning and Inference

Bates R (2013) Smooth supersaturated models in Journal of Statistical Computation and Simulation

Boukouvalas A (2014) An Efficient Screening Method for Computer Experiments in Technometrics

Boukouvalas A (2014) Optimal design for correlated processes with input-dependent noise in Computational Statistics & Data Analysis

Bower R (2010) Galaxy formation: a Bayesian uncertainty analysis in Bayesian Analysis

Goldstein M (2009) Reified Bayesian modelling and inference for physical systems in Journal of Statistical Planning and Inference