Uncertainty quantification for the linking of spatio-temporal output of computer model hierarchies and the real world

Lead Research Organisation: University of Exeter
Department Name: Engineering Computer Science and Maths

Abstract

Quantification of uncertainty introduced by using computer models to study complex physical systems is a fundamental problem for modern science. Though a statistical methodology exists to perform the uncertainty quantification (UQ), the technology is not yet at the level required by high-end users with very slow and expensive computer models, such as the latest climate models. The research proposed will provide methodological developments in two key areas that will facilitate UQ for high-end models.

The first is a methodology for modelling spatio-temporal output of computer models dynamically. The methods developed will allow statistical models to be created that represent the uncertainty in the spatio-temporal output of a computer simulator for any choice of its input parameters; when sampled from, these statistical models will allow the modelled spatial field to evolve in time in a way that mimics the simulator, while reporting the uncertainty in the representation. This will represent an important step forward for researchers in a variety of scientific disciplines, such as climate, where the evolution of spatial fields in time is of great interest. The proposed work will be developed from methods that the applicant has worked on for univariate time series, and will combine techniques for the Bayesian analysis of multiple time series from the state space modelling literature with UQ methods that have explored basis expansions of spatial fields with Gaussian process emulation in order to make methodological advances.
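
As an indication of the kind of statistical construction involved, the sketch below projects an ensemble of spatial fields onto a small basis and fits a Gaussian process to each basis coefficient as a function of the simulator inputs. This is a minimal illustration of basis-expansion emulation only, not the dynamic spatio-temporal methodology proposed here; the data, names and the use of scikit-learn are illustrative assumptions.

    # Minimal sketch: emulate spatial simulator output via a principal-component
    # basis plus one Gaussian process per basis coefficient. Illustrative only.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    rng = np.random.default_rng(1)
    n_runs, n_grid, n_basis = 40, 500, 3

    X = rng.uniform(size=(n_runs, 2))                       # simulator input parameters
    grid = np.linspace(0.0, 1.0, n_grid)
    fields = np.sin(4.0 * X[:, [0]] * grid) + X[:, [1]]     # toy "spatial" output, one row per run

    mean_field = fields.mean(axis=0)
    _, _, Vt = np.linalg.svd(fields - mean_field, full_matrices=False)
    basis = Vt[:n_basis]                                    # leading spatial patterns (EOFs)
    coeffs = (fields - mean_field) @ basis.T                # basis coefficients for each run

    gps = []
    for j in range(n_basis):                                # one GP emulator per coefficient
        gp = GaussianProcessRegressor(ConstantKernel() * RBF(length_scale=[0.3, 0.3]),
                                      normalize_y=True)
        gps.append(gp.fit(X, coeffs[:, j]))

    x_new = np.array([[0.5, 0.2]])                          # an untried input setting
    pred_coeffs = np.array([gp.predict(x_new)[0] for gp in gps])
    pred_field = mean_field + pred_coeffs @ basis           # emulated spatial field at x_new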

The second involves using the first methodology to develop methods for combining a hierarchy of related but lower-resolution models with whatever runs currently exist on the high-end, state-of-the-art simulator, in order to model spatio-temporal output of the high-end simulator dynamically. This will be done by adapting and extending current methods for statistically linking two simulators in a hierarchy.

The research will apply these methods to the Nucleus for European Modelling of the Ocean (NEMO) framework of ocean models. This framework contains a hierarchy of four ocean models, with the high-end model taking months to run at a single setting of the parameters on a supercomputer, due to the fine spatial resolution of the solver, and the fastest version running quickly on a desktop computer, so that large ensembles can be generated. The high-end model, known as ORCA12, forms the ocean component of the UK's current climate model, HadGEM3-H. Working with collaborators at the National Oceanography Centre (NOC), the methodologies will be applied to existing and specially designed ensembles within the hierarchy in order to model key spatio-temporal fields of interest to the collaborators in ORCA12. This will aid them in understanding the response of the outputs of this important model and assist in its future development.

A further goal will be to facilitate uncertainty quantification for key spatio-temporal fields in the real ocean using the statistical model for ORCA12 and standard UQ methods.

Planned Impact

Uncertainty estimates for the predictions of ocean and climate model output are essential for planning and risk management on short to long time scales, including the development of appropriate climate change mitigation and/or adaptation strategies.

The NEMO modelling community, operational agencies using NEMO such as the UK Met Office and the National Centre for Ocean Forecasting, and a large number of users across Europe will benefit from this research. It will give them new insights into the behaviour of ORCA12, the ocean component of the UK's climate model HadGEM3-H, at untried input settings, and into the sensitivity of the model output to different settings of the inputs. This will assist them in model development and in making informed decisions as to where to spend their budget of supercomputer time on runs of the current model. The planned analysis will provide a road map for future uncertainty analyses using the NEMO model.

The climate modelling community, including organisations that provide evidence to the IPCC on the future impacts of climate change (e.g. the Met Office Hadley Centre), will benefit from this research. Spatio-temporal models are ubiquitous in climate science, and the highest resolution models can only be run on the most powerful computers in the world. The proposed research, leading to a methodology designed to address these two issues and providing an illustration on a high-end ocean model, will enable climate modellers to address their own uncertainty questions using the proposed methodology as a tool and the proposed application as a road map. Climate modellers using NEMO, such as the UK Met Office, will benefit more directly in the same way as the NEMO community.

There will be many downstream beneficiaries of the proposed methodologies and their application to ocean and climate models. These include any organisation or body that requires uncertainty estimates for climate predictions in order to plan or manage risk. Examples include government agencies, research organisations and environmental consultancies that compile information to advise policy makers, industry and society on future climate change impacts, adaptation strategies and risk management; climate change adaptation and risk management initiatives with links to the insurance industry; and users of the IPCC reports. They will benefit from the improvements in uncertainty quantification for climate and ocean models. Many of these will also benefit more directly from inferences about ocean quantities made as part of the proposed research.

There will also be beneficiaries outside environment-related applications. Non-environmental users of computer models are a very wide group, including industrial communities such as oil and gas, aerospace and pharmaceuticals. Where these models are expensive and/or produce spatio-temporal output, their users will benefit directly from the new methodologies.

Publications

 
Description Developed a new method for designing computer experiments for chaotic models, such as climate models, that allows quantification of uncertainty due to the parameters and of the parametric dependence of uncertainty on initial conditions; the method was tested on a low-resolution version of the UK ocean model NEMO. History matched NEMO for one wave and found that 95% of the parameter space currently explored can be ruled out. This is part of a process, which we are currently developing, for tuning the high-resolution model using statistical methods and low-resolution models.
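
As a rough indication of the calculation behind the ruling-out step, the sketch below evaluates a standard implausibility measure over candidate parameter settings using emulator predictions and an observation; the numbers and variable names are illustrative assumptions, not the actual NEMO analysis.

    # Minimal sketch of the ruling-out step in one wave of history matching.
    # Implausibility I(x) = |z - E[f(x)]| / sqrt(emulator var + observation var + discrepancy var);
    # settings with I(x) above a cut-off (commonly 3) are ruled out. Illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    candidates = rng.uniform(size=(100_000, 4))          # candidate parameter settings in [0, 1]^4

    # Stand-ins for an emulator's mean and variance at each candidate setting.
    emulator_mean = candidates @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3
    emulator_var = 0.05 + 0.1 * candidates[:, 0]

    z, obs_var, discrepancy_var = 0.2, 0.02, 0.05        # observation and its uncertainties

    implausibility = np.abs(z - emulator_mean) / np.sqrt(emulator_var + obs_var + discrepancy_var)
    not_ruled_out = implausibility < 3.0
    print(f"{100.0 * (1.0 - not_ruled_out.mean()):.1f}% of candidate space ruled out")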

History matched NEMO over three waves and demonstrated, for the ocean modelling community, that statistical tuning methods for ocean models can beat "by-hand" tuning. In particular, we found versions of NEMO that performed better on global mean depth-integrated temperature and salinity, with improved global mean square error.

Showed that tuning (calibration/history matching) for climate models with spatial output is not done well using principal component (PC) basis decomposition (also called EOF basis decomposition), the de facto method used by statisticians and those involved with statistical tuning of climate models (UKCP09 is a high-profile example). Working with my PhD student, James Salter, we developed a method for history matching using physically inspired bases that can be generated automatically, and showed that it performs better than standard methods in a number of test cases.
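
One simple way to see part of the problem is to check how well the observed field can be reconstructed in the truncated principal-component basis of the ensemble: if the observed pattern lies largely outside the span of the leading EOFs, calibration in that basis cannot reach it. The check below is a generic illustration with made-up arrays, not the analysis from the paper.

    # Minimal check: relative reconstruction error of an observed field in a
    # truncated PC (EOF) basis built from an ensemble of model runs. Illustrative only.
    import numpy as np

    def reconstruction_error(ensemble, obs, n_basis):
        """Relative error when projecting obs onto the leading n_basis EOFs of the ensemble."""
        mean_field = ensemble.mean(axis=0)
        _, _, Vt = np.linalg.svd(ensemble - mean_field, full_matrices=False)
        basis = Vt[:n_basis]                             # leading ensemble patterns
        recon = mean_field + ((obs - mean_field) @ basis.T) @ basis
        return np.linalg.norm(obs - recon) / np.linalg.norm(obs - mean_field)

    rng = np.random.default_rng(2)
    ensemble = rng.normal(size=(60, 1000))               # 60 runs of a (toy) spatial field
    obs = rng.normal(size=1000)                          # observed field (toy)
    print(reconstruction_error(ensemble, obs, n_basis=5))  # near 1 => obs poorly represented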

Further developed the basis methodology so that no elicitation of important physical patterns is needed; instead, the principal component decomposition is optimally rotated towards the data. My PhD student and I have established that this works and have applied it many times to develop a tuning tool for the Canadian climate model. There is a paper in revision for the Journal of the American Statistical Association on this work, and a number of further papers are being drafted. We have developed software for general climate model tuning in which this tool lives. If widely adopted, this could have substantial global impact, as every use of a climate model depends on the method used to tune it. It is now being used for the French climate model, and the Canadian modelling centre is considering an implementation.
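
One simple way to picture the rotation idea (a sketch of the flavour only; the published method imposes further constraints, for example on the ensemble variance retained by the rotated basis) is to take the first basis vector to be the best reconstruction of the observations within the span of the ensemble, and the remaining directions from the residual variability:

    # Sketch of rotating an ensemble basis towards the observations. Illustrative only.
    import numpy as np

    def rotate_towards_obs(ensemble, obs, n_basis):
        mean_field = ensemble.mean(axis=0)
        anomalies = ensemble - mean_field
        # First rotated vector: least-squares reconstruction of obs within the ensemble span,
        # so this single direction captures everything of obs that the ensemble can represent.
        coeffs, *_ = np.linalg.lstsq(anomalies.T, obs - mean_field, rcond=None)
        v1 = anomalies.T @ coeffs
        v1 /= np.linalg.norm(v1)
        # Remaining vectors: leading EOFs of the ensemble after removing the v1 direction.
        residual = anomalies - np.outer(anomalies @ v1, v1)
        _, _, Vt = np.linalg.svd(residual, full_matrices=False)
        return np.vstack([v1, Vt[:n_basis - 1]])

    rng = np.random.default_rng(3)
    ensemble = rng.normal(size=(60, 1000))
    obs = rng.normal(size=1000)
    rotated_basis = rotate_towards_obs(ensemble, obs, n_basis=5)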

Proved a theorem showing that inference from a standard Bayesian analysis is not as close to true belief as inference made using multiple Bayesian analyses, an approach I called "Posterior Belief Assessment". The paper containing this proof was published in the Lindley prize edition of Bayesian Analysis and subsequently won the Lindley Prize in June 2016.
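
A sketch of the flavour of the construction (the precise statement and conditions are in the Bayesian Analysis paper, and the notation here is an illustrative assumption): the posterior expectations produced by several alternative Bayesian analyses, differing in defensible prior and likelihood judgements, are collected into D = (E_1[\theta \mid y], \ldots, E_k[\theta \mid y]) and combined through a Bayes linear adjustment of the quantity of interest,

    E_D[\theta] = E[\theta] + \mathrm{Cov}(\theta, D)\,\mathrm{Var}(D)^{-1}\,(D - E[D]),

and the theorem gives conditions under which this adjusted expectation is closer to true belief, in expected squared distance, than the posterior expectation from any single one of the analyses.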

Developed methods with my PhD student Victoria Volodina for emulating non-stationary output of hierarchies of climate models. This methodology works well and is published in the SIAM/ASA Journal on Uncertainty Quantification. The methodology is now part of software developed for tuning climate models; this software is online, open source and is being actively used by the French modelling groups.
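
The sketch below shows one generic way to build a non-stationary Gaussian process covariance by mixing two stationary kernels with input-dependent weights; it illustrates the general idea of non-stationarity only and is not the method published in the JUQ paper.

    # Generic non-stationary covariance: mix two stationary kernels with smoothly
    # varying weights w(x). Illustration of non-stationarity only, not the published method.
    import numpy as np

    def rbf(x1, x2, lengthscale):
        d = x1[:, None] - x2[None, :]
        return np.exp(-0.5 * (d / lengthscale) ** 2)

    def nonstationary_cov(x1, x2):
        w1 = 1.0 / (1.0 + np.exp(-10.0 * (x1 - 0.5)))    # weight rises smoothly across the input
        w2 = 1.0 / (1.0 + np.exp(-10.0 * (x2 - 0.5)))
        rough = rbf(x1, x2, lengthscale=0.05)            # short length scale in one region
        smooth = rbf(x1, x2, lengthscale=0.5)            # long length scale in the other
        return np.outer(w1, w2) * rough + np.outer(1.0 - w1, 1.0 - w2) * smooth

    x = np.linspace(0.0, 1.0, 200)
    K = nonstationary_cov(x, x) + 1e-8 * np.eye(len(x))  # jitter for numerical stability
    sample = np.random.default_rng(4).multivariate_normal(np.zeros(len(x)), K)
    # 'sample' is smooth for x < 0.5 and rough for x > 0.5: non-stationary behaviour.
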
Exploitation Route The UK Earth System Model (UKESM) will feed directly into the next IPCC report, and its output will affect the everyday lives of everyone through its contribution to policy decisions for businesses, local authorities and governments. By improving the quality of the model through use of these statistical methods, the information on which these decisions are based will be improved. The statistical methods are general: they apply not simply to climate models, but to any area of science, business or finance that uses computer models to assist decision making. I am working with people at the Met Office and the National Oceanography Centre to disseminate these methods to assist in tuning the UKESM. NEMO is the ocean component of this, so work specifically with NEMO will feed in directly, but the methods can be applied to the coupled model.

I have already taken these methods to the Canadian climate modellers, spending over a month working with them to help develop the Canadian climate model. I have also met with developers of the French climate model in Paris, and they submitted a funding proposal to implement these ideas with their model. This proposal was funded and we have already run a workshop to help the modellers use our tools for tuning. I have developed a tool for emulating the spatial output that the Canadian modellers use for model tuning, enabling them to assess the potential performance of a proposed parameter choice in 3 seconds, as opposed to the days it would take to run the model in full on their supercomputer. Further development of this tool for use in tuning climate models is planned.

The developments for Bayesian statistics have attracted the attention of other statisticians interested in how they might be applied to dating archaeological sites and to food security. I hope to take them forward in partnership with scientists and statisticians interested in one or more of these areas so that they can be further developed and put to use.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Energy; Environment; Government, Democracy and Justice; Culture, Heritage, Museums and Collections

 
Description The software we have developed, which contains all of the methods developed during the project, is being actively used to tune the French climate models at Meteo France and LMDZ. Currently they are using it as part of the HIGH-TUNE project to develop new cloud parameterisations, and they have used it on the full GCM. The software is embedded in their model development processes (at least for convection and for the IPSL atmosphere). The formal tuning or development of the model may, in time, be done using the software.
First Year Of Impact 2019
Sector Environment
Impact Types Policy & public services

 
Description Past Earth Network
Amount £24,805 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2017 
End 10/2017
 
Description SIAM Travel Award for PhD student
Amount $800 (USD)
Funding ID DR17 E 
Organisation Society for Industrial and Applied Mathematics 
Sector Charity/Non Profit
Country United States
Start 07/2017 
End 07/2017
 
Title k-Extended Latin Hypercubes 
Description This is a method for generating the parameter choices of a computer model when running a large experiment. It was designed for climate models run on supercomputers, where the amount of research scientist time involved in setting up and managing the ensemble means that the design needs to be robust to failures and flexible enough to validate statistical models fitted using the ensemble. The tool was coded in R and the code made available to the public. It produces a balanced design of a given size that is constructed from many sub-designs that are also balanced, with the full design and sub-designs all being "optimal" in terms of their coverage of parameter space and orthogonality (a minimal sketch of the nested structure appears after this entry). 
Type Of Material Improvements to research infrastructure 
Provided To Others? No  
Impact Ensembles of climate model runs have been designed using this method. The first was a 400-run ensemble of the NEMO ocean model, run using the UK supercomputer ARCHER. Analysis of this ensemble is part of the output of the project and has already led to 2 papers. The second was a 65-member ensemble of the Canadian climate model, run by the modellers at the Canadian Centre for Climate Modelling and Analysis using their supercomputer. This ensemble has been part of an ongoing collaboration with CCCMA leading to at least 2 forthcoming papers. The third is an ensemble of MIMA, a climate model developed from the GFDL code and being adapted in Exeter, run using the high-performance computing facilities at the University of Exeter. Analysis of this ensemble is ongoing. 
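
A minimal sketch of the nested structure mentioned in the Description above: k sub-designs of size n, each a Latin hypercube at resolution n, whose union is a Latin hypercube at resolution n*k. The search for optimal coverage and orthogonality in the published R tool is omitted, and all names are illustrative assumptions.

    # Minimal sketch of the nested Latin hypercube structure: k sub-designs of size n,
    # each a Latin hypercube at resolution n, whose union is a Latin hypercube at
    # resolution n*k. The optimality search of the published method is omitted.
    import numpy as np

    def nested_lhd(n, k, dim, rng):
        design = np.empty((k, n, dim))
        for j in range(dim):
            for i in range(n):                           # coarse stratum i contains k fine strata
                fine = rng.permutation(np.arange(i * k, (i + 1) * k))
                design[:, i, j] = (fine + rng.uniform(size=k)) / (n * k)
            for m in range(k):                           # decouple dimensions within each sub-design
                design[m, :, j] = design[m, rng.permutation(n), j]
        return design                                    # design[m] is the m-th sub-design

    rng = np.random.default_rng(5)
    subs = nested_lhd(n=10, k=4, dim=3, rng=rng)
    full_design = subs.reshape(-1, 3)                    # 40 points, also a Latin hypercube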
 
Title Canadian model output 
Description Data was generated by the Canadian Centre for Climate Modelling and Analysis (CCCMA) using their supercomputer and climate model to run a design generated by myself using methods described in my Environmetrics paper (2015). There are now 3 parts to this ensemble: 60 runs spanning the full range of each of 13 parameters to the model; a further 100 runs in a subspace selected using history matching to climatological variables chosen by the modellers, with uncertainties elicited by myself during the visit; and a further 50 runs designed by my PhD student based on these initial ensembles. The data is hosted at CCCMA in Victoria, though I am able to download it to use for my own research and as a learning resource. For each ensemble member, we have time series and spatial field output to use for uncertainty quantification. 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Following its construction, the Canadian modellers have revisited their model and changed some of the schemes based on findings from an initial analysis by me. For the next international multi-model experiment, the current intention is to use some of my methods and code to assist in the tuning phase of the model. However, the current multi-model experiment is still ongoing and won't be completed until 2017. I have developed a tool for emulating and visualising predicted model output for tuning. 
 
Title NEMO ensemble at 2 degrees 
Description Based on an elicitation of the model parameters and subsequent analyses of an initial ensemble, I designed what is, to our knowledge, the largest perturbed-parameter ensemble in the world of the NEMO ocean model (used by many global climate modelling centres, including the Met Office). 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact It is early days for this data, but it could have a big impact on the recommended settings of the NEMO ocean component in the UK Earth System Model and others. More analysis and work is required, but this will be a useful resource for NEMO developers as well as for my own research. 
 
Description CCCMA 
Organisation Canadian Centre for Climate Modelling and Analysis (CCCma)
Country Canada 
Sector Public 
PI Contribution I spent one month (July 2015) in Victoria working with the modellers at the Canadian Centre for Climate Modelling and Analysis (CCCMA) at the invitation of the head of model development, John Scinocca. The goal was to take methods for tuning climate models to the community actually tuning state-of-the-art models for climate science, and to develop a collaboration from the ground up. I spent the month designing ensembles and analysing them to illustrate the power of the methodology, and discussing their tuning practices with them. This resulted in a lot of data and discussion of alternative tuning practices for the next round of CMIP (after CMIP6). A subsequent week-long visit by John to Exeter in November firmed up a number of ideas. Data transfer (from Victoria to Exeter) has begun, a paper has been outlined, and I am working on tailoring some methods to their practices ready for another visit in the summer.
Collaborator Contribution John initiated the collaboration having heard me speak at a climate model tuning workshop in October 2014. He has since provided unprecedented access to his model, team and supercomputing resource in order to explore the power of the methods I advocate and to begin fine-tuning them for the Canadian model.
Impact 2 ensembles of the Canadian climate model (atmosphere only) spanning the range of 13 of the model's parameters, and all of the associated output. Discovery that values of one parameter even lower than those tested had an effect desired by the modellers. They have since revisited the parameterisation and changed the model on the basis of this discovery.
Start Year 2015
 
Description DECODE 
Organisation University of Exeter
Department Medical School
Country United Kingdom 
Sector Academic/University 
PI Contribution Bayesian modelling of the probability of dementia, tailored to individual patients at the primary care diagnosis stage. We have developed and implemented Bayesian models that show the uncertainty in what were previously point estimates of dementia probability, and have developed a classifier that accounts for this uncertainty and may be integrated into an app that the team led by David Llewelyn at the medical school wishes to develop for primary care. We have developed shrinkage priors for model selection in dementia probability that identify the key factors that allow us to predict dementia. We have done this both with and without the usual memory tests that would be given in a GP exam, with the memory tests modelled as missing data predicted from the other factors for patients who have not taken them. This allows for real-time updating of dementia probability during a GP consultation, allowing the GP to administer only some, or even no, part of the memory tests, basing their decision on the probability of dementia and our certainty in the estimate. A method for eliciting informative priors for some of the key effects on the probability of dementia, such as history of stroke, has also been developed. We are currently writing up these results for publication.
Collaborator Contribution Provided data on almost 900 patients that includes a diagnosis for dementia, and provided support for a masters student studying this project in the form of meetings to discuss the data and some of the goals of the analysis and of the classifiers. Have guided our efforts on the problem and explained key features that would need handling by the Bayesian methods. Have taken part in elicitation exercises to develop informative priors for some of the effects in our model.
Impact Successful application to the Halpin Trust to work on the project. Pending application to the James Tudor Foundation for funding to develop the primary care tool and embedded models. An application for 1.5M USD to develop a spatio-temporal model of dementia prevalence, led by the medical school, with provision for a 4-year postdoc in statistics if successful.
Start Year 2015
 
Description High Tune 
Organisation Pierre and Marie Curie University - Paris 6
Department Laboratory of Dynamic Meteorology
Country France 
Sector Academic/University 
PI Contribution Having been invited to give a talk to their group, I contributed to a proposal to tune the convection scheme of the French climate model. The proposal was to the French research funding body and contained money to fund a month of my salary to travel to France throughout the project to offer my expertise in tuning. My research team and I have now provided a tool for climate model tuning containing many of the methods developed during this fellowship. The tool is embedded in their development of new convection schemes, and my team maintains the statistical elements.
Collaborator Contribution They wrote the bid and I edited the parts relevant to the statistical ideas for which my expertise had been sought. They have now embedded technology developed as part of this grant into a wider tool for tuning that is being used by both CNRM and IPSL, the two French climate models. They have provided my team with versions of the 1D LMDZ climate model to aid advances in methodology that can be tailored to 1D climate model output.
Impact The award was funded. If the project is successful, the French climate model will have an improved convection scheme, so this may have policy and societal benefits. Many presentations to the wider community have been made, and the tool, combining methods from this grant whilst embedding the models, is available to a large group of researchers across centres. Papers are now being written.
Start Year 2015
 
Title ExeterUQ 
Description This is a set of codes for history matching for climate models; it is currently used by developers of the French model and is being tested by developers of the Canadian model. The codes produce robust emulators, perform history matching and imaging, and contain the methods developed in papers that have spun out of my fellowship award. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The software is embedded in a tuning tool developed by Meteo France and LMDZ for tuning the French models. 
URL https://github.com/BayesExeter/ExeterUQ
 
Description Running workshops for climate modellers on climate model tuning 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The HIGH-TUNE project ran its first workshop on model tuning; the goal was for me and members of my UQ team to come to Paris to work with French climate modellers working on two of the French models (LMDZ and CNRM). The idea was for us to take our software and examples and spend two days working with them, helping them to apply our tools to the output of their models, including designing and running new simulations of their models to aid tuning. My PhD student and I combined a number of our codes, including many new methods developed during my fellowship, into beta software that could be used by the modellers. We then ran the workshop and passed on the code, which they have begun to use. Another workshop is planned for 3rd-5th May 2018 with the same modellers, to continue the engagement.
Year(s) Of Engagement Activity 2017