Combining Machine Learning and Data Assimilation to infer model errors

Lead Research Organisation: University of Reading

Department Name: Meteorology

Abstract

Data assimilation (DA) is the science of combining information contained in observations (or 'data') with prior knowledge of the system at hand, typically in the form of a coupled set of partial differential equations. It is Bayesian inference applied to the geosciences, especially meteorology and climate science, where the laws of fluid dynamics are known and should be exploited. This knowledge is translated into a time-evolving numerical model whose dimension is vastly larger than the amount of available data. As a consequence of this model-to-observation dimensional mismatch, the model plays a critical role in the outcome of DA, which can thus be seen as a model-driven procedure.
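For reference, the Bayesian statement underlying DA can be written in one line. The symbols here are assumptions introduced for illustration and do not appear in the abstract: $x$ is the model state, $y$ the observations, $p(x)$ the prior supplied by the numerical model, and $p(y \mid x)$ the observation likelihood.

```latex
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \;\propto\; p(y \mid x)\, p(x)
```

Because the state $x$ has far more components than $y$, the prior term $p(x)$ carries much of the information in the posterior, which is the sense in which DA is model-driven.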
Numerical models of geo-fluids used in DA are only an approximate representation of the real atmosphere, ocean or the whole climate system. The resulting model error has to be taken into account, and a substantial amount of work has been devoted to making DA methods able to accommodate model error in a statistical way. The real model error is unknown and arises from many different sources, such as numerical discretization, parametric error, and the presence of unresolved scales. Sub-grid processes in particular are critical to the skill of the models and are represented by sub-grid parametrization schemes. Estimating the form of these schemes and the values of their parameters is of crucial importance, both for prediction and for successful data assimilation. Existing estimation techniques rely heavily on physical intuition and ad-hoc use of available observations; systematic and robust methods to estimate model errors from model-observation mismatch data do not exist.
On the other hand, the steady increase in available observations, accompanied by a similarly spectacular growth in computing power, has recently made fully data-driven approaches possible. This data-driven revolution has been pushed mostly by the flourishing of machine learning (ML) techniques (e.g. deep neural networks) that have been shown, with increasing success, to extract the underlying dynamical laws from a multivariate dataset, with impressive predictive skill and the capability to classify complex behaviors.
At present, ML and DA algorithms are quite similar: both approaches optimize parameters given a set of targets (i.e., the observations). The optimization, or training in ML jargon, requires computing gradients, which is done with adjoints in DA and with what is referred to as backpropagation in ML. The major difference is that while in DA the model is explicitly set out as a set of physical constraints, ML is only recently maturing to incorporate our physical knowledge. Furthermore, as opposed to DA, no principled uncertainty quantification is used in ML.
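The shared optimization structure can be illustrated with a minimal sketch. It assumes a scalar state, a linear observation operator `h`, and Gaussian errors with variances `b_var` and `r_var`; these names and all numerical values are hypothetical and are not taken from the project. The cost function is minimised by plain gradient descent, as one would train an ML model, and the result is compared with the closed-form analysis that holds in this linear case.

```python
def cost_gradient(x, xb, y, h, b_var, r_var):
    """Gradient of J(x) = (x - xb)^2 / (2 B) + (y - h x)^2 / (2 R)."""
    # The factor h multiplying the observation residual is the adjoint
    # (transpose) of the linear observation operator. Backpropagation
    # applies the same chain rule to the layers of a neural network.
    return (x - xb) / b_var - h * (y - h * x) / r_var


def minimise(xb, y, h=1.0, b_var=1.0, r_var=0.5, rate=0.1, steps=500):
    """Gradient descent on J, starting from the background state xb."""
    x = xb
    for _ in range(steps):
        x -= rate * cost_gradient(x, xb, y, h, b_var, r_var)
    return x


def analytic(xb, y, h=1.0, b_var=1.0, r_var=0.5):
    """Closed-form minimiser of J for the linear, Gaussian scalar case."""
    gain = b_var * h / (h * h * b_var + r_var)
    return xb + gain * (y - h * xb)


if __name__ == "__main__":
    print(minimise(1.0, 2.0), analytic(1.0, 2.0))
```

In this toy setting the two answers agree to numerical precision. The background term plays the role of a regulariser in ML language, and its weight `1 / b_var` is where DA encodes prior uncertainty explicitly.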
The complementarity of ML and DA, the success of DA in the geosciences, and the promising future of ML in the same area motivate the search for suitable combinations that adequately exploit their respective strengths and mitigate their respective weaknesses. The proposed PhD research program will work at this boundary.
We propose to use machine learning to "learn" the parametrization of sub-grid processes that are not explicitly described in the core dynamical model, based on model-observation mismatch data from DA experiments. This new parametrization will then be implemented in the model and used to perform DA. Since the DA performance is nonlinearly related to the model error used, the resulting model-observation mismatch data can again be used by ML to improve its parametrization, which can then be used in DA. This iterative process, if well defined, will converge, leading to a potential breakthrough in environmental prediction via both model improvement and superior DA/ML initialization of the models.
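The iterative loop described above can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration and is not the project's method: a scalar map stands in for the geophysical model, a fixed-gain update (`GAIN`) stands in for a full DA scheme, the unresolved process is a single term `theta * sin(x)`, and a one-parameter least-squares fit stands in for the ML component.

```python
import numpy as np

TRUE_THETA = 0.5  # hypothetical strength of the unresolved term in the "truth"
GAIN = 0.8        # fixed scalar analysis gain, a stand-in for a real DA scheme
OBS_SD = 0.02     # standard deviation of the synthetic observation noise


def step(x, k, theta):
    """One model step: resolved dynamics, forcing, and a parametrized term."""
    return 0.9 * x + theta * np.sin(x) + np.cos(0.3 * k)


def run_da(theta, n_steps, rng):
    """Cycle forecast and analysis; return predictors and analysis increments."""
    x_true = x_an = 1.0
    features, increments = [], []
    for k in range(n_steps):
        x_fc = step(x_an, k, theta)            # forecast with the current model
        x_true = step(x_true, k, TRUE_THETA)   # hidden truth
        y = x_true + OBS_SD * rng.standard_normal()
        features.append(np.sin(x_an))          # predictor for the missing term
        x_an = x_fc + GAIN * (y - x_fc)        # analysis update
        increments.append(x_an - x_fc)         # model-observation mismatch data
    return np.array(features), np.array(increments)


def learn_correction(features, increments):
    """Least-squares slope through the origin (stand-in for an ML model)."""
    return float(features @ increments / (features @ features))


def iterate(n_iter=8, n_steps=400, seed=0):
    """Alternate DA and learning, accumulating the learned correction."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    history = [theta]
    for _ in range(n_iter):
        features, increments = run_da(theta, n_steps, rng)
        theta += learn_correction(features, increments)
        history.append(theta)
    return history


if __name__ == "__main__":
    print(iterate())
```

Because the analysis increments recover only part of the forecast error, a single pass underestimates the missing term, and repeating the DA with the updated model is what moves `theta` towards `TRUE_THETA` in this toy case. Whether a comparable loop converges for a realistic model and a neural-network parametrization is the open question the project addresses.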

Student:

Daniel Ayers

Period of Study:

Sep 19 - Sep 22

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2267924

Research Topic:

Unclassified

Organisations

University of Reading (Lead Research Organisation)

People	ORCID iD
Peter Jan Van Leeuwen (Primary Supervisor)
Varun Ojha (Primary Supervisor)	http://orcid.org/0000-0002-9256-1192
Javier Amezcua (Primary Supervisor)
Daniel Ayers (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509723/1			01/10/2016	30/09/2021
2267924	Studentship	EP/N509723/1	23/09/2019	22/09/2022	Daniel Ayers
