Faster Uncertainty Quantification of Hydrocodes

Lead Research Organisation: University of Liverpool

Department Name: Electrical Engineering and Electronics

Abstract

This project focuses on using machine learning to emulate computationally expensive calculations. The emulator can then be used to answer pertinent questions that would otherwise be impossible to answer. The aim is to mirror successes that have been achieved using similar approaches in, for example, chemical formula formulation (where a two-thirds reduction in the computation required has been reported) and drug discovery (where a 95% reduction in the computation needed for a certain objective has been reported). This project has been co-defined with, and will be co-supervised by, the Defence Science and Technology Laboratory (Dstl).
The specific motivation relates to hydrocodes: high-fidelity (and highly optimised) simulations of fluid dynamics, which involve computationally expensive calculations pertaining to the underlying chemistry and physics. Individual simulations can take days, even on supercomputers. Were it possible to use historic simulations to learn to emulate the calculations involved, the emulator could then be used to perform offline sensitivity analyses with respect to, for example, the parameters of the chemistry and physics. Such sensitivity analyses are, at best, limited today, making it very challenging to identify opportunities to, for example, reduce the number of inputs to the hydrocode. Given that the parameters are not known precisely, it is also desirable for any online use of the hydrocode to account for the uncertainty in those parameters when calculating any prediction. However, such Uncertainty Quantification (UQ) would demand multiple runs of the hydrocode. Given that it would be impossible to perform even one simulation in an online setting, an emulator is a vital component of any online UQ.
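To illustrate why a fast emulator makes UQ tractable, the following is a minimal sketch of Monte Carlo uncertainty propagation. Everything in it is an assumption made for illustration: `toy_emulator` is a hypothetical stand-in for a trained emulator (not a hydrocode), the two input parameters are assumed to be independent and Gaussian, and the function name `propagate_uncertainty` is invented here. The point is only that each of the many evaluations needed for UQ costs microseconds on an emulator rather than days on the simulator.

```python
import numpy as np


def toy_emulator(theta):
    """Hypothetical stand-in for a trained emulator (not a real hydrocode).

    Maps an (n, 2) array of parameter vectors to n scalar outputs.
    """
    theta = np.atleast_2d(theta)
    return 2.0 * theta[:, 0] + np.sin(theta[:, 1])


def propagate_uncertainty(emulator, param_mean, param_std, n_samples=10_000, seed=0):
    """Push parameter uncertainty through the emulator by plain Monte Carlo.

    Assumes independent Gaussian uncertainty on each parameter.
    """
    rng = np.random.default_rng(seed)
    # Draw parameter vectors; mean and std broadcast across the sample axis.
    samples = rng.normal(param_mean, param_std, size=(n_samples, len(param_mean)))
    outputs = emulator(samples)
    lower, upper = np.percentile(outputs, [2.5, 97.5])
    return {
        "mean": float(outputs.mean()),
        "std": float(outputs.std(ddof=1)),
        "interval_95": (float(lower), float(upper)),
    }


# Illustrative parameter uncertainty (values chosen arbitrarily).
summary = propagate_uncertainty(
    toy_emulator,
    param_mean=np.array([1.0, 0.0]),
    param_std=np.array([0.1, 0.2]),
)
print(summary)
```

The same loop applied directly to a simulator that takes days per run would require thousands of simulator-days, which is the cost the emulator is intended to avoid.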
While one could use, for example, a Gaussian Process (GP) to implement the emulator, it is not clear what kernel should be used, i.e. how any emulator should interpolate between the input-output pairs associated with the hydrocode. The statistical inference of the kernel is challenging, particularly in this setting, where there will be a need to interpolate between the information in the historic simulations and the prior knowledge (albeit incomplete and imprecise) of kernels implied by the knowledge of the physics and chemistry. Emerging numerical Bayesian inference algorithms (specifically Sequential Monte Carlo samplers) make it possible to capitalise on high performance computing without compromising the fidelity of that inference process.
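As a concrete picture of what a GP emulator does, the sketch below implements standard GP regression in NumPy. It is a sketch under stated assumptions, not the project's method: it fixes a squared-exponential (RBF) kernel with hand-picked hyperparameters, whereas the project's open question is precisely which kernel to use and how to infer it. The training data are a toy one-dimensional function standing in for historic simulation input-output pairs, and the names `rbf_kernel` and `gp_predict` are invented for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays (an assumed choice)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale, variance)
    k_tt += noise * np.eye(len(x_train))  # jitter for numerical stability
    k_ts = rbf_kernel(x_train, x_test, length_scale, variance)
    k_ss = rbf_kernel(x_test, x_test, length_scale, variance)

    # Solve via Cholesky factorisation rather than forming an explicit inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    cov = k_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Toy "historic simulations": eight input-output pairs of a cheap function.
x_train = np.linspace(0.0, 5.0, 8)
y_train = np.sin(x_train)

# Two test inputs inside the training range and one far outside it.
x_test = np.array([1.3, 2.7, 8.0])
mean, std = gp_predict(x_train, y_train, x_test)
print(mean, std)
```

The predictive standard deviation grows at the test input far from the training data, which is the behaviour that makes a GP attractive for UQ. How the emulator interpolates between training points is governed entirely by the kernel and its hyperparameters, which are fixed by hand here.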
The aim of this PhD is to take a specific hydrocode and to examine how these approaches can be used to expedite analysis. The broader goal is to develop a single integrated approach to analysing and speeding up UQ on complex systems, underpinned by a synergistic understanding of computer science and statistics. The anticipation is that this integrated approach will be sufficiently generic and transferable that it can be readily applied to other, similar problems.

Planned Impact

This CDT's focus on using "Future Computing Systems" to move "Towards a Data-driven Future" resonates strongly with two themes among non-academic organisations. In both themes, albeit for slightly different reasons, commodity data science is insufficient and there is a hunger both for the future leaders that this CDT will produce and for the high-performance solutions that the students will develop.

The first theme is associated with defence and security. In this context, operational performance is of paramount importance. Government organisations (e.g., Dstl, GCHQ and the NCA) will benefit from our graduates' ability to configure many-core hardware to maximise the ability to extract value from the available data. The CDT's projects and graduates will achieve societal impact by enabling these government organisations to better protect the world's population from threats posed by, for example, international terrorism and organised crime.

There is then a supply chain of industrial organisations that deliver to government organisations (both in the UK and overseas). These industrial organisations (e.g., Cubica, Denbridge Marine, FeatureSpace, Leonardo, MBDA, Ordnance Survey, QinetiQ, RiskAware, Sintela, THALES (Aveillant) and Vision4ce) operate in a globally competitive marketplace where operational performance is a key driver. The skilled graduates that this CDT will provide (and the projects that will comprise the students' PhDs) are critical to these organisations' ability to develop and deliver high-performance products and services. We therefore anticipate economic impact to result from this CDT.

The second theme is associated with high-value and high-volume manufacturing. In these contexts, profit margins are very sensitive to operational costs. For example, a change to the configuration of a production line for an aerosol manufactured by Unilever might "only" cut costs by 1p for each aerosol, but when multiplied by half a billion aerosols each year, the impact on profit can be significant. Industry (e.g., Renishaw, Rolls-Royce, Schlumberger, ShopDirect and Unilever) is therefore motivated to optimise operational costs by learning from historic data. This CDT's graduates (and their projects) will help these organisations to perform such data-driven optimisation and thereby enable the CDT to achieve further economic impact.

Other organisations (e.g., IBM) provide hardware, software and advice to those operating in these themes. The CDT's graduates will ensure these organisations can be globally competitive.

The specific organisations mentioned above are the CDT's current partners. These organisations have all agreed to co-fund studentships. That commitment indicates that, in the short term, they are likely to be the focus for the CDT's impact. However, other organisations are likely to benefit in the future. While two (Lockheed Martin and Arup) have articulated their support in letters that are attached to this proposal, we anticipate impact via a larger portfolio of organisations (e.g., via studentships but also via those organisations recruiting the CDT's graduates either immediately after the CDT or later in the students' careers). Those organisations are likely to include those inhabiting the two themes described above, but also others. For example, an entrepreneurial CDT student might identify a niche in another market sector where Distributed Algorithms can deliver substantial commercial or societal gains. Predicting where such niches might be is challenging, though it seems likely that sectors that have yet to fully embrace Data Science, while also involving significant turnover, are those with the most to gain: we hypothesise that niches might be identified in health and actuarial science, for example.

As well as training the CDT students to be the leaders of tomorrow in Distributed Algorithms, we will also achieve impact by training the CDT's industrial supervisors.

Student:

Kieron McCallan

Period of Study:

Oct 2021 - Sep 2025

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2599528

Research Topic:

Unclassified

People	ORCID iD
Leszek Gasieniec (Primary Supervisor)
Kieron McCallan (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S023445/1			01/04/2019	30/09/2027
2599528	Studentship	EP/S023445/1	01/10/2021	30/09/2025	Kieron McCallan