ExaClaw: Clawpack-enabled ExaHyPE for heterogeneous hardware

Lead Research Organisation: Durham University
Department Name: Computer Science

Abstract

Wave phenomena as they arise from conservation laws are omnipresent in computational
sciences, and codes simulating them typically ask for enormous compute power.
However, few mathematicians
and modellers have code at hand that allows them to evaluate their ideas straightforwardly on
peta- and exascale machines; many wave equation solvers do not fit heterogeneous
(GPU) hardware; and many wave simulations require exascale capabilities only from time
to time rather than 24/7.
The community thus runs the risk of falling into a sophistication gap,
where the scaling software does not incorporate the latest numerical and
algorithmic research, while the latest models and numerics are not scaled up.
It runs the risk of falling into a heterogeneity gap, where the particular
hardware configuration that drives exascale is not appropriately supported by
the software.
It runs the risk of falling into an economic disproportionality gap, where compute centres struggle
to make the case for granting a project full machine access because its code base cannot exploit the machine efficiently.


We propose to extend the FETHPC H2020 code ExaHyPE into a software package called ExaClaw
which tackles these risks.
ExaClaw will couple ClawPack, the leading grid-based toolbox for modelling wave phenomena,
to the scaling, high-performance ADER-DG AMR engine ExaHyPE,
and it will be able to deploy compute-intensive calculations to GPUs.
The team behind ExaClaw will also prototype a new supercomputer usage scheme well-suited to accommodate
bursts of extreme compute hunger.
These activities pair up with community building and the release of three ExaClaw demonstrators.
This makes ExaClaw a high-profile ExCALIBUR use case.


ExaHyPE is an engine to write solvers for grid-based, first-order
hyperbolic PDEs.
It supports block-structured Finite Volume schemes and ADER-DG, and it realises a clear
separation of concerns to support any application domain.
Users feed application domain knowledge such as
flux functions, eigenvalues, initial values or refinement criteria into the engine.
The engine then runs and orchestrates the actual computation.
Mesh traversal, refinement, parallelisation, load balancing, limiting, and so forth
are all hidden from the user.
Internally, the code employs a small set of premanufactured Riemann solvers.
They can be replaced by custom user implementations.
To widen the engine's applicability and productivity, ExaClaw will supplement
ExaHyPE's Riemann solvers with solvers from the ClawPack suite.
ClawPack is the biggest open source repository for explicit wave equation system
solvers, and it comprises a huge variety of well-studied, bespoke Riemann solvers for various
application domains.
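
The separation of concerns described above can be illustrated with a minimal Python sketch.
It is not ExaHyPE's or ClawPack's API: all names (make_rusanov, make_upwind, finite_volume_step) are hypothetical,
the mesh is a 1D periodic grid, and the Rusanov flux merely stands in for a premanufactured Riemann solver.
The user supplies only the flux and the maximum eigenvalue; the engine function owns the update loop and accepts any Riemann solver with the same signature.

```python
from typing import Callable, List

# A Riemann solver maps the states left and right of a face to a numerical flux.
RiemannSolver = Callable[[float, float], float]


def make_rusanov(flux: Callable[[float], float],
                 max_eigenvalue: Callable[[float], float]) -> RiemannSolver:
    """Generic (premanufactured) solver built from user-supplied physics."""
    def solve(q_left: float, q_right: float) -> float:
        speed = max(abs(max_eigenvalue(q_left)), abs(max_eigenvalue(q_right)))
        return 0.5 * (flux(q_left) + flux(q_right)) - 0.5 * speed * (q_right - q_left)
    return solve


def make_upwind(a: float) -> RiemannSolver:
    """Bespoke solver for linear advection q_t + a q_x = 0 (hypothetical user code)."""
    def solve(q_left: float, q_right: float) -> float:
        return a * q_left if a >= 0 else a * q_right
    return solve


def finite_volume_step(q: List[float], dt: float, dx: float,
                       riemann: RiemannSolver) -> List[float]:
    """Engine side: one explicit Finite Volume update on a periodic 1D grid."""
    n = len(q)
    # Face i sits between cell i-1 and cell i (index -1 wraps around).
    face_flux = [riemann(q[i - 1], q[i]) for i in range(n)]
    return [q[i] - dt / dx * (face_flux[(i + 1) % n] - face_flux[i])
            for i in range(n)]
```

For linear advection with unit speed and dt = dx, both solvers shift the solution by exactly one cell per step, so swapping the solver leaves the engine code untouched.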


ExaHyPE realises a task decomposition where one particular task type dominates the runtime.
This type exhibits a high arithmetic intensity and
will be deployed to GPUs through various
technologies (OpenMP, OpenACC, oneAPI).
Instead of treating GPUs as passive workhorses, ExaClaw lets its GPUs actively steal their jobs from the
compute nodes, i.e. they are in charge of their own workload.
This establishes the notion of a skeleton hardware, where GPUs or other accelerators
can be dynamically added to or removed from a supercomputer run, and the code inherently fits
different hardware configurations.
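
The stealing idea can be sketched in a few lines of Python.
This is an illustration under strong assumptions, not ExaClaw's implementation: Python threads stand in for CPU cores and GPUs, the function name run_with_stealing and the worker names are hypothetical, and the "compute-intensive task" is a toy sum of squares.
Every worker pulls its next task from a shared pool on its own initiative, so nobody assigns work to it.

```python
import queue
import threading
from typing import Dict, List, Tuple

Task = Tuple[int, List[int]]  # (task id, payload)


def run_with_stealing(tasks: List[Task],
                      worker_names: List[str]) -> Tuple[Dict[int, int], Dict[str, int]]:
    """Process all tasks; each worker actively grabs work until the pool is empty."""
    pending: "queue.Queue[Task]" = queue.Queue()
    for task in tasks:
        pending.put(task)

    results: Dict[int, int] = {}
    done_by: Dict[str, int] = {name: 0 for name in worker_names}
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                task_id, payload = pending.get_nowait()  # steal the next job
            except queue.Empty:
                return  # nothing left, the worker retires
            value = sum(x * x for x in payload)  # stand-in for the heavy task
            with lock:
                results[task_id] = value
                done_by[name] += 1

    threads = [threading.Thread(target=worker, args=(name,)) for name in worker_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, done_by
```

Calling run_with_stealing(tasks, ["cpu0"]) and run_with_stealing(tasks, ["cpu0", "gpu0", "gpu1"]) yields the same results; only the distribution of work across workers changes, which is the property that allows accelerators to be added or removed without touching the task code.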


Finally, ExaClaw will investigate a novel HPC usage scheme where the load balancing
minimises the number of used machine nodes.
If, however, the workload of a run becomes massive (e.g. due to adaptive mesh refinement),
ExaClaw will be able to book further resources dynamically.
The project abandons a static hardware-to-run association and allows multiple
simulations to negotiate with each other over which one should have the biggest share of resources.
Simulations thus can have (close to) full machine access when they need it, but release
resources whenever their demand decreases again.
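
A toy model may make this usage scheme more concrete.
The policy below is purely an assumption for illustration (the project text does not specify how simulations negotiate): each run requests the smallest number of nodes that covers its current workload, and if the requests exceed the machine size, nodes are shared in proportion to demand using largest-remainder rounding.
The function names nodes_needed and share_nodes are hypothetical.

```python
from typing import Dict


def nodes_needed(workload: int, capacity_per_node: int) -> int:
    """Smallest node count that covers the workload (at least one node)."""
    return max(1, -(-workload // capacity_per_node))  # integer ceiling division


def share_nodes(demands: Dict[str, int], total_nodes: int) -> Dict[str, int]:
    """Grant every request if the machine is big enough, else share proportionally."""
    requested = sum(demands.values())
    if requested <= total_nodes:
        return dict(demands)  # unused nodes stay free for other runs

    granted: Dict[str, int] = {}
    remainders = []
    for name, demand in demands.items():
        granted[name] = demand * total_nodes // requested
        remainders.append((demand * total_nodes % requested, name))

    # Hand the nodes lost to rounding to the runs with the largest remainders.
    leftover = total_nodes - sum(granted.values())
    for _, name in sorted(remainders, key=lambda r: (-r[0], r[1]))[:leftover]:
        granted[name] += 1
    return granted
```

On a 10-node machine, requests of 2 and 3 nodes are both granted in full and five nodes remain free; once mesh refinement raises the first run's request to 14 nodes while the second asks for 4, this policy grants 8 and 2 nodes. Note that a very small request can be rounded down to zero nodes under this simple rule.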

Planned Impact

ExaClaw pushes the boundaries of
"simulationability" by providing an improved wave equation solver engine with
which programmers can write faster simulation codes that can handle bigger
problems.
It does not compromise on state-of-the-art numerics or the
opportunity to harvest novel, heterogeneous, dynamic hardware.



These properties pave the way towards new science of economic or societal
relevance since they allow programmers to tackle challenges hitherto impossible
to handle.
We sketch this impact via an earthquake
simulation code which is used in Europe to assess seismic risks and hazards,
and we demonstrate ExaClaw's potential via flooding and storm surge simulations.
Both setups are, together with a third one in cosmology, ExaClaw demonstrators
and blueprints of eventual ExaClaw Phase 2 use cases.
They build upon the fact that the two baseline codes, ExaHyPE and ClawPack, are
flagship codes in the EU's Center of Excellence in the domain of Solid
Earth (ChEESE) and in coastal flooding and storm surge modelling in the US, respectively.
As ExaClaw improves a generic wave solver engine, it does not pioneer these
application areas.
Instead, it addresses some of the exascale dealbreakers underlying those applications.


As a second direction of travel, the ExaClaw working group evaluates,
challenges and stimulates hardware development and GPU programming paradigms, as
we compare and benchmark different approaches against each other.
Furthermore, ExaClaw proposes a paradigm shift in how simulation codes book and use
compute resources.
We will prototype a novel, dynamic approach to use them, and, for this, rely on
close collaboration with Durham's HPC centres, experimental ExCALIBUR hardware
and hardware purchased in the Computer Science department, as well as our
partners from industry.
Publishing both our tools and insight, continuous performance tests
and the integration of various techniques ranging from containers through
frequency modifications to smart network devices gives industry the opportunity
to demonstrate the potential economic gain arising from new technology.
Eventually, this has the potential to challenge and change the way vendors
design their machines and the way supercomputing centres allocate their resources.




Finally, ExaClaw delivers three demonstrators.
They are bespoke, challenging simulation setups that can be downloaded and
executed off the shelf.
Similar to ECP Proxy Applications or the Berkeley Dwarfs,
the demonstrators can be used as procurement and evaluation benchmarks, while
they can also serve as a starting point for other researchers to prototype
and evaluate their ideas within a real-world simulation.
 
Description We had promised to deliver a wave equation system solver which can be used to simulate, for example, gravitational waves or seismic waves (earthquakes). By integrating the ClawPack Riemann solvers, this solver should be able to use the latest mathematical formulations, which can, for example, incorporate the flooding of coastlines; it should run on GPGPUs; and it should be able to adapt dynamically and autonomously if the underlying computer changes (e.g. as parts of the machine fail).

We have succeeded in meeting these three goals, and we have also identified a couple of bugs in the GCC and LLVM compiler suites w.r.t. OpenMP and SYCL GPU offloading. That is, our team has been able to find bugs in these industry-driven tools, report them, and develop workarounds. While these workarounds helped us to achieve our goals, we are currently waiting for the bugs to be fixed upstream. Once this is done, we can remove our workarounds again.
Exploitation Route All compiler bugs are reported to the LLVM or GCC community, respectively, and will be fixed in upcoming open source tool releases. All SYCL bugs are reported to Intel.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://www.peano-framework.org/
 
Description See description on project outcomes and how they affect the software landscape around compilers when it comes to GPU programming.
First Year Of Impact 2021
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Title Peano 
Description A framework for dynamically adaptive Cartesian grids. This is the fourth generation of the code, which now comprises ExaHyPE and GPU support due to the ExaClaw project.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Used within ExaClaw and the H2020 project ChEESE. 
URL http://www.peano-framework.org/index.php/peano-v-3/