Cirrus Phase II: Preparing for Heterogeneity at the Exascale

Lead Research Organisation: University of Edinburgh
Department Name: Edinburgh Parallel Computing Centre

Abstract

EPSRC's Tier 2 High Performance Computing (HPC) services provide access to supercomputing systems that complement the national HPC service, ARCHER. In 2016 a national network of five Tier 2 HPC services was established across the UK. One of these services, Cirrus, is hosted and operated by EPCC, the supercomputing centre at the University of Edinburgh. The Cirrus service has been very successful, currently with over 380 active users, of whom over 50 are from industry. The Cirrus Phase II service will build on the success of the Cirrus service by adding tightly integrated GPU accelerators to the system. GPU accelerators help supercomputing applications run faster by accelerating their core numerical calculations. Cirrus Phase II will have 144 NVIDIA V100 GPUs and an accompanying fast storage layer for data-intensive applications.

Systems that include both normal CPUs and GPU accelerators are often called heterogeneous systems. The next frontier of supercomputing is the Exascale, and the first Exascale systems will become operational in 2020 or 2021. There are two different technology solutions which can deliver an Exaflop (a billion billion calculations per second): one is a CPU-only approach; the other is a heterogeneous approach that combines CPUs and GPUs. The enhanced Cirrus Phase II system will give scientists from across the UK the opportunity to explore how their modelling and simulation applications will run on a heterogeneous architecture and how difficult it is likely to be to deliver sufficient performance at scale.

In addition to exploring Exascale options, the system will also provide a high performance platform for AI training and research. Currently there is enormous interest in using AI in scientific and industrial applications, and Cirrus Phase II, with its large number of NVIDIA GPUs and fast storage layer, will provide an excellent platform for such applications.

Planned Impact

The Cirrus Phase II project will have very high impact. It will support computational science research from across the EPSRC scientific community, enabling scientists to deliver impactful research results in their specific disciplines. Additionally, through EPCC's "Accelerator" programme, users from industry and commerce will use the system to develop and enhance their products and services. Cirrus Phase II builds on the initial Cirrus Tier 2 HPC service's industry programme and will further develop it through the additional hardware that will be installed.

Cirrus Phase II will build on a user base that includes projects from a wide range of research areas including: biochemistry, catalysis, chemistry, combustion, CFD, informatics, environmental modelling, seismology, ocean science, plasma physics, solid state physics, laser physics, optical physics and GIS.

In summary, Cirrus Phase II will continue to deliver the strong impact of the initial Cirrus project through:

1. Knowledge impacts: providing the underpinning e-Infrastructure for the broad EPSRC scientific community and enabling it to explore heterogeneous computing on the road to the Exascale.
2. Economic impacts: EPCC's industrial HPC programme is the leading programme in Europe and recognised world-wide as an example of best practice engagement with industry.
3. Societal impacts: developing and improving the multitude of items we use in our day-to-day lives and contributing to our understanding of global challenges such as climate change and sustainability.
4. People and skills impacts: supporting the Tier 2 network with SAFE, developing Research Software Engineer skills and the next generation of system managers.
