Advanced Computer Architectures for High Performance Computing

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

The field of high performance computing (HPC) strives to obtain as much performance as possible from available hardware, aiming to increase the feasibility of computationally intensive applications, such as scientific simulations. With the advent of multi-core processor chips over the past decade, HPC applications have naturally aimed to exploit as much parallelism as possible. This parallelism is traditionally divided into two levels: between compute nodes in a cluster, and within a node (between processor cores). However, to take full advantage of modern hardware, a third level is often needed that exploits Single Instruction Multiple Data (SIMD) capabilities. This is possible through the use of special processor instructions that apply the same operation to several sets of operands with a single instruction.
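As a minimal illustration of this third level (a sketch that is not taken from the project itself), the loop below updates each array element independently, which is the pattern that vectorising compilers can map onto SIMD instructions. The function name `saxpy` and its signature are assumptions chosen for the example. Whether the loop is actually vectorised depends on the compiler, its optimisation flags, and the target architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * Every iteration is independent of the others, so a vectorising compiler
 * is free to process several elements per instruction (for example with
 * SVE on ARM or AVX on Intel CPUs).
 * `restrict` promises that x and y do not overlap, which removes a common
 * obstacle to automatic vectorisation. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source can be compiled for different targets without modification, leaving the choice of vector width to the compiler and the hardware.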
SIMD processing is not new in HPC: in 1976 Cray introduced the Cray-1, one of the first supercomputers to use vector instructions. Vector processing has since spawned modern interpretations, including the recently announced Scalable Vector Extension (SVE) instruction set from ARM. SVE has already received significant praise in industry, especially in the context of the current push towards exascale computing.
There is great scope to research the potential benefits of using modern vector architectures in HPC, such as those based on the newest-generation ARM hardware. Since most vectorisation is done by optimising compilers, and because the SVE platform is itself very new to the market, a significant effort will be required to bring the performance these compilers deliver on SVE up to the level they currently achieve on Intel CPUs and NVIDIA GPUs.
With ARM only recently entering the HPC market, another important issue is that of developing application code that performs well on a variety of hardware platforms. In HPC, this is known as performance portability, hinting at the ideal of being able to take a single code and run it optimally on a variety of contemporary HPC platforms. In this sense, SVE can capitalise on its relative immaturity by making use of hardware-software co-design to ensure that future developments of this architecture directly benefit HPC applications.
This thesis explores the performance capabilities of modern vector processors designed for HPC, with a focus on ARM hardware. It attempts to do so via a combination of investigations and contributions to compiler software, hardware benchmarks, and real scientific codes that are used on supercomputers today. These investigations will be possible in part due to the Isambard GW4 Tier-2 service, a supercomputer which brings together hardware from the biggest HPC vendors today: Intel, NVIDIA, and ARM. Finally, the thesis will attempt to respect the performance portability paradigm by ensuring that all code optimisations developed take into account all major existing and upcoming HPC processor architectures.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R51245X/1                                   01/10/2017  30/09/2021
1955618            Studentship   EP/R51245X/1  25/09/2017  24/09/2021  Andrei Poenaru