GW4 Tier 2 HPC Centre for Advanced Architectures

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

This proposal by a consortium of the GW4 Alliance of Bristol, Bath, Cardiff and Exeter, in partnership with Cray and the Met Office, is to provide a national 64-bit ARM-based HPC service. The system will be one of the world's first to be based on Broadcom's Vulcan server-class chip. Details of this device are still under NDA, but the Vulcan CPU is generating excitement because it prioritises memory bandwidth over peak FLOP/s, the former being more important for most scientific codes. Providing access to such a machine as a national service should therefore enable the UK's HPC community to quantify the benefit of memory-bandwidth-focused CPUs, thus informing future system procurements from Tier 1 to Tier 3. If this greater focus on memory bandwidth does, as expected, result in greater performance and science throughput, then ARM64-based machines, such as the Cray XC Scout system that we are proposing in this bid, will be genuine contenders for Tier 1 and Tier 3 production systems from 2017. In addition to our goal of providing one of the world's first ARM64 production HPC systems, this proposal will also provide a service to enable algorithm development and the porting and optimisation of scientific codes in readiness for ARM64 machines. This algorithm and software effort is a crucial part of any architectural evaluation, as rigorous architecture-to-architecture comparisons are only possible when optimisation levels across the architectures are similar. There is already tremendous interest in evaluating ARM64 within the HPC community, with multiple ARM-based HPC projects underway around the world. Our proposed machine will be able to run most existing codes "out of the box", supporting the most common parallel programming models, including OpenMP and MPI. Thus most users should be able to begin to evaluate the service with minimal effort, and so we expect demand to be strong.
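The claim that memory bandwidth matters more than peak FLOP/s for most scientific codes can be illustrated with a simple roofline-style bound. The sketch below is illustrative only: the function name and both CPU configurations are hypothetical, and the numbers are invented for the example; they are not Vulcan specifications, which remain under NDA.

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline bound: a kernel is limited either by peak compute
    or by how fast memory can feed it, whichever is lower."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# A STREAM-triad-like kernel, a[i] = b[i] + s * c[i], performs 2 flops
# per 24 bytes moved (two 8-byte loads and one 8-byte store).
TRIAD_INTENSITY = 2 / 24

# Hypothetical CPUs; the figures are assumptions made up for illustration.
cpus = {
    "flops-focused": {"peak_gflops": 1000.0, "mem_bw_gbs": 100.0},
    "bandwidth-focused": {"peak_gflops": 500.0, "mem_bw_gbs": 200.0},
}

for name, cpu in cpus.items():
    bound = attainable_gflops(cpu["peak_gflops"], cpu["mem_bw_gbs"], TRIAD_INTENSITY)
    print(f"{name}: at most {bound:.1f} GFLOP/s on the triad kernel")
```

Under these assumed figures, the bandwidth-focused CPU has half the peak FLOP/s but twice the attainable performance on this low-intensity kernel, which is the kind of trade-off the proposed service is intended to quantify on real codes.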
The system will be run as a national facility, with open calls for computing time allocated via a lightweight resource allocation process. A top-level Consortium Management Board will determine the policy for resource allocation between the different application areas as well as fundamental computational science research into next generation parallel algorithms. Operating expenses will be covered by the consortium and its partners. Systems administrator and power costs will be split across the partners, while a group of expert research software engineers will help the community develop new algorithms, port codes and rigorously evaluate this important new architecture.

Planned Impact

The GW4 ARM64 proposal with Cray and the Met Office will have many positive impacts for the UK's HPC community, our wider society, and the UK's economy. These are detailed in the Pathways to Impact (PtI) document, but the first six are described below:

* Increasing the rate of adoption of UK technology in the HPC marketplace. Our system will be one of the world's first production-quality ARM-based HPC platforms, demonstrating the first ARM64 processors that are able to compete head-to-head with the best-in-class HPC processors. Today x86 processors account for around 85% of Top500 machines, and the UK-based ARM Ltd is aiming to win a significant fraction of this market. Our evaluation results will give other HPC users evidence of ARM's suitability, enabling more rapid adoption of these technologies and therefore financially benefiting the UK economy.

* Informing future Tier 1, 2 and 3 procurements. Recently mainstream HPC CPU technologies have seen a reduction in competition, with associated declines in cost competitiveness and rates of innovation. If the ARM64 CPUs in our proposal prove successful, this should increase competition between HPC processor vendors, driving improvements in cost effectiveness and improving the rates of innovation once again. A potential outcome would be future national, regional and local machines being based on the ARM64 technology in our proposal, benefiting the UK in terms of more science achieved for a given investment.

* Increasing research ties with leading HPC centres around the world. The ARM64 technology is generating high levels of interest. As ours will be one of the first production ARM64 systems in the world, leading HPC centres have approached us asking to collaborate, establish new networks of expertise, and share results. See the attached letters of support from Dr Jim Ang, Director of the US's Exascale programme at Sandia National Laboratory, and from Prof Andrew Randewich, AWE's chief scientist.

* We will provide a unique platform to conduct like-for-like advanced architecture comparisons, using a common, high-quality software stack. No other proposal can provide ARM64, Nvidia GPUs, and Intel CPUs and KNL, all within the same software stack. Because we will use the same software stack as on ARCHER, we will also enable comparisons of ARM64 with the current national service. This unique capability will enable the community to evaluate the impact of new architectures on specific research domains to inform future technology procurements and service developments for UK-based researchers.

* Increasing horizontal integration, promoting closer working with other Tier 2 centres. We have already begun talks with several other potential centres with which we share interests. The EPCC, Oxford and Cambridge proposals all share our strong interest in many-core technologies, while the UCL materials bid focuses on many of the same codes as us. We will all benefit from collaborating regarding these overlapping technologies and codes.

* User Training and Workshops are an essential component of the GW4 ARM64 Centre of Excellence, and these will be made openly available to UK EPS researchers, not just those from other Tier 2 sites. A series of online user guides, best practice documents and training material will be developed in collaboration with Cray, the technology partner. Workshops and training sessions will be run to support the community in rapidly being able to exploit the new technology in our system. Given the potential impact and desire to evaluate ARM64 it is envisaged that these sessions will use online resources to broaden participation options.

Other benefits include: providing a continuous integration (CI) platform; enhancing HPC expertise; and widening participation.

Publications

McIntosh-Smith S (2019) A performance analysis of the first generation of HPC-optimized Arm processors in Concurrency and Computation: Practice and Experience

 
Description We have been able to show that an Arm-based supercomputer is a viable alternative to current technologies (e.g. x86 from Intel and AMD, or POWER from IBM).
Exploitation Route The Isambard project is the world's first production Arm-based supercomputer. As our results appear to be positive, it is likely we will see many others start to buy Arm-based supercomputers, leading to the creation of a new market potentially worth billions of USD to Arm and the relevant chip vendors (Cavium, Qualcomm etc) and systems vendors (Cray, HPE, Fujitsu etc).
Sectors Aerospace, Defence and Marine,Chemicals,Creative Economy,Digital/Communication/Information Technologies (including Software),Electronics,Energy,Environment,Financial Services, and Management Consultancy,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

URL http://gw4.ac.uk/isambard/
 
Description The Met Office is considering adopting these technologies for future climate and weather simulations. Several Formula 1 racing teams, an oil and gas company, and several of the large US national labs are also considering exploiting our findings and adopting these technologies.
First Year Of Impact 2017
Sector Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Energy,Environment,Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description ASiMoV prosperity partnership 
Organisation Centre for Modelling and Simulation (CFMS)
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, with other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation Rolls Royce Group Plc
Country United Kingdom 
Sector Private 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, with other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, with other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Edinburgh
Department Edinburgh Parallel Computing Centre (EPCC)
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, with other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, with other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Warwick
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, with other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation Zenotech
Country United Kingdom 
Sector Private 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, with other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description Arm Centre of Excellence
Organisation ARM Holdings
Country United Kingdom 
Sector Private 
PI Contribution Working with Arm to best exploit their architectures in HPC. Includes exploring the SVE vector instruction set, porting applications to Isambard, helping develop tools, compilers, simulators etc.
Collaborator Contribution Contributing expertise, cash, access to tools and simulators, running workshops, hackathons, BoFs etc.
Impact Presentations at Arm-related workshops, panels, BoFs etc. Invitations to present the results of our work at US national labs, SIAM PP18 in Japan, Berlin and more.
Start Year 2016