GW4 Tier 2 HPC Centre for Advanced Architectures
Lead Research Organisation:
University of Bristol
Department Name: Computer Science
Abstract
This proposal by a consortium of the GW4 Alliance of Bristol, Bath, Cardiff and Exeter, in partnership with Cray and the Met Office, is to provide a national 64-bit ARM-based HPC service. The system will be one of the world's first to be based on Broadcom's Vulcan server-class chip. Details of this device are still under NDA, but the Vulcan CPU is generating excitement because it trades less emphasis on peak FLOP/s for much greater memory bandwidth, the latter being more important for most scientific codes. Providing access to such a machine as a national service should therefore enable the UK's HPC community to quantify the benefit of memory-bandwidth-focused CPUs, thus informing future system procurements from Tier 1 to Tier 3. If this greater focus on memory bandwidth does, as expected, result in greater performance and science throughput, then ARM64-based machines, such as the Cray XC Scout system that we are proposing in this bid, will be genuine contenders for Tier 1 and Tier 3 production systems from 2017. In addition to our goal of providing one of the world's first ARM64 production HPC systems, this proposal will also provide a service to enable algorithm development and the porting and optimisation of scientific codes in readiness for ARM64 machines. This algorithm and software effort is a crucial part of any architectural evaluation, as rigorous architecture-to-architecture comparisons are only possible when optimisation levels across the architectures are similar. There is already tremendous interest in evaluating ARM64 within the HPC community, with multiple ARM-based HPC projects underway around the world. Our proposed machine will be able to run most existing codes "out of the box", supporting the most common parallel programming models, including OpenMP and MPI. Thus most users should be able to begin to evaluate the service with minimal effort, and so we expect demand to be strong.
The system will be run as a national facility, with open calls for computing time allocated via a lightweight resource allocation process. A top-level Consortium Management Board will determine the policy for resource allocation between the different application areas as well as fundamental computational science research into next generation parallel algorithms. Operating expenses will be covered by the consortium and its partners. Systems administrator and power costs will be split across the partners, while a group of expert research software engineers will help the community develop new algorithms, port codes and rigorously evaluate this important new architecture.
Planned Impact
The GW4 ARM64 proposal with Cray and the Met Office will have many positive impacts for the UK's HPC community, for wider society, and for the UK's economy. These are detailed in the Pathways to Impact (PtI) document, but the first six are described below:
* Increase the rate of adoption for UK technology in the HPC marketplace. Our system will be one of the world's first production-quality ARM-based HPC platforms, demonstrating the first ARM64 processors that are able to compete head-to-head with the best-in-class HPC processors. Today x86 processors account for around 85% of Top500 machines, and the UK-based ARM Ltd is aiming to win a significant fraction of this market. Our evaluation results will provide evidence of ARM's suitability to other HPC users, enabling more rapid adoption of these technologies and therefore financially benefiting the UK economy.
* Informing future Tier 1, 2 and 3 procurements. Recently, mainstream HPC CPU technologies have seen a reduction in competition, with associated declines in cost competitiveness and rates of innovation. If the ARM64 CPUs in our proposal prove successful, this should increase competition between HPC processor vendors, driving improvements in cost effectiveness and improving the rates of innovation once again. A potential outcome would be future national, regional and local machines being based on the ARM64 technology in our proposal, benefiting the UK in terms of more science achieved for a given investment.
* Increasing research ties with leading HPC centres around the world. The ARM64 technology is generating high levels of interest. As ours will be one of the first production ARM64 systems in the world, leading HPC centres have approached us asking to collaborate, establish new networks of expertise, and share results. See the attached letters of support from Dr Jim Ang, Director of the US's Exascale programme at Sandia National Laboratories, and from Prof Andrew Randewich, AWE's chief scientist.
* We will provide a unique platform to conduct like-for-like advanced architecture comparisons, using a common, high-quality software stack. No other proposal can provide ARM64, Nvidia GPUs, and Intel CPUs and KNL, all within the same software stack. Because we will use the same software stack as on ARCHER, we will also enable comparisons of ARM64 with the current national service. This unique capability will enable the community to evaluate the impact of new architectures on specific research domains to inform future technology procurements and service developments for UK-based researchers.
* Increasing horizontal integration, promoting closer working with other Tier 2 centres. We have already begun talks with several other potential centres with which we share interests. The EPCC, Oxford and Cambridge proposals all share our strong interest in many-core technologies, while the UCL materials bid focuses on many of the same codes as us. We will all benefit from collaborating regarding these overlapping technologies and codes.
* User Training and Workshops are an essential component of the GW4 ARM64 Centre of Excellence, and these will be made openly available to UK EPS researchers, not just those from other Tier 2 sites. A series of online user guides, best practice documents and training material will be developed in collaboration with Cray, the technology partner. Workshops and training sessions will be run to support the community in rapidly being able to exploit the new technology in our system. Given the potential impact and desire to evaluate ARM64 it is envisaged that these sessions will use online resources to broaden participation options.
Other benefits include: providing a continuous integration (CI) platform; enhancing HPC expertise; and widening participation.
Organisations
- University of Bristol (Lead Research Organisation)
- University of Oxford (Collaboration)
- University of Edinburgh (Collaboration)
- Arm Limited (Collaboration)
- University of Warwick (Collaboration)
- Rolls Royce Group Plc (Collaboration)
- Centre Modelling and Simulation (CFMS) (Collaboration)
- Zenotech (Collaboration)
- University of Cambridge (Collaboration)
Publications
Zibouche N
(2021)
GW band structure of monolayer MoS2 using the SternheimerGW method and effect of dielectric environment
in Physical Review B
Zibouche N
(2022)
Using in-plane anisotropy to engineer Janus monolayers of rhenium dichalcogenides
in Physical Review Materials
Tse J
(2021)
Unraveling the Impact of Graphene Addition to Thermoelectric SrTiO3 and La-Doped SrTiO3 Materials: A Density Functional Theory Study
in ACS Applied Materials & Interfaces
Taylor N
(2020)
Calcium-stannous oxide solid solutions for solar devices
in Applied Physics Letters
Soloviev M
(2022)
Modelling the adsorption of proteins to nanoparticles at the solid-liquid interface
in Journal of Colloid and Interface Science
Smolders TJAM
(2021)
3D-to-2D Transition of Anion Vacancy Mobility in CsPbBr3 under Hydrostatic Pressure
in The Journal of Physical Chemistry Letters
Saunders W
(2020)
Fast electrostatic solvers for kinetic Monte Carlo simulations
in Journal of Computational Physics
Roman-Trufero M
(2020)
Evolution of an Amniote-Specific Mechanism for Modulating Ubiquitin Signaling via Phosphoregulation of the E2 Enzyme UBE2D3
in Molecular Biology and Evolution
Reguly I
(2019)
Performance Portability of Multi-Material Kernels
Ouro P
(2021)
On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processor-based HPC systems
in Computer Physics Communications
Ouro P
(2021)
Performance and wake characteristics of tidal turbines in an infinitely large array
in Journal of Fluid Mechanics
Morteo-Flores F
(2023)
First-Principles Microkinetic Study of the Catalytic Hydrodeoxygenation of Guaiacol on Transition Metal Surfaces
in ChemCatChem
McIntosh-Smith S
(2019)
A performance analysis of the first generation of HPC-optimized Arm processors
in Concurrency and Computation: Practice and Experience
McIntosh-Smith S
(2019)
Benchmarking the first generation of production quality Arm-based supercomputers
in Concurrency and Computation: Practice and Experience
McColl K
(2022)
Transition metal migration and O2 formation underpin voltage hysteresis in oxygen-redox disordered rocksalt cathodes
in Nature Communications
López J
(2020)
nsCouette - A high-performance code for direct numerical simulations of turbulent Taylor-Couette flow
in SoftwareX
Liu Y
(2017)
A Framework of Fog Computing: Architecture, Challenges, and Optimization
in IEEE Access
Laughton E
(2021)
A comparison of interpolation techniques for non-conformal high-order discontinuous Galerkin methods
in Computer Methods in Applied Mechanics and Engineering
Lanzetta L
(2021)
Degradation mechanism of hybrid tin-based perovskite solar cells and the critical role of tin (IV) iodide
in Nature Communications
Kowalec I
(2022)
A computational study of direct CO2 hydrogenation to methanol on Pd surfaces
in Physical Chemistry Chemical Physics: PCCP
Kabalan L
(2023)
Investigation of the Pd(1-x)Znx alloy phase diagram using ab initio modelling approaches
in Journal of Physics: Condensed Matter
Kabalan L
(2021)
A computational study of the properties of low- and high-index Pd, Cu and Zn surfaces
in Physical Chemistry Chemical Physics: PCCP
Jesus R
(2023)
AArch64 Atomics
Jesus R
(2023)
Vectorizing and distributing number-theoretic transform to count Goldbach partitions on Arm-based supercomputers
in Concurrency and Computation: Practice and Experience
Giordano M
(2022)
Productivity meets Performance: Julia on A64FX
Deakin T
(2020)
Hostile Cache Implications for Small, Dense Linear Solves
Deakin T
(2019)
Performance Portability across Diverse Computer Architectures
Deakin T
(2020)
Reviewing the Computational Performance of Structured and Unstructured Grid Deterministic SN Transport Sweeps on Many-Core Architectures
in Journal of Computational and Theoretical Transport
Culver S
(2020)
Evidence for a Solid-Electrolyte Inductive Effect in the Superionic Conductor Li10Ge1-xSnxP2S12
in Journal of the American Chemical Society
Description | We have been able to show that an Arm-based supercomputer is a viable alternative to current technologies (e.g. x86 from Intel and AMD, or POWER from IBM). |
Exploitation Route | The Isambard project is the world's first production Arm-based supercomputer. As our results appear to be positive, it is likely that many others will start to buy Arm-based supercomputers, leading to the creation of a new market potentially worth billions of USD to Arm, the relevant chip vendors (Cavium, Qualcomm etc.) and systems vendors (Cray, HPE, Fujitsu etc.). |
Sectors | Aerospace, Defence and Marine; Chemicals; Creative Economy; Digital/Communication/Information Technologies (including Software); Electronics; Energy; Environment; Financial Services, and Management Consultancy; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
URL | http://gw4.ac.uk/isambard/ |
Description | The Met Office is considering adopting these technologies for future climate and weather simulations. Several Formula 1 racing teams, an oil and gas company, and several of the large US national labs are also considering exploiting our findings and adopting these technologies. |
First Year Of Impact | 2017 |
Sector | Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Energy; Environment; Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Description | GW4 Tier-2 HPC Centre for Advanced Architectures |
Amount | £4,100,000 (GBP) |
Funding ID | EP/T022078/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 02/2020 |
End | 01/2023 |
Description | Isambard 2 expansion to add new testbeds and expand user base |
Amount | £301,395 (GBP) |
Funding ID | EP/W03218X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2021 |
End | 03/2022 |
Description | ASiMoV prosperity partnership |
Organisation | Centre Modelling and Simulation (CFMS) |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | Contributing advanced computer architecture research to the ASiMoV prosperity partnership. |
Collaborator Contribution | Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem. |
Impact | Project only just begun, so no outputs yet. |
Start Year | 2018 |
Description | ASiMoV prosperity partnership |
Organisation | Rolls Royce Group Plc |
Country | United Kingdom |
Sector | Private |
PI Contribution | Contributing advanced computer architecture research to the ASiMoV prosperity partnership. |
Collaborator Contribution | Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem. |
Impact | Project only just begun, so no outputs yet. |
Start Year | 2018 |
Description | ASiMoV prosperity partnership |
Organisation | University of Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing advanced computer architecture research to the ASiMoV prosperity partnership. |
Collaborator Contribution | Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem. |
Impact | Project only just begun, so no outputs yet. |
Start Year | 2018 |
Description | ASiMoV prosperity partnership |
Organisation | University of Edinburgh |
Department | Edinburgh Parallel Computing Centre (EPCC) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing advanced computer architecture research to the ASiMoV prosperity partnership. |
Collaborator Contribution | Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem. |
Impact | Project only just begun, so no outputs yet. |
Start Year | 2018 |
Description | ASiMoV prosperity partnership |
Organisation | University of Oxford |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing advanced computer architecture research to the ASiMoV prosperity partnership. |
Collaborator Contribution | Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem. |
Impact | Project only just begun, so no outputs yet. |
Start Year | 2018 |
Description | ASiMoV prosperity partnership |
Organisation | University of Warwick |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing advanced computer architecture research to the ASiMoV prosperity partnership. |
Collaborator Contribution | Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem. |
Impact | Project only just begun, so no outputs yet. |
Start Year | 2018 |
Description | ASiMoV prosperity partnership |
Organisation | Zenotech |
Country | United Kingdom |
Sector | Private |
PI Contribution | Contributing advanced computer architecture research to the ASiMoV prosperity partnership. |
Collaborator Contribution | Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem. |
Impact | Project only just begun, so no outputs yet. |
Start Year | 2018 |
Description | Arm Centre of Excellence |
Organisation | Arm Limited |
Country | United Kingdom |
Sector | Private |
PI Contribution | Working with Arm to best exploit their architectures in HPC. Includes exploring the SVE vector instruction set, porting applications to Isambard, helping develop tools, compilers, simulators etc. |
Collaborator Contribution | Contributing expertise, cash, access to tools and simulators, running workshops, hackathons, BoFs etc. |
Impact | Presentations at Arm-related workshops, panels, BoFs etc. Invitations to present the results of our work at US national labs, SIAM PP18 in Japan, Berlin and more. |
Start Year | 2016 |