Excalibur H&ES - ARM-GPU testbed & ARM Forge Licence

Lead Research Organisation: University of Leicester
Department Name: IT Services

Abstract

In 2018, the Exascale Computing ALgorithms & Infrastructures for the Benefit of UK Research (ExCALIBUR) programme was proposed by the Met Office, CCFE and EPSRC (on behalf of UKRI). The goal of ExCALIBUR is to redesign high-priority computer codes and algorithms, keeping UK research and development at the forefront of high-performance simulation science. The challenge spans many disciplines and as such the programme of research will be delivered through a partnership between the Met Office and UKRI Research Councils. Research software engineers and scientists will work together to future-proof the UK against the fast-moving changes in supercomputer designs. This combined scientific expertise will push the boundaries of science across a wide range of fields, delivering transformational change at the cutting edge of scientific supercomputing. DiRAC proposed the inclusion in the ExCALIBUR business case of a request for £4.5M in capital funding over 4.5 years to develop a hardware fore-sighting programme. Industry co-funding for the programme will be sought where possible.

The £4.5M capital is intended to provide a testbed area that uses pre-commercial equipment for software prototyping and development. It has two main purposes: (1) to enable the software community to be ready to use commercial products effectively as soon as they come on to the market; and (2) to provide the UKRI HPC community with the ability to influence industry and the necessary knowledge to guide their purchase decisions. This will ensure that facilities and the future UK National e-Infrastructure are in a position to maximise value for money by getting the most powerful systems exactly suited to the communities' needs. This two-pronged approach will give UK researchers a competitive advantage internationally.

ExCALIBUR will now establish a set of modest-sized, adaptable clusters dedicated solely to this purpose and embedded within established HPC environments. Although small, they need to be of a scale capable of carrying out meaningful performance studies. They are expected to be co-funded with industry partners and will initially require investments of £200k-£300k each. They will allow a range of future hardware to be assessed for its relevance to the delivery of UKRI science and innovation. The pre-commercial equipment will be refreshed and added to on a regular, likely annual, basis. This agile tactic is designed to take advantage of the different approaches across industry: some companies, e.g. NVIDIA, tend to have a short pre-commercial window (less than 3 months), while for others this can be up to a year.

ExCALIBUR can use the hardware piloting systems to drive software innovation across the UKRI research community. Researchers are rightly reluctant to invest time in code development to take advantage of new hardware which may not be available at scale for several years or may even prove not to have longevity - scientific leadership demands that research funding is used to deliver science results now. Because DiRAC and others will offer funded RSE effort to support the development work, combined with access to novel technologies within modest-sized systems, ExCALIBUR can lower the bar for engaging with the process of software re-engineering and encourage researchers to make the necessary (modest) investments of their time. In some cases, there may also be the potential for some immediate science outputs by exploiting the proof-of-concept systems.

ExCALIBUR will thus be able to provide an incentive for greater software innovation across the UKRI research communities and help to ensure that when novel technology is included in national services, there are workflows that are already able to exploit it optimally. This will increase productivity across all UKRI computing services and enable UK researchers to use the latest hardware to deliver the largest and most complex calculations, ensuring international leadership.

 
Description The grant was used to procure NVIDIA nodes containing Arm CPUs and NVIDIA GPUs for the ExCALIBUR Hardware and Enabling Software testbed programme. The nodes are being used by the ExCALIBUR and UKRI community for code testing purposes.

The Arm Forge Licence was used to support the ExCALIBUR and DiRAC communities in profiling software performance on a range of hardware. It provided essential information used in recent DiRAC HPC Facility hardware procurements.
Exploitation Route The software profiling information is being shared widely across UKRI.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare

 
Description Nvidia DevKit 
Organisation NVIDIA
Country Global 
Sector Private 
PI Contribution We are hosting an instance of the Nvidia DevKit and making it available to UKRI researchers for testing purposes in the context of the ExCALIBUR Hardware and Enabling Software programme.
Collaborator Contribution Nvidia have provided technical support for the installation and have hosted a hackathon at which UKRI participants were offered access to the new hardware at Leicester.
Impact Presentation at Nvidia GTC in March 2022; panel discussion at Nvidia GTC in March 2022; multi-disciplinary Nvidia hackathon supported in early March 2022, which UKRI users were able to access.
Start Year 2021
 
Description NVIDIA GTC presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Pre-recorded presentation to the NVIDIA GPU Technology Conference (GTC) entitled "Forty Powers of 10 - Accelerated: Simulating the Universe, from Quarks to Galaxy Clusters" [S41703], to be broadcast on March 23 2022.

The DiRAC HPC Facility provides HPC resources for the U.K. theory community in particle physics, astrophysics, cosmology, and nuclear physics. During 2021, DiRAC deployed two NVIDIA A100-based services (at Edinburgh and Cambridge), and an ExCALIBUR-supported project at Leicester deployed the first NVIDIA Arm HPC Developer Kit in the Europe, Middle East, and Africa region. I'll present some early science results from the research teams that are exploiting these new systems. I'll describe the process of hardware and software co-design that has delivered significant performance improvements that are reducing time-to-science for many users. Finally, I'll present initial results from our evaluation of the NVIDIA Arm HPC Developer Kit.
Year(s) Of Engagement Activity 2022
URL https://www.nvidia.com/gtc/session-catalog/?search=wilkinson&search.sessiontype=option_1614028602338...
 
Description Nvidia Arm DevKit panel discussion 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The project's technical lead, Jon Wakelin, sat on a panel with representatives from 3 other sites worldwide that are hosting DevKit deployments.
Year(s) Of Engagement Activity 2022
URL https://www.nvidia.com/gtc/session-catalog/?search=wakelin&search.sessiontype=option_1614028602338&t...