Exascale Data Testbed for Simulation, Data Analysis & Visualisation

Lead Research Organisation: University of Cambridge
Department Name: Chemistry

Abstract

In 2018, the Exascale Computing ALgorithms & Infrastructures for the Benefit of UK Research (ExCALIBUR) programme was proposed by the Met Office, CCFE and EPSRC (on behalf of UKRI). The goal of ExCALIBUR is to redesign high-priority computer codes and algorithms, keeping UK research and development at the forefront of high-performance simulation science. The challenge spans many disciplines, and as such the programme of research will be delivered through a partnership between the Met Office and the UKRI Research Councils. Research software engineers and scientists will work together to future-proof the UK against the fast-moving changes in supercomputer designs. This combined scientific expertise will push the boundaries of science across a wide range of fields, delivering transformational change at the cutting edge of scientific supercomputing. DiRAC proposed including in the ExCALIBUR business case a request for £4.5M in capital funding over 4.5 years to develop a hardware foresighting programme. Industry co-funding for the programme will be sought where possible.
The £4.5M capital is intended to provide a testbed area that uses pre-commercial equipment for software prototyping and development. It has two main purposes: (1) to enable the software community to be ready to use commercial products effectively as soon as they come onto the market; and (2) to provide the UKRI HPC community with the ability to influence industry and the necessary knowledge to guide their purchase decisions. This will ensure that facilities and the future UK National e-Infrastructure are in a position to maximise value for money by getting the most powerful systems exactly suited to the communities' needs. This two-pronged approach will give UK researchers a competitive advantage internationally.
ExCALIBUR will now establish a set of modest-sized, adaptable clusters dedicated solely to this purpose and embedded within established HPC environments. Although small, they need to be of a scale capable of carrying out meaningful performance studies. They are expected to be co-funded with industry partners, will initially require investments of £200k-£300k each, and will allow a range of future hardware to be assessed for its relevance to the delivery of UKRI science and innovation. The pre-commercial equipment will be refreshed and added to on a regular (likely annual) basis. This agile tactic is designed to take advantage of the different approaches across industry: some companies (e.g. NVIDIA) tend to have a short (less than three-month) pre-commercial window, while for others this can be up to a year.
ExCALIBUR can use the hardware piloting systems to drive software innovation across the UKRI research community. Researchers are rightly reluctant to invest time in code development to take advantage of new hardware that may not be available at scale for several years, or may even prove not to have longevity: scientific leadership demands that research funding is used to deliver science results now. By offering funded RSE effort from DiRAC and others to support the development work, combined with access to novel technologies within modest-sized systems, ExCALIBUR can lower the bar for engaging with the process of software re-engineering and encourage researchers to make the necessary (modest) investments of their time. In some cases, there may also be the potential for some immediate science outputs by exploiting the proof-of-concept systems.
ExCALIBUR will thus be able to provide an incentive for greater software innovation across the UKRI research communities and help to ensure that when novel technology is included in national services, there are workflows that are already able to exploit it optimally. This will increase productivity across all UKRI computing services and enable UK researchers to use the latest hardware to deliver the largest and most complex calculations, ensuring international leadership.

Publications

 
Description This award had 3 main goals: (a) to demonstrate the impact of the latest networking technology on the performance of the Cambridge Data Accelerator's (DAC) Lustre parallel filesystem; (b) to develop and performance-test a small testbed for a new object store (DAOS) designed for massively distributed Non-Volatile Memory; and (c) to test the impact on real-world codes. We have been able to demonstrate a 23% improvement in the IO500 benchmark with the upgraded system. The IO500 benchmark is a global ranking designed to represent typical user workloads seen on real systems; the original Cambridge DAC achieved the number 1 spot in the IO500 benchmark in 2019. We built and performance-tested a small DAOS testbed, one of the first deployments in the UK; the performance demonstrated is promising, improving on the original DAC's performance while using only 20% of the resources. We have tested the application code AREPO, a massively parallel code for gravitational N-body systems and magnetohydrodynamics on both Newtonian and cosmological backgrounds. The test setup consisted of 108 compute nodes, each with 76 MPI processes (8,208 MPI processes in total). The total generated output is about 12 TB. The test was run on both our standard Lustre storage service and the upgraded DAC, which demonstrated a 68% improvement in achieved bandwidth.
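The scale of the AREPO test can be summarised with a quick calculation using only the figures reported above; the per-rank volume is a derived estimate, not a measured quantity:

```python
# Figures reported for the AREPO test run on the upgraded DAC.
nodes = 108
ranks_per_node = 76
total_ranks = nodes * ranks_per_node  # the 8,208 MPI processes reported

total_output_tb = 12  # total generated output, in TB
per_rank_gb = total_output_tb * 1000 / total_ranks  # derived estimate only

print(f"{total_ranks} ranks, ~{per_rank_gb:.2f} GB written per rank")
# -> 8208 ranks, ~1.46 GB written per rank
```

At this scale the aggregate write is dominated by filesystem bandwidth rather than per-rank volume, which is why the DAC upgrade shows up as a 68% bandwidth improvement.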
Exploitation Route High-performance computing is today essentially an I/O problem rather than a compute problem, so these techniques and advanced high-performance storage platforms will become increasingly specialised and necessary to properly exploit the computational power available to the research community. We have been able to secure follow-on funding to further explore some of our findings, particularly in relation to DAOS and its suitability for deployment in large-scale supercomputing.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description Exascale Data Testbed for Simulation, Data Analysis & Visualisation
Amount £200,000 (GBP)
Funding ID ST/V006282/1 
Organisation Science and Technologies Facilities Council (STFC) 
Sector Public
Country United Kingdom
Start 03/2021 
End 03/2022
 
Description SPF ExCALIBUR EX20- 6: I/O & Storage: ExcaliStore
Amount £741,553 (GBP)
Organisation Meteorological Office UK 
Sector Academic/University
Country United Kingdom
Start 06/2021 
End 05/2024
 
Title Exascale Data Testbed 
Description The Data Accelerator (DAC) consists of 24 Dell PowerEdge R740xd servers, each with 12 1.5TB NVMe disks. Using open-source software written in-house, they interface with Slurm's burst buffer plugin via etcd, a key-value store for distributed systems. When a user requests a buffer in their job script, a Lustre filesystem is created using enough NVMe disks to satisfy the user's size requirement. Once the Slurm job ends, the filesystem is destroyed and the resources are released. The DAC nodes are connected to the ToR switches in the Cascade Lake racks. The 24 DAC servers are located in pairs across 12 Cascade Lake racks and are connected via low-latency, high-bandwidth networks.
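A minimal sketch of the sizing step described above, using the hardware figures from the description (24 servers, 12 x 1.5TB NVMe each). The function name and the simple ceiling-division logic are illustrative assumptions; the actual in-house software also handles placement across servers and the Lustre/etcd orchestration:

```python
import math

# Hardware figures from the DAC description above.
NVME_SIZE_TB = 1.5
DISKS_PER_SERVER = 12
NUM_SERVERS = 24

def disks_for_buffer(requested_tb: float) -> int:
    """Minimum number of NVMe devices needed to satisfy a user's
    requested buffer size (illustrative sketch only)."""
    disks = math.ceil(requested_tb / NVME_SIZE_TB)
    if disks > DISKS_PER_SERVER * NUM_SERVERS:
        raise ValueError("request exceeds total DAC capacity")
    return disks

print(disks_for_buffer(10))  # a 10 TB buffer needs 7 x 1.5TB devices
```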
Type Of Technology New/Improved Technique/Technology 
Year Produced 2022 
Open Source License? Yes  
Impact We wanted to re-run the IO500 to check that performance was as expected. To do this, we submitted a Slurm job to execute the IO500 script on 10 compute nodes against a 336TiB buffer, and re-ran to tweak values so that the tests ran long enough for a valid result. At this time, some DAC nodes were being used for other tests, so only 20 of the 24 were available. The 336TiB available gave us a Lustre filesystem with 240 OSTs and 20 MDTs. Compared to our 2019 IO500 submission (https://io500.org/submissions/view/78), here we only have one NIC (in our 2019 submission we were using nodes with 2 NICs). In general we felt the performance from these tests looked as expected and gave us a very similar result to our previous run. The buffer we were creating had 20 MDTs, and by setting a value of 20 we saw consistently better mdtest results. Our results show a 23% performance improvement (compared to the 2019 OPA result) on the IO500 benchmark on the DAC platform with Mellanox installed. (Note that the version of Lustre used differed between the OPA run (v2.13 development branch) and the Mellanox run (v2.12.5 production branch).)
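The target counts reported above are consistent with one OST per NVMe device and one MDT per available node; this is an inference from the figures, not a statement of the actual Lustre layout:

```python
# 20 of the 24 DAC nodes were available for this re-run;
# each DAC node has 12 NVMe devices.
available_nodes = 20
nvme_per_node = 12

osts = available_nodes * nvme_per_node  # one OST per NVMe device (inferred)
mdts = available_nodes                  # one MDT per node (inferred)

print(osts, mdts)  # -> 240 20, matching the filesystem described above
```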
URL https://excalibur.ac.uk/projects/exascale-data-testbed/