ExCALIBUR HES: Exascale Data Testbed for Simulation, Data Analysis & Visualisation
Lead Research Organisation:
UNIVERSITY OF CAMBRIDGE
Department Name: Chemistry
Abstract
This proposal is phase 3, a continuation of two previous Cambridge ExCALIBUR H&ES funded projects of the same name, "Exascale Data Testbed for Simulation, Data Analysis & Visualisation". It builds on those two projects to design, build and make available to the ExCALIBUR application community state-of-the-art solid-state I/O prototype platforms that can be used to understand and characterise novel emerging solid-state file system technologies, and to help develop UK ExCALIBUR applications capable of the high I/O performance needed to scale at exascale.
This phase 3 proposal will extend the service of the phase 1 and 2 testbeds for another two years and add functionality and a wider range of file systems. This is enabled by new software tools and additional storage hardware. The need for this additional functionality emerged from extensive engagement with the ExCALIBUR projects Excalidata and Excalistore, PI'd by Bryan Lawrence, and from direct engagement with the Met Office, UKAEA, DiRAC, IRIS and SKA SRC projects. The proposal is split into two sections:
A) I/O profiling software tools at application and system level
B) Additional hardware for I/O testbeds
A) I/O profiling software tools
Two commercial I/O profiling tools (Altair's Mistral and Breeze) will be procured with capital funding from this call and installed on the operational Cambridge CSD3 HPC system and on the dedicated ExCALIBUR I/O testbeds resulting from the phase 1 and 2 Cambridge ExCALIBUR H&ES awards. Staff time of 2 FTE is contributed by the Cambridge Open Exascale Lab. Cambridge has already trialled these tools via evaluation licences from Altair and found them very useful.
These products give the two views of HPC I/O we need to understand: a microscopic view at the application level and a macroscopic view at the system-wide level. Combining them should give a much clearer picture of what is going on with HPC I/O, providing the tools needed to understand the behaviour of the different ExCALIBUR prototype file systems and then to optimise the applications we choose to run on them.
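The Altair tools themselves are commercial, but the distinction between the two views can be illustrated with standard Linux counters: per-process I/O statistics give the microscopic, application-level view, while per-device statistics give the macroscopic, system-wide view. The minimal Python sketch below is only an analogue of that principle, not an example of the Mistral or Breeze interfaces, and the device name used is a placeholder.

# Illustration only: the "two views" of I/O using standard Linux counters.
# /proc/<pid>/io  -> microscopic, application-level view (one job)
# /proc/diskstats -> macroscopic, system-wide view (whole node)
from pathlib import Path

def application_view(pid="self"):
    """Per-process I/O counters, e.g. read_bytes/write_bytes actually hitting storage."""
    text = Path(f"/proc/{pid}/io").read_text()
    return {key: int(value) for key, value in
            (line.split(": ") for line in text.splitlines())}

def system_view():
    """System-wide bytes read/written per block device, from /proc/diskstats."""
    stats = {}
    for line in Path("/proc/diskstats").read_text().splitlines():
        fields = line.split()
        # field 2 is the device name; fields 5 and 9 are 512-byte sectors read/written
        stats[fields[2]] = {"read_bytes": int(fields[5]) * 512,
                            "write_bytes": int(fields[9]) * 512}
    return stats

if __name__ == "__main__":
    print("application-level:", application_view())
    print("system-level (nvme0n1):", system_view().get("nvme0n1"))  # placeholder device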
B) Additional hardware for data testbeds - During community engagement with Excalistore, Excalidata, DiRAC, IRIS and the SKA SRC community we have identified a strong need to develop and test high-performance object storage for HPC using both SSD and spinning-disk platforms. We also need a small spinning-disk testbed to evaluate the same candidate file systems built on the solid-state testbed developed in the earlier phases.
Key deliverables - Technical reports on programme-of-works items 1-4. These will be fashioned into white papers published through the industrial partners Dell and Intel via the Exascale Lab, and also written up as academic papers. We have received a lot of interest in the approach of using a large-scale, apples-for-apples NVMe testbed with different file systems, so the work is highly publishable. The work will also produce valuable testbeds, analysis systems and skilled people that the UK HPC user communities mentioned above can use to test and improve the I/O of key UK candidate exascale codes.
Organisations
Publications

Venkatesh, R. S.
(2023)
Enhancing Metadata Transfer Efficiency: Unlocking the Potential of DAOS in the ADIOS Context

Venkatesh, R. S.
(2024)
Optimizing Metadata Exchange: Leveraging DAOS for ADIOS Metadata I/O
Description | Building on the previous work we have investigated the effect of a new version, 2.2, of the DAOS high-performance object store, both in running the io500 benchmark and, in collaboration with the Georgia Institute of Technology, investigating the effect of this platform on the integration of codes that use the ADIOS middleware to optimise I/O patterns. We have also extended the testbed platforms to use other filesystems apart from Lustre and DAOS, namely BeeGFS, Spectrum Scale and WekaFS, set up on identical hardware. With the exception of WekaFS, small-node runs of the io500 benchmark have been carried out on these filesystems with the objective of determining the potential of each platform to fully utilise the client network capability. We have also investigated the impact of enabling GPUDirect Storage on benchmarks against the Lustre filesystem. We demonstrated that the then-new release of DAOS had no appreciable effect on performance as measured by the io500 benchmark. In addition, we used the POSIX interface of DAOS to measure how code that is not optimised for DAOS might perform. The results were discouraging: this interface performed poorly, with code failing to complete in the vast majority of cases. This limitation is expected to be mitigated in the now-current 2.4.1 release of the filesystem. Further DAOS work involved a collaboration with Greg Eisenhauer and Sarpangala Venkatesh at the Georgia Institute of Technology, who used the Cambridge platform to investigate the effect of DAOS metadata configuration on the performance of both bespoke benchmark code and the real-world applications WarpX and E3SM. This work resulted in acknowledgements on workshop papers at Supercomputing 2023 and ISC 2024. A comparison of three parallel filesystems was made using the io500 benchmark, with simple two-node client runs against filesystems built on four servers. The aim was to see how well the clients could utilise the available network when the servers were not similarly restricted. The results, as measured by the io500 score, show that BeeGFS was best able to maximise I/O in this situation, with Lustre second and Spectrum Scale third. In terms of bandwidth, the best client-side utilisation was 90% of the maximum and the worst 72%; by that measure the best-performing filesystem was Spectrum Scale and the worst BeeGFS. The testbed was also used to set up an NVMe-backed Lustre filesystem (Lustre 2.15.1) that supported GPUDirect Storage (GDS). Two I/O benchmarks that support GDS, ior and elbencho, were run both on local NVMe and on the Lustre filesystem. Comparisons between GDS-enabled and non-GDS runs showed that I/O speedups from GDS were limited to particular sets of application parameters, such as chunk size and whether MPI or CUDA threads were used; this was particularly true where Lustre was the target filesystem. |
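For readers unfamiliar with how the io500 "score" and the client-side utilisation figures above are derived, the following minimal Python sketch shows both calculations. The phase results and link capacity in it are illustrative placeholders, not measurements from the testbed; the io500 score is the geometric mean of a bandwidth score (GiB/s) and a metadata/IOPS score (kIOPS), each itself the geometric mean of its individual benchmark phases.

# Illustrative sketch of the two metrics quoted above: the io500 composite score
# and the client-side network utilisation. All numbers below are placeholders.
from math import prod

def geomean(values):
    """Geometric mean of a list of positive numbers."""
    return prod(values) ** (1.0 / len(values))

# Placeholder phase results: GiB/s for the four bandwidth phases,
# kIOPS for the eight metadata phases.
bw_phases = {"ior-easy-write": 20.0, "ior-hard-write": 2.0,
             "ior-easy-read": 21.0, "ior-hard-read": 3.0}
md_phases = {"mdtest-easy-write": 90.0, "mdtest-hard-write": 20.0,
             "mdtest-easy-stat": 300.0, "mdtest-hard-stat": 250.0,
             "mdtest-easy-delete": 60.0, "mdtest-hard-read": 100.0,
             "mdtest-hard-delete": 15.0, "find": 400.0}

bw_score = geomean(list(bw_phases.values()))     # GiB/s
md_score = geomean(list(md_phases.values()))     # kIOPS
io500_score = geomean([bw_score, md_score])      # composite io500 score

# Client-side utilisation: best measured bandwidth as a fraction of the
# theoretical client link capacity (assumed here: 2 clients x 100 Gb/s links).
clients, link_gbit = 2, 100
link_capacity_gib_s = clients * link_gbit / 8 / 1.073741824   # Gb/s -> GiB/s
utilisation = max(bw_phases.values()) / link_capacity_gib_s

print(f"bandwidth score {bw_score:.2f} GiB/s, metadata score {md_score:.2f} kIOPS")
print(f"io500 score {io500_score:.2f}, client network utilisation {utilisation:.0%}")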
Exploitation Route | The DAOS work has been used to advance the integration of DAOS with WarpX and E3SM, which will be of use to the communities that use these codes. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | https://sc23.supercomputing.org/proceedings/workshops/workshop_pages/ws_pdsw111.html |