ExCALIBUR HES CXL technology demonstrator: composable memory

Lead Research Organisation: Durham University
Department Name: Physics

Abstract

Compute Express Link (CXL) is a new standard for connectivity between CPUs and other components, and is widely expected to replace PCIe in the future. Its most novel feature is support for cache-coherent protocols for accessing system and device memory, which will open up new capabilities and programming models and allow heterogeneous codes to operate more efficiently and more simply, from both the programmer's and the execution point of view.
CXL promises to be the dominant standard for years to come, having recently accepted transfer of assets from both the Gen-Z and OpenCAPI consortiums. CXL 1.1 will be supported by the forthcoming Intel Sapphire Rapids and AMD Genoa CPUs. All major manufacturers have signed up to adopt CXL.
This proposal will introduce some of the technologies that will be made available by the CXL v3 standard, which will eventually (probably in 2025) introduce full composable memory and cache-coherent memory sharing between remote servers. 2023 will see the first release of servers supporting the CXL v1.1 standard.
As a precursor to this, and to allow code development in readiness for full composable memory, we will procure a Liqid composable RAM system as an extension to our existing Liqid composable infrastructure and install it on nodes of the COSMA HPC system at Durham. This will provide a central resource of 12TB of RAM, which can then be shared out programmatically to two login nodes and two high-memory nodes as required, to allow processing of huge datasets. We already have a Liqid composable disaggregated infrastructure system at Durham, allowing us to compose GPUs to servers on demand.

Publications

 
Description Composable infrastructure, including accelerators and RAM, is now a reality, allowing compute clusters to be specified on demand
Exploitation Route The composable infrastructure is now in production and accessible to UK researchers
Sectors Digital/Communication/Information Technologies (including Software)

 
Description Composable infrastructure is now a reality and we have used the experience gained here to move to the next stage of composability for large HPC systems.
First Year Of Impact 2023
Sector Digital/Communication/Information Technologies (including Software)
 
Description DiRAC-3 Operations 2023-26 - Durham
Amount £1,264,938 (GBP)
Funding ID ST/X000265/1 
Organisation Science and Technology Facilities Council (STFC)
Sector Public
Country United Kingdom
Start 03/2023 
End 03/2026
 
Title COSMA Composable infrastructure 
Description Access to composable infrastructure for UK researchers 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2023 
Impact Ability to access large RAM systems in a dynamic fashion