ExCALIBUR HES CXL technology demonstrator: composable memory
Lead Research Organisation:
Durham University
Department Name: Physics
Abstract
Compute Express Link (CXL) is a new standard for connectivity between CPUs and other components. It is built on the PCIe physical layer and is widely expected to supersede PCIe as the main device interconnect. Its most novel feature is support for cache-coherent protocols for accessing system and device memory. This will open up new capabilities and new programming models, and will allow heterogeneous codes to run more efficiently and more simply, from the point of view of both the programmer and execution.
CXL promises to be the dominant standard for years to come, having recently accepted the transfer of assets from both the Gen-Z and OpenCAPI consortia. CXL 1.1 will be supported by the forthcoming Intel Sapphire Rapids and AMD Genoa CPUs, and all major manufacturers have signed up to adopt CXL.
This proposal will introduce some of the technologies that the CXL v3 standard is expected to make available. CXL v3 will (probably by 2025) introduce full composable memory and cache-coherent memory sharing between remote servers. The first servers supporting the CXL v1.1 standard will be released in 2023.
As a precursor to this, and to allow code development in readiness for full composable memory, we will procure a Liqid composable RAM system as an extension to our existing Liqid composable infrastructure, and install it on nodes of the COSMA HPC system at Durham. This will provide a central resource of 12TB of RAM, which can be shared programmatically between two login nodes and two high-memory nodes as required, to allow processing of huge datasets. Our existing Liqid composable disaggregated infrastructure system at Durham already allows us to compose GPUs to servers on demand.
Organisations
People | Alastair Basden (Principal Investigator) |
Description | Composable infrastructure, including accelerators and RAM, is now a reality, allowing compute clusters to be specified on demand. |
Exploitation Route | The composable infrastructure is now in production and accessible to UK researchers. |
Sectors | Digital/Communication/Information Technologies (including Software) |
Description | Composable infrastructure is now a reality, and we have used the experience gained here to move to the next stage of composability for large HPC systems. |
First Year Of Impact | 2023 |
Sector | Digital/Communication/Information Technologies (including Software) |
Description | DiRAC-3 Operations 2023-26 - Durham |
Amount | £1,264,938 (GBP) |
Funding ID | ST/X000265/1 |
Organisation | Science and Technology Facilities Council (STFC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2023 |
End | 03/2026 |
Title | COSMA Composable infrastructure |
Description | Access to composable infrastructure for UK researchers |
Type Of Technology | New/Improved Technique/Technology |
Year Produced | 2023 |
Impact | Ability to access large RAM systems dynamically |