ExCALIBUR HES Exploring Coarse Grained Reconfigurable Architectures (CGRAs)

Lead Research Organisation: University of Edinburgh
Department Name: Edinburgh Parallel Computing Centre

Abstract

In 2018, the Exascale Computing ALgorithms & Infrastructures for the Benefit of UK Research (ExCALIBUR) programme was proposed by the Met Office, CCFE and EPSRC (on behalf of UKRI). The goal of ExCALIBUR is to redesign high-priority computer codes and algorithms, keeping UK research and development at the forefront of high-performance simulation science. The challenge spans many disciplines and as such the programme of research will be delivered through a partnership between the Met Office and UKRI Research Councils. Research software engineers and scientists will work together to future-proof the UK against the fast-moving changes in supercomputer designs. This combined scientific expertise will push the boundaries of science across a wide range of fields, delivering transformational change at the cutting edge of scientific supercomputing. DiRAC proposed the inclusion in the ExCALIBUR business case of a request for £4.5M in capital funding over 4.5 years to develop a hardware fore-sighting programme. Industry co-funding for the programme will be sought where possible.

The £4.5M capital is intended to provide a testbed area that uses pre-commercial equipment for software prototyping and development. It has two main purposes: (1) to enable the software community to be ready to use commercial products effectively as soon as they come on to the market; and (2) to provide the UKRI HPC community with the ability to influence industry and the necessary knowledge to guide their purchase decisions. This will ensure that facilities and the future UK National e-Infrastructure are in a position to maximise value for money by getting the most powerful systems exactly suited to the communities' needs. This double-pronged approach will give UK researchers a competitive advantage internationally.

ExCALIBUR will now establish a set of modest-sized, adaptable clusters dedicated solely to this purpose and embedded within established HPC environments. Although small, they need to be of a scale capable of carrying out meaningful performance studies. They are expected to be co-funded with industry partners and will initially require investments of £200k-£300k each. They will allow a range of future hardware to be assessed for its relevance to the delivery of UKRI science and innovation. The pre-commercial equipment will be refreshed and added to on a regular, likely annual, basis. This agile tactic is designed to take advantage of the different approaches across industry: some companies, e.g. NVIDIA, tend to have a short (less than 3-month) pre-commercial window, while for others this can be up to a year.

ExCALIBUR can use the hardware piloting systems to drive software innovation across the UKRI research community. Researchers are rightly reluctant to invest time in code development to take advantage of new hardware which may not be available at scale for several years or may even prove not to have longevity - scientific leadership demands that research funding is used to deliver science results now. In addition, DiRAC and others will offer funded RSE effort to support the development work. Combining this effort with access to novel technologies within modest-sized systems, ExCALIBUR can lower the bar for engaging with the process of software re-engineering and encourage researchers to make the necessary (modest) investments of their time. In some cases, there may also be the potential for some immediate science outputs by exploiting the proof-of-concept systems.

ExCALIBUR will thus be able to provide an incentive for greater software innovation across the UKRI research communities and help to ensure that when novel technology is included in national services, there are workflows that are already able to exploit it optimally. This will increase productivity across all UKRI computing services and enable UK researchers to use the latest hardware to deliver the largest and most complex calculations, ensuring international leadership.
 
Description We have developed the first case study of using AMD AI Engines (AIEs) to accelerate HPC applications such as weather models, and have further enhanced community knowledge of how to put together HPC codes for the Cerebras CS-2 system. We have also developed compiler tooling for the AIEs and made tutorials more accessible to the general computing audience.
Exploitation Route We feel the compiler tooling for AIEs is especially interesting. It is timely given that AMD are now releasing AIEs as part of their CPUs, and the tooling significantly improves the programmability of that technology.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description It is still early, but people have adopted the techniques we developed for the systems, and the compiler technology is being explored.
First Year Of Impact 2024
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description RSE personal research fellowship
Amount £63,000 (GBP)
Funding ID 3271 
Organisation Royal Society of Edinburgh (RSE) 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2024 
End 02/2025
 
Title CS-2 for HPC tutorial content 
Description This is a self-contained tutorial that people can follow on their own to program the CS-2 for running HPC applications.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact We have proposed a tutorial at ISC24, Europe's largest HPC conference (currently awaiting the outcome of the submission), and will be giving the tutorial at HPC Days in Durham in May 2024.
URL https://github.com/EPCCed/cs2-sdk-training
 
Title Pythonise AMD MLIR tutorials 
Description We developed Jupyter notebooks of the MLIR AIE toolchain tutorials, building on the work done in the xDSL ExCALIBUR project (connecting and collaborating with that project is an important part of ExCALIBUR) and on the port of the AIE MLIR dialect done in this project.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact These have been made available, and we have demoed them to AMD.
URL https://github.com/xdslproject/mlir-aie/tree/xdsl_tutorials/tutorials