The UKRI ExCALIBUR Hardware And Enabling Software Programme: The UCL Adaptable Cluster Project (PERIOD ONE)

Lead Research Organisation: University College London
Department Name: Computer Science

Abstract

In 2018, the Exascale Computing ALgorithms & Infrastructures for the Benefit of UK Research (ExCALIBUR) programme was proposed by the Met Office, CCFE and EPSRC (on behalf of UKRI). The goal of ExCALIBUR is to redesign high-priority computer codes and algorithms, keeping UK research and development at the forefront of high-performance simulation science. The challenge spans many disciplines and, as such, the programme of research will be delivered through a partnership between the Met Office and the UKRI Research Councils. Research software engineers and scientists will work together to future-proof the UK against the fast-moving changes in supercomputer designs. This combined scientific expertise will push the boundaries of science across a wide range of fields, delivering transformational change at the cutting edge of scientific supercomputing. DiRAC proposed including in the ExCALIBUR business case a request for £4.5M in capital funding over 4.5 years to develop a hardware foresighting programme. Industry co-funding for the programme will be sought where possible. The £4.5M capital is intended to provide a testbed area that uses pre-commercial equipment for software prototyping and development. It has two main purposes: (1) to enable the software community to be ready to use commercial products effectively as soon as they come on to the market; and (2) to give the UKRI HPC community the ability to influence industry and the knowledge needed to guide their purchase decisions. This will ensure that facilities and the future UK National e-Infrastructure are in a position to maximise value for money by getting the most powerful systems exactly suited to the communities' needs. This two-pronged approach will give UK researchers a competitive advantage internationally. ExCALIBUR will now establish a set of modest-sized, adaptable clusters dedicated solely to this purpose and embedded within established HPC environments. 
Although small, these clusters need to be large enough to carry out meaningful performance studies. They are expected to be co-funded with industry partners, will initially require investments of £200k-£300k each, and will allow a range of future hardware to be assessed for its relevance to the delivery of UKRI science and innovation. The pre-commercial equipment will be refreshed and added to on a regular, likely annual, basis. This agile tactic is designed to take advantage of the different approaches across industry: some companies, e.g. NVIDIA, tend to have a short pre-commercial window of less than three months, while for others this window can be up to a year. ExCALIBUR can use the hardware piloting systems to drive software innovation across the UKRI research community. Researchers are rightly reluctant to invest time in code development to take advantage of new hardware that may not be available at scale for several years, or may even prove not to have longevity; scientific leadership demands that research funding is used to deliver science results now. In addition, because DiRAC and others will offer funded RSE effort to support the development work, combined with access to novel technologies within modest-sized systems, ExCALIBUR can lower the bar for engaging with the process of software re-engineering and encourage researchers to make the necessary (modest) investments of their time. In some cases, there may also be the potential for immediate science outputs from exploiting the proof-of-concept systems. ExCALIBUR will thus be able to provide an incentive for greater software innovation across the UKRI research communities and help to ensure that, when novel technology is included in national services, there are workflows already able to exploit it optimally. 
This will increase productivity across all UKRI computing services and enable UK researchers to use the latest hardware to deliver the largest and most complex calculations, ensuring international leadership.

Planned Impact

The anticipated impact of the ExCALIBUR Capital Programme aligns closely with the recently published UKRI Infrastructure Roadmap and the UK's Industrial Strategy. As such, many of our key impacts will be driven by our engagements with industry, both technology providers and those who use such technologies. Each ExCALIBUR partner will have strong industrial partnerships and an industrial strategy to deliver its project outcomes.

The "Pathways to Impact" document which is attached to this proposal describes the overall industrial strategy for the ExCALIBUR Programme, including our strategic goals and key performance indicators.

Publications

Ceuster F (2022) 3D Line Radiative Transfer & Synthetic Observations with Magritte in Journal of Open Source Software

 
Title Monitoring application performance on hardware 
Description This tool is under development. It is designed to aid the design of new systems by monitoring, at the component level, the performance of benchmark applications for a particular configuration and/or simulation. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? No  
Impact It has sparked interest in the area of "benchmarking systems that don't exist yet", which is central to thinking about the design of the UK's possible exascale programme 
 
Description Partnership with NVIDIA Networks for creating adaptable clusters 
Organisation Hewlett Packard Enterprise (HPE)
Country United Kingdom 
Sector Private 
PI Contribution System Design and research software engineering resources
Collaborator Contribution Hardware, particularly BlueField cards to process data in-flight
Impact Preliminary work has looked at using BlueField cards as network-monitoring devices to measure resource movement and requirements. The aim is to alleviate the load-balancing problem by adapting the cluster to its resource requirements. Interviews with leading Exascale software development groups in the ExCALIBUR programme have taken place, and these are being developed into a benchmarking strategy. Multi-disciplinary: Computer Science and Research Computing
Start Year 2020
 
Description Partnership with NVIDIA Networks for creating adaptable clusters 
Organisation NVIDIA
Country Global 
Sector Private 
PI Contribution System Design and research software engineering resources
Collaborator Contribution Hardware, particularly BlueField cards to process data in-flight
Impact Preliminary work has looked at using BlueField cards as network-monitoring devices to measure resource movement and requirements. The aim is to alleviate the load-balancing problem by adapting the cluster to its resource requirements. Interviews with leading Exascale software development groups in the ExCALIBUR programme have taken place, and these are being developed into a benchmarking strategy. Multi-disciplinary: Computer Science and Research Computing
Start Year 2020
 
Description Membership of the ExCALIBUR Hardware and Enabling Software Technical Working Group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Increased understanding of how data can be interrogated and modified in-flight by the network. This has been incorporated into several ExCALIBUR cross-cutting themes proposals.
Year(s) Of Engagement Activity 2020, 2021