Asynchronous Scientific Continuous Computations Exploiting Disaggregation (ASCCED)
Lead Research Organisation:
Queen's University Belfast
Department Name: Sch of Electronics, Elec Eng & Comp Sci
Abstract
The design of efficient and scalable scientific simulation software is reaching a critical point whereby continued advances are increasingly hard, more labour-intensive, and thus more expensive to achieve. This challenge emanates from the constantly evolving design of large-scale high-performance computing systems. World-leading (pre-)exascale systems, as well as their successors, are characterised by multi-million-scale parallel computing activities and a highly heterogeneous mix of processor types such as high-end many-core processors, Graphics Processing Units (GPUs), machine learning accelerators, and various accelerators for compression, encryption and in-network processing. To make efficient use of these systems, scientific simulation software must be decomposed into multiple independent components and make simultaneous use of the variety of heterogeneous compute units.
Developing efficient, scalable scientific simulation software for these systems becomes increasingly harder as the limits of parallelism available in the simulation codes are approached. Moreover, these limits cannot be reached in practice due to heterogeneity, system imbalances and synchronisation overheads. Scientific simulation software often persists over several decades. The software is optimised and re-optimised repeatedly as the design and scale of the target hardware evolves at a much faster pace, with impactful changes in the hardware occurring every few years. One may thus find that the guiding principles that underpin such software are outdated.
The ASCCED project will fundamentally change the status quo in the design of scientific simulation software by simplifying the design to reduce software development and maintenance effort, to facilitate performance optimisation, and to make software more robust to future evolution of computing hardware. The key distinguishing factor of our approach is to structure scientific simulation software as a collection of loosely coupled parallel activities. We will explore the opportunities and challenges of applying techniques previously developed for Parallel Discrete Event Simulation (PDES) to orchestrate these loosely coupled parallel activities. This radically novel approach will enable runtime system software to extract unprecedented scales of parallelism and to minimise performance inefficiencies due to synchronisation. Additionally, based on a speculative execution mechanism, it will uncover parallelism that has not been feasible to extract before.
The computational model proposed by ASCCED will, if successful, initiate a new direction of research within programming models for high-performance computing that may not only dramatically improve the performance of scientific simulation software, but also reduce the engineering effort required to produce it. It will have a profound impact on the sciences that are highly dependent on leadership computing capabilities, such as climate modelling and cancer research.
Organisations
Publications
Vandierendonck H
(2024)
Differentiating Set Intersections in Maximal Clique Enumeration by Function and Subproblem Size
| Description | High-performance computing systems are built up of thousands of independent computers, which are connected through a network. By careful synchronisation between the activities of these computers, we can program them to jointly work on a complex task, allowing them to tackle problems that are much larger than what a single computer can handle. Synchronisation between computers is typically strongly structured and rigid, which implies high overheads for synchronisation and substantial idle time while all computers wait for the last one to arrive at a synchronisation point. We are exploring how to relax that synchronisation and allow more fluid and independent progress of computation across the different computers. This leads to improved performance, reduced energy consumption, or a reduced number of computers needed to solve the same problem. Our findings so far include the design of a synchronisation scheme for molecular dynamics problems, where we have identified which computations need to be strictly synchronised and which can be loosely synchronised. We have identified the inherent mechanisms for self-correction in this scheme to minimise differences in progress between computers and analysed the key points where the scheme may reduce accuracy of the computation. Our proposed solution allows a small amount of speculative execution of up to two timesteps, where missing values are replaced by interpolated values. The interpolation step is local to the processes and does not involve communication, which ensures its efficiency and scalability. We have carefully studied the accuracy of the method and obtained good to excellent results on exemplar ODE-based problems in molecular dynamics. We evaluated the scalability of the approach using weak scaling and strong scaling up to 128 nodes on the ARCHER2 system. |
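The core idea in the description above, proceeding speculatively for up to two timesteps by substituting locally predicted values for data that has not yet arrived, can be sketched as follows. This is a minimal illustrative example, not the project's actual LAMMPS code; the function names, the two-point linear prediction, and the history layout are all assumptions made for clarity.

```python
MAX_SPECULATION = 2  # the scheme allows speculation of up to two timesteps


def predict_missing(history, t):
    """Estimate a neighbour's value at timestep t from the two most recent
    known timesteps. A hypothetical stand-in for the project's local,
    communication-free interpolation step."""
    (t0, v0), (t1, v1) = history[-2], history[-1]
    slope = (v1 - v0) / (t1 - t0)
    return v1 + slope * (t - t1)


def value_for_step(history, t, received):
    """Return the neighbour value to use at timestep t: the exact value if
    it has arrived, otherwise a speculative prediction, provided we are
    within the speculation window."""
    if t in received:
        return received[t]            # exact value available: no speculation
    last_known = history[-1][0]
    if t - last_known > MAX_SPECULATION:
        raise RuntimeError("speculation window exceeded: must wait")
    return predict_missing(history, t)
```

Because the prediction uses only locally held history, no extra communication is needed on the speculative path, which is what makes the scheme scalable; a real implementation would also need a rollback or correction mechanism for mispredictions.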
| Exploitation Route | We will release our modified LAMMPS codebase as open source to the general public in accordance with the LAMMPS license. We will share our findings with the LAMMPS developers and explore uptake of our proposed techniques in the main source code. Through this route, we can achieve higher performance, reduced energy consumption or reduced hardware requirements for all users of LAMMPS. Note that LAMMPS is one of the most popular open-source codebases for molecular dynamics simulations, used by academia and industry, and one of the most frequently used codes on the ARCHER Tier-1 high-performance computer. |
| Sectors | Aerospace Defence and Marine Chemicals Digital/Communication/Information Technologies (including Software) |
| Description | PaCoBi: Scaling Parallelism and Convexity Hurdles in Bi-Level Machine Learning |
| Amount | £206,086 (GBP) |
| Funding ID | EP/Z001110/1 |
| Organisation | United Kingdom Research and Innovation |
| Sector | Public |
| Country | United Kingdom |
| Start | 09/2024 |
| End | 09/2026 |
| Title | Visualize-it: View scientific concepts at work. Anytime. Anywhere (Contribution to existing project) |
| Description | This website provides an interactive learning experience for topics in the fields of physics, mathematics, computer science and complex systems. Using expertise accumulated during this project, Dr Dandurand pushed corrections and improvements to the planetary motion application of Visualize-It via the following fork: https://github.com/bcdandurand/visualize-it.github.io, which has not been adopted into the mainline. |
| Type Of Technology | Software |
| Year Produced | 2025 |
| Open Source License? | Yes |
| Impact | Critical corrections to the planetary motion simulation have been added using the Verlet integration approach that underpins our work on ASCCED. The prior version of this software used a different, less accurate integration approach. The software is designed for educational activities in general. We have used it as a demo at the Northern Ireland Science Festival 2025 to explain core ideas of computer simulation of physical processes to a general audience. |
| URL | https://github.com/bcdandurand/visualize-it.github.io |
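The Verlet integration approach mentioned in the impact statement above can be sketched as below. This is an illustrative velocity Verlet implementation under assumptions of our own (1-D state, an `accel_fn` callback), not the code contributed to Visualize-It.

```python
def velocity_verlet(pos, vel, accel_fn, dt, steps):
    """Advance a particle with the velocity Verlet scheme, which conserves
    energy over long runs far better than naive explicit Euler stepping."""
    a = accel_fn(pos)
    for _ in range(steps):
        pos = pos + vel * dt + 0.5 * a * dt * dt  # position update
        a_new = accel_fn(pos)                     # acceleration at new position
        vel = vel + 0.5 * (a + a_new) * dt        # average old/new acceleration
        a = a_new
    return pos, vel


# Example: simple harmonic oscillator, a(x) = -x, starting at x=1, v=0;
# after t = 1 the state should track (cos 1, -sin 1) closely.
x, v = velocity_verlet(1.0, 0.0, lambda x: -x, 0.001, 1000)
```

The averaging of old and new accelerations in the velocity update is what distinguishes velocity Verlet from Euler integration and gives it its favourable long-term energy behaviour, which matters for orbital (planetary motion) demos.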
| Description | NI Science Festival exhibition "AI and the Computing that Drives It" |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Public/other audiences |
| Results and Impact | Event description: "Artificial Intelligence is everywhere. We find it in algorithms prioritising your social media feed, ChatGPT revising your writing, or your satnav taking you where you need to be. But how does it work? In this session, we will explore some of the principles behind AI, and some of the ways it can fail. We will also explore the computing infrastructure that underpins AI and its hunger for high-end computing and extensive energy consumption." We had about 40-50 visitors from across the general audience, ranging from school-going youth to pensioners. There were some practitioners as well as colleagues from different faculties at Queen's University Belfast. The primary outcomes were raising awareness and changes in opinion in the audience. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://nisciencefestival.com/events/artificial-intelligence-and-the-computing-that-drives-it |
