SERT: Scale-free, Energy-aware, Resilient and Transparent Adaptation of CSE Applications to Mega-core Systems
Lead Research Organisation:
Queen's University of Belfast
Department Name: Sch of Electronics, Elec Eng & Comp Sci
Abstract
Moore's Law and Dennard scaling have led to dramatic performance increases in microprocessors, the basis of modern supercomputers, which consist of clusters of nodes that include microprocessors and memory. This design is deeply embedded in parallel programming languages, the runtime systems that orchestrate parallel execution, and computational science applications.
Some deviations from this simple, symmetric design have occurred over the years, but we have now pushed transistor scaling to the point where simplicity is giving way to complex architectures. The end of Dennard scaling, which has not held for about a decade, and the atomic dimensions of transistors have profound implications for the architecture of current and future supercomputers.
Scalability limitations will arise from insufficient data access locality. Exascale systems will have up to 100x more cores, and commensurately less memory space and bandwidth per core. However, in-situ data analysis, motivated by decreasing file system bandwidths, will increase the memory footprints of scientific applications. Thus, we must improve per-core data access locality and reduce contention and interference for shared resources.
Energy constraints will fundamentally limit the performance and reliability of future large-scale systems. These constraints lead many to predict a phenomenon of "dark silicon" in which half or more of the transistors on each chip must be powered down for safe operation. Low-power processor technologies based on sub-threshold or near-threshold voltage operation are a viable alternative. However, these techniques dramatically decrease the mean time to failure at scale and, thus, require new paradigms to sustain throughput and correctness.
Non-deterministic performance variation will arise from design process variation that leads to asymmetric performance and power consumption in architecturally symmetric hardware components. The manifestations of the asymmetries are non-deterministic and can vary with small changes to system components or software. This performance variation produces non-deterministic, non-algorithmic load imbalance.
Reliability limitations will stem from the massive number of system components, which proportionally reduces the mean time to failure, but also from component wear and from low-voltage operation, which introduces timing errors. Infrastructure-level power capping may also compromise application reliability or create severe load imbalances.
The impact of these changes on technology will travel as a shockwave throughout the software stack. For decades, we have designed computational science applications based on very strict assumptions that performance is uniform and processors are reliable. In the future, hardware will behave unpredictably, at times erratically. Software must compensate for this behavior.
Our research anticipates this future hardware landscape. Our ecosystem will combine binary adaptation, code refactoring and approximate computation to prepare CSE applications. We will provide them with scale-freedom (the ability to run well at scale under dynamic execution conditions) with at most limited, platform-agnostic code refactoring. Our software will provide automatic load balancing and concurrency throttling to tame non-deterministic performance variations. Finally, our new form of user-controlled approximate computation will enable the execution of CSE applications on hardware with low supply voltages, or any form of faulty hardware, by selectively dropping or tolerating erroneous computation that arises from unreliable execution, thus saving energy. Cumulatively, these tools will enable non-intrusive reengineering of major computational science libraries and applications (2DRMP, Code_Saturne, DL_POLY, LB3D) and prepare them for the next generation of UK supercomputers. The project partners with NAG, a leading UK HPC software and service provider.
Planned Impact
The project will achieve commercial impact through the development of production-level Computational Science and Engineering software that will catalyse performance and productivity in applications within the EPSRC remit; industrial engagement with UK and international stakeholders, in particular through the membership of project partners in the European Technology Platform for HPC (ETP4HPC); exploration of the potential to receive follow-on funding and create spin-out companies through instruments such as the Impact Acceleration Account at Queen's Belfast; and the organisation of an industrial workshop. The project will achieve further economic impact through better utilisation and a reduced total cost of ownership of the major UK supercomputing infrastructures, and through improved productivity in sectors of the UK high-technology economy that depend on HPC.
The project will achieve academic impact by publishing results in the very best journals and conferences across the areas of high performance computing, computational science, scientific computing, programming languages and computer architecture. All publications will follow Green or Gold open access routes, the former leveraging institutional publication repositories and the latter institutional funding. All software developed in the project will be open-sourced, with associated training provision in the form of tutorials and short modules. Further academic impact will be achieved via exchange visits and demonstration sessions with project partner NAG, ClusterVision, and other HPC vendors and groups in the UK.
Societal impact will be achieved through a prominent presence in social media (Web 2.0, LinkedIn, Twitter and YouTube channels) to disseminate the results to professionals and the general public. Further societal presence will be achieved through the distribution of news articles, press releases and video presentations. The project will develop software technologies for emerging many-core systems, expertise that is highly marketable.
The project follows a comprehensive software management plan: it will produce three software outputs (Adaptor, RightSizer, Approximator), licensed under the GPL. The tools will be developed, tested and maintained in a GitLab software repository, with the associated Git revision control system hosted by Queen's Belfast and shared between the project partners. The software will be user-level and will not require interventions to the host operating system, since such interventions would prevent its deployment on the target systems (ARCHER, BlueJoule, NextScale, Titan). It will be based on the GNU stack for maximum portability across current and future platforms. The software will support and be compatible with widely used parallel programming models (MPI, OpenMP, OpenCL) and libraries (MAGMA, PLASMA, ATLAS). Source code changes in MPI, OpenMP and OpenCL, where needed, will be feasible through the adoption of open-source implementations (e.g. OpenMPI, PoCL, GOMP).
The software will be released to and hosted for the public by Queen's Belfast during the course of the project, and later by STFC for production use on the targeted supercomputers. The GitLab repository that will house the software at QUB is well tested and already supports code development, maintenance, revision control and testing in nine large-scale software development projects (EPSRC, FP7/H2020 and industry-led), involving 28 research groups in the UK, Germany, Switzerland, Sweden, Greece, Austria, Ireland and the US, and totalling hundreds of KLOC of C/C++ parallel code. We will use Doxygen for formal code documentation, DokuWiki for informal documentation and discussion among developers, and Bugzilla for bug tracking. We will use nightly builds and regression tests. A permanent research engineer funded by Queen's will undertake the role of software maintenance and quality control manager and will be responsible for maintaining the highest coding and documentation standards.
Publications

Arif M (2016) A scalable and composable map-reduce system

Chalios C (2017) DARE: Data-Access Aware Refresh via spatial-temporal application resilience on commodity servers, in The International Journal of High Performance Computing Applications

Dongarra J (2019) PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP, in ACM Transactions on Mathematical Software

Georgakoudis G (2017) SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads, in ACM Transactions on Architecture and Code Optimization

Georgakoudis G (2017) REFINE

Guo X (2018) New massively parallel scheme for Incompressible Smoothed Particle Hydrodynamics (ISPH) for highly nonlinear and distorted flow, in Computer Physics Communications
Description | In the context of SERT we developed REFINE, a novel fault-injection (FI) framework that addresses the limitations of existing approaches by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI while keeping accuracy comparable to binary-level FI methods. We demonstrated the approach on 14 HPC programs and showed that, owing to its unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time needed for large FI experiments. We also developed a significant codebase of tools for improving the scaling of parallel programs. Specifically, we developed SCALO, a tool that increases the throughput of jobs on supercomputer nodes. SCALO optimises the resource allocation of parallel programs running concurrently on the same node by minimising contention and adapting each program's allocation to the scalability potential of its co-runners. A particular strength of the tool is that it can be deployed on existing supercomputer infrastructure without disrupting pre-deployed installations. An initial evaluation using benchmark proxies of HPC applications shows promising results, and we are expanding its use to large-scale applications. Approximation is an emerging technique for speeding up execution by trading computational accuracy for performance. We chose to extend the widely used OpenMP parallel programming language with constructs to express approximation opportunities in parallel computations. We developed these extensions on existing, industrial-quality tools, including a compiler (Clang/LLVM) and a parallel runtime (the Intel OpenMP runtime). Through our extensions, HPC developers have a structured way to include approximation in parallel programs and to control how it is implemented by the execution runtime.
For example, the developer annotates computational tasks as amenable to approximation and configures the runtime to perform those computations with reduced accuracy or even drop them entirely for aggressive speed optimisation. We have demonstrated the applicability of our approximation techniques on numerical kernels and are in the process of evaluating them on large-scale applications. |
Exploitation Route | We make all the tools we develop available to our research partners, who are HPC application and numerical library developers. We also intend to release our software to the wider scientific community after fine-tuning it for usability and performance, using invaluable feedback from our partners and domain experts. The vision for SCALO is to become part of the system services provided by supercomputing facilities; its use will enable operators to co-locate jobs on nodes, increasing the utilisation and throughput of supercomputer installations. Our approximation framework offers a robust, ready-to-use solution by extending an existing standard (OpenMP) for programming HPC applications, enabling developers to include approximation in their applications. |
Sectors | Digital/Communication/Information Technologies (including Software) |
Description | The findings of the project are actively being used to inform software engineering practices and to improve the productivity and resilience of production-strength software in two supercomputing centres, in the UK (STFC) and the US (LLNL). |
First Year Of Impact | 2017 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Economic |
Description | EU Horizon2020 Programme: AllScale: An Exascale Programming, Multi-objective Optimisation and Resilience Management Environment Based on Nested Recursive Parallelism. |
Amount | € 438,578 (EUR) |
Funding ID | 671603 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 09/2015 |
End | 09/2018 |
Description | EU Horizon2020 Programme: ECOSCALE: Energy-Efficient Heterogeneous Computing at Scale |
Amount | € 696,750 (EUR) |
Funding ID | 671632 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 09/2015 |
End | 09/2018 |
Description | EU Horizon2020 Programme: UniServer Project |
Amount | € 663,625 (EUR) |
Funding ID | 687628 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 02/2016 |
End | 01/2019 |
Description | Horizon2020 Programme |
Amount | € 5,999,510 (EUR) |
Funding ID | H2020-732631 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2017 |
End | 12/2020 |
Description | Royal Society Wolfson Research Merit Award: Principles and Practice of Near-Data Computing |
Amount | £50,000 (GBP) |
Funding ID | WM150009 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2015 |
End | 08/2020 |
Description | SFI-DEL Investigators Programme: Meeting the Challenges of Heterogeneous and Extreme Scale Parallel Computing |
Amount | £521,947 (GBP) |
Funding ID | 14/IA/2474 |
Organisation | Science Foundation Ireland (SFI) |
Sector | Charity/Non Profit |
Country | Ireland |
Start | 08/2015 |
End | 08/2020 |
Description | Collaboration with IBM on disaggregated memory technologies and near-data computing |
Organisation | IBM |
Country | United States |
Sector | Private |
PI Contribution | Our research team has contributed methods to manage data caching and placement on disaggregated memory architectures with near-data processing elements. |
Collaborator Contribution | IBM has contributed novel remote memory server infrastructures and near-data acceleration technologies. |
Impact | Materialised through an industrial placement of QUB research staff, this partnership is exploring designs to substantially improve the energy-efficiency of large memory systems, via the use of disaggregation of memory, RDMA-based networking to remote memory devices and near-data accelerators for in-situ, in-memory analytics. |
Start Year | 2015 |
Description | Collaboration with Maxeler on integrating dataflow accelerators in Big Data software stacks |
Organisation | Maxeler Technologies Inc |
Department | Maxeler Technologies |
Country | United Kingdom |
Sector | Private |
PI Contribution | Integration of Maxeler's dataflow engines into the Spark, Storm and other Big Data software stacks, in collaboration with Maxeler Technologies and STFC Hartree. |
Collaborator Contribution | Programming APIs for Maxeler dataflow accelerators. |
Impact | No outputs yet, extensions of Spark and Storm with streaming APIs using Maxeler dataflow engines are currently under design. |
Start Year | 2016 |
Description | Collaboration with NHS (Belfast HSCT) on real-time analytics of ICU patient data |
Organisation | Royal Victoria Hospital, Belfast |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | A real-time analytics appliance (micro-server plus in-memory data analytics software) for analysing continuously respiratory data of ICU Patients, with the objective to regulate oxygen intake and prevent lung injury. |
Collaborator Contribution | Analytical algorithms and infrastructure support at Royal Victoria Hospital, Belfast. |
Impact | Appliance operating and automatically detecting potential lung injury emergencies at Royal Victoria Hospital ICU. |
Start Year | 2015 |
Description | NVTV Interview on Supercomputing
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interview in NVTV's Behind the Science program on Supercomputing as a technology with impact on our everyday lives. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.nvtv.co.uk/shows/behind-the-science-dimitrios-nikolopoulos/ |