Automated performance optimization based on post-mortem parallel performance analysis

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

Post-mortem parallel performance analysis makes it possible to identify performance bottlenecks in parallel program execution. Despite recent advances in the automation of performance analysis, significant human effort is still required to act on the analysis results and optimize programs. The thesis will investigate new automation techniques for static and dynamic performance optimization, leveraging the program execution analysis tools and libraries developed in the APT group. These tools employ artificial neural networks to automatically identify performance anomalies and their likely causes, but still require an expert programmer to intervene in order to improve performance. This is a time-consuming, error-prone task, which this project aims to automate.

The aim of the project is to investigate whether heuristic approaches to dynamic program optimization can be replaced with more robust machine learning techniques. The focus is on the problems of work scheduling and data placement, both in non-uniform memory access (NUMA) systems and in large-scale distributed-memory systems. Such techniques should adapt automatically either to new execution environments or to dynamically evolving execution conditions (e.g. a node failure that requires the load to be re-balanced, or dynamic application behaviour, such as mesh refinement, that creates run-time imbalance). Previous work at The University of Manchester has shown that it is possible to automatically detect the occurrence of such events and, to some extent, to identify the causes of performance degradation. This presents an exciting opportunity to build on that research and to automate the last missing piece.
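To make the idea of a scheduler that adapts to changing execution conditions concrete, the following is a minimal sketch and not the project's method. It uses an epsilon-greedy selector, a simple learning technique swapped in for illustration, to choose among candidate scheduling policies based on observed run times. The policy names, the cost values in `PHASES` and every function name are hypothetical, and the run times are simulated rather than measured.

```python
import random


class EpsilonGreedyScheduler:
    """Choose among candidate policies using observed run times (lower is better)."""

    def __init__(self, policies, epsilon=0.1, step=0.3, seed=0):
        self.policies = list(policies)
        self.epsilon = epsilon      # probability of trying a random policy
        self.step = step            # constant step size, so old observations fade
        self.rng = random.Random(seed)
        self.estimate = {p: None for p in self.policies}

    def choose(self):
        # Try every policy once before trusting the estimates.
        untried = [p for p in self.policies if self.estimate[p] is None]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.policies)
        return min(self.policies, key=lambda p: self.estimate[p])

    def update(self, policy, elapsed):
        old = self.estimate[policy]
        if old is None:
            self.estimate[policy] = elapsed
        else:
            self.estimate[policy] = old + self.step * (elapsed - old)


# Hypothetical mean run times per policy. The second phase stands in for a
# change in execution conditions, such as run-time load imbalance appearing.
PHASES = [
    {"static": 1.0, "dynamic": 1.5, "guided": 1.3},
    {"static": 2.0, "dynamic": 1.1, "guided": 1.9},
]


def simulate(phases, rounds_per_phase=200, seed=0):
    """Return the most frequently chosen policy in each phase."""
    sched = EpsilonGreedyScheduler(phases[0].keys(), seed=seed)
    noise = random.Random(seed + 1)
    favourites = []
    for cost in phases:
        count = {p: 0 for p in cost}
        for _ in range(rounds_per_phase):
            policy = sched.choose()
            # Simulated measurement: mean cost with +/-10% noise.
            sched.update(policy, cost[policy] * noise.uniform(0.9, 1.1))
            count[policy] += 1
        favourites.append(max(count, key=count.get))
    return favourites


if __name__ == "__main__":
    print(simulate(PHASES))
```

The constant step size is the design choice that matters here: it lets stale estimates be overwritten when conditions change, so the selector can move away from a policy that was best in the first phase.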
The main source of input data for the optimizer will be ComPerf, and this thesis will investigate how the identified bottlenecks can be mitigated. Mitigation can involve recompiling the application code, modifying the program binary, or changing the schedule and data placement of the parallel execution. In a first stage, a set of benchmarks will be selected to establish the execution baseline. It will include publicly available applications, but can also be extended with applications that better show the effectiveness of the applied optimizations, for example large-scale applications from the EU project EuroEXA (754337), such as weather and climate modeling, physics and life science simulation. Three leading metrics will be used to assess the optimizer: throughput, latency and power efficiency.
Artificial neural networks have the potential to be a suitable tool for automating the performance optimization process, as they require minimal assumptions and can generalize effectively from incomplete data. One possible training setting is to use execution traces and initial application information as the input and a reference optimization as the target.
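The training setting described above can be illustrated with a minimal sketch, assuming a toy feature set. It trains a single-layer softmax network, the simplest possible stand-in for an artificial neural network, to map trace-derived features to a reference optimization. The two features (load imbalance and remote memory access ratio), the three labels and all sample values are hypothetical and synthetic. They are not taken from ComPerf or from any real trace.

```python
import math

# Hypothetical reference optimizations used as training targets.
LABELS = ["rebalance_work", "migrate_data", "no_change"]

# Synthetic samples: ((load_imbalance, remote_access_ratio), label index).
TRAINING_SET = [
    ((0.9, 0.1), 0), ((0.8, 0.2), 0), ((0.7, 0.1), 0),
    ((0.1, 0.9), 1), ((0.2, 0.8), 1), ((0.1, 0.7), 1),
    ((0.1, 0.1), 2), ((0.2, 0.2), 2), ((0.15, 0.1), 2),
]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, features):
    x = list(features) + [1.0]  # append a constant input for the bias
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def train(samples, n_classes, epochs=3000, lr=1.0):
    """Full-batch gradient descent on the cross-entropy loss."""
    n_inputs = len(samples[0][0]) + 1
    weights = [[0.0] * n_inputs for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_inputs for _ in range(n_classes)]
        for features, target in samples:
            x = list(features) + [1.0]
            probs = forward(weights, features)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == target else 0.0)
                for j in range(n_inputs):
                    grad[k][j] += err * x[j]
        for k in range(n_classes):
            for j in range(n_inputs):
                weights[k][j] -= lr * grad[k][j] / len(samples)
    return weights


def predict(weights, features):
    probs = forward(weights, features)
    return LABELS[probs.index(max(probs))]


if __name__ == "__main__":
    model = train(TRAINING_SET, len(LABELS))
    print(predict(model, (0.8, 0.15)))
```

A realistic setup would need far richer inputs and a deeper model. The sketch only shows the shape of the supervised problem: features extracted from a trace on one side, a known good optimization on the other.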
Finally, this thesis aims to show that such approaches can be used to generalize over different architectures, from off-the-shelf Intel platforms to the EuroEXA supercomputer prototype.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/R513131/1 | | | 01/10/2018 | 30/09/2023 |
2297349 | Studentship | EP/R513131/1 | 01/10/2019 | 31/12/2022 | Igor Wodiany
 
Description One of the key findings so far from the work funded by this award is a broader understanding of how computer applications can be analysed to improve their performance, i.e. how computer programs can be made faster. We used those findings to contribute to the OpenMP standard (one of the major standards used in the field of high-performance computing).
Exploitation Route The findings of our research can be used by researchers and practitioners in the field of high-performance computing who want to better understand the performance of parallel applications. In fact, our findings were used to influence changes to the profiling interface of the OpenMP standard, so they can be used directly by tool developers who follow the standard.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://github.com/pepperpots/Afterompt
 
Description Our findings presented in the paper "AfterOMPT: An OMPT-Based Tool for Fine-Grained Tracing of Tasks and Loops" were used to improve parts of the tracing interface in the OpenMP standard. We worked with the OpenMP Tools Subcommittee to incorporate those findings into a revision of the standard that was published in November 2021 (OpenMP standard version 5.2). The OpenMP standard is one of the most widely used standards in the field of high-performance computing, and it is used by many private and public organizations, so our contribution to the standard will eventually be incorporated into mainstream tools used across industry and academia.
First Year Of Impact 2021
Sector Digital/Communication/Information Technologies (including Software)
 
Title AfterOMPT 
Description AfterOMPT is a trace-based tool for analyzing the execution of OpenMP applications using the OMPT interface to capture accurate information on loop partitioning, distribution of iteration spaces across workers, task scheduling, and synchronization events. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Our tool has been used to guide the development of, and justify the improvements to, the loop-tracing interface in the OpenMP standard. Our findings were used to improve the specification of the loop-related callbacks (ompt_callback_work_t and ompt_callback_dispatch_t) for fine-grained profiling. The changes to the specification, based on our findings, were incorporated into the OpenMP 5.2 release.
URL https://github.com/pepperpots/Afterompt
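As an illustration of the kind of analysis that the loop-level information described in the AfterOMPT entry makes possible, here is a minimal sketch that summarizes how loop iterations and busy time are distributed across workers. The `ChunkEvent` record layout, the function names and the sample events are hypothetical. They do not reflect AfterOMPT's actual trace format or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEvent:
    """One loop chunk executed by one worker (hypothetical record layout)."""
    worker: int
    first_iter: int
    last_iter: int   # inclusive
    start_us: int
    end_us: int


def per_worker_summary(events):
    """Return (iterations per worker, busy microseconds per worker)."""
    iters = defaultdict(int)
    busy = defaultdict(int)
    for ev in events:
        iters[ev.worker] += ev.last_iter - ev.first_iter + 1
        busy[ev.worker] += ev.end_us - ev.start_us
    return dict(iters), dict(busy)


def imbalance(busy):
    """Load imbalance as max busy time over mean busy time, minus one."""
    mean = sum(busy.values()) / len(busy)
    return max(busy.values()) / mean - 1.0


# Synthetic example: a static partition gives every worker 25 iterations,
# but the iterations assigned to worker 3 take three times as long.
EXAMPLE_TRACE = [
    ChunkEvent(worker=0, first_iter=0, last_iter=24, start_us=0, end_us=100),
    ChunkEvent(worker=1, first_iter=25, last_iter=49, start_us=0, end_us=100),
    ChunkEvent(worker=2, first_iter=50, last_iter=74, start_us=0, end_us=100),
    ChunkEvent(worker=3, first_iter=75, last_iter=99, start_us=0, end_us=300),
]

if __name__ == "__main__":
    iterations, busy_time = per_worker_summary(EXAMPLE_TRACE)
    print(iterations, busy_time, imbalance(busy_time))
```

The example is chosen so that the iteration counts are perfectly even while the busy times are not. That is the situation in which per-chunk timing is more informative than the partitioning alone.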
 
Description OpenMP Tools Subcommittee Member 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact After presenting our paper "AfterOMPT: An OMPT-based tool for fine-grained tracing of tasks and loops" at the International Workshop on OpenMP (IWOMP) in 2020, we were invited to participate in the OpenMP Tools Subcommittee to contribute our findings to the OpenMP standard. We worked with the subcommittee on the callbacks related to loop tracing, in order to improve support for fine-grained profiling. As a result, changes were made to two callbacks (ompt_callback_work_t and ompt_callback_dispatch_t), and the update has been published in the OpenMP standard version 5.2.
Year(s) Of Engagement Activity 2020
URL https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf