Automated performance optimization based on post-mortem parallel performance analysis

Lead Research Organisation: University of Manchester

Department Name: Computer Science

Abstract

Post-mortem parallel performance analysis makes it possible to identify performance bottlenecks in parallel program execution. Despite recent advances in performance analysis automation, significant human effort is still required to take advantage of the analysis results and optimize programs. The thesis will investigate new automation techniques for static and dynamic performance optimization, leveraging the program execution analysis tools and libraries developed in the APT group. These tools employ artificial neural networks to automatically identify performance anomalies and their likely causes, but an expert programmer must still intervene to improve performance. This is a time-consuming, error-prone task, which this project aims to automate.

The aim of the project is to investigate whether heuristic approaches to dynamic program optimization can be replaced with more robust machine learning techniques. The focus is on the problems of work scheduling and data placement, both in non-uniform memory access (NUMA) systems and in large-scale distributed memory systems. Such techniques should adapt automatically either to new execution environments or to dynamically evolving execution conditions (e.g. a node failure that requires the load to be re-balanced, or dynamic application behaviour, such as mesh refinement, that creates run-time imbalance). Previous work at The University of Manchester has shown that it is possible to automatically detect the occurrence of such events and, to some extent, to identify the causes of performance degradation. This presents an exciting opportunity to build on this research and to automate the last missing piece.
The main source of input data for the optimizer will be ComPerf, and this thesis will investigate how identified bottlenecks can be mitigated. Mitigation can involve recompiling the application code, modifying the program binary, or changing the schedule and data placement in the parallel execution. In the first stage, a set of benchmarks will be selected to establish the execution baseline. It will include publicly available applications, and can be extended with applications that better show the effectiveness of the applied optimizations, including large-scale applications from the EU project EuroEXA (754337), such as weather and climate modeling and physics and life science simulation. Three leading metrics will be used to assess the optimizer: throughput, latency and power efficiency.
Artificial Neural Networks have the potential to automate the performance optimization process, as they require minimal assumptions and are able to generalize effectively from incomplete data. One possible training setting is to use execution traces and initial application information as the input and a reference optimization as the target.
Finally, this thesis aims to show that such approaches can be used to generalize over different architectures, from off-the-shelf Intel platforms to the EuroEXA supercomputer prototype.
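
The training setting mentioned above (trace-derived information as the input, a reference optimization as the target) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of the project: the two features (load imbalance and remote-memory-access ratio), the two candidate optimizations, the hand-made sample values, and the use of a single logistic neuron as the simplest possible stand-in for a neural network.

```python
import math

# Hypothetical training data: each sample pairs two trace-derived features
# (load imbalance, remote-memory-access ratio), both scaled to [0, 1],
# with the index of a reference optimization. All values are made up.
ACTIONS = ["rebalance_work", "migrate_data"]
SAMPLES = [
    ((0.90, 0.10), 0), ((0.80, 0.20), 0), ((0.70, 0.10), 0), ((0.85, 0.30), 0),
    ((0.10, 0.90), 1), ((0.20, 0.80), 1), ((0.30, 0.70), 1), ((0.15, 0.85), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, lr=0.5):
    """Fit a single logistic neuron with full-batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x0, x1), target in samples:
            # Gradient of the cross-entropy loss w.r.t. the pre-activation.
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - target
            grad_w[0] += err * x0
            grad_w[1] += err * x1
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def predict(model, features):
    """Return the name of the optimization the model considers more likely."""
    w, b = model
    p = sigmoid(w[0] * features[0] + w[1] * features[1] + b)
    return ACTIONS[1] if p >= 0.5 else ACTIONS[0]


model = train(SAMPLES)
print(predict(model, (0.95, 0.05)))
print(predict(model, (0.05, 0.95)))
```

A realistic optimizer would need far richer trace features, many more candidate optimizations and a deeper model; the sketch only shows the shape of the supervised mapping from traces to a reference optimization.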

Student:

Igor Wodiany

Period of Study:

Sep 19 - Dec 22

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2297349

Research Topic:

Unclassified

Organisations

University of Manchester (Lead Research Organisation)

People	ORCID iD
Igor Wodiany (Student)

Publications

Author Name

Title Publication Date Published

Byerly Flint H (2022) You vs. us: framing adaptation behavior in terms of private or social benefits. in Climatic change

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/R513131/1			30/09/2018	29/09/2023
2297349	Studentship	EP/R513131/1	30/09/2019	31/12/2022	Igor Wodiany
EP/T517823/1			30/09/2020	29/09/2025
2297349	Studentship	EP/T517823/1	30/09/2019	31/12/2022	Igor Wodiany

Key Findings
Impact Summary
Software and Technical Products
Engagement Activities


Description	One of the key findings so far from the work funded by this award is a better understanding of how to analyse computer applications in order to improve their performance, i.e. how to make computer programs faster. We used those findings to contribute to the OpenMP standard (one of the major standards used in the field of high-performance computing).
Exploitation Route	The findings of our research can be used by researchers and practitioners in the field of high-performance computing who want to better understand the performance of parallel applications. In fact, our findings were used to influence changes to the profiling interface of the OpenMP standard, so they can be used directly by tool developers following the standard.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	https://github.com/pepperpots/Afterompt


Description	Our findings presented in the paper "AfterOMPT: An OMPT-Based Tool for Fine-Grained Tracing of Tasks and Loops" were used to improve parts of the tracing interface in the OpenMP standard. We worked with the OpenMP Tools Subcommittee to incorporate those findings into a revision of the standard that was published in November 2021 (OpenMP standard version 5.2). The OpenMP standard is one of the most widely used standards in the field of high-performance computing, and it is used by many private and public organizations, so our contribution to the standard will eventually be incorporated into mainstream tools used across industry and academia.
First Year Of Impact	2021
Sector	Digital/Communication/Information Technologies (including Software)


Title	AfterOMPT
Description	AfterOMPT is a trace-based tool for analyzing the execution of OpenMP applications using the OMPT interface to capture accurate information on loop partitioning, distribution of iteration spaces across workers, task scheduling, and synchronization events.
Type Of Technology	Webtool/Application
Year Produced	2020
Open Source License?	Yes
Impact	Our tool has been used to guide the development of, and justify improvements to, the loop-tracing interface in the OpenMP standard. Our findings were used to improve the specification of the loop-related callbacks (ompt_callback_work_t and ompt_callback_dispatch_t) for fine-grained profiling. The changes to the specification, based on our findings, were incorporated into the OpenMP 5.2 release.
URL	https://github.com/pepperpots/Afterompt


Description	OpenMP Tools Subcommittee Member
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	After presenting our paper "AfterOMPT: An OMPT-Based Tool for Fine-Grained Tracing of Tasks and Loops" at the International Workshop on OpenMP (IWOMP) in 2020, we were invited to participate in the OpenMP Tools Subcommittee to contribute our findings to the OpenMP standard. We worked with the subcommittee to improve the loop-tracing callbacks and strengthen support for fine-grained profiling. As a result, changes were made to two callbacks (ompt_callback_work_t and ompt_callback_dispatch_t) and the update has been published in the OpenMP standard version 5.2.
Year(s) Of Engagement Activity	2020
URL	https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
