Exploiting Parallelism through Type Transformations for Hybrid Manycore Systems

Lead Research Organisation: University of Glasgow
Department Name: School of Computing Science

Abstract

Modern computing systems are becoming increasingly diverse, but the common feature of all emerging computing platforms is the increased potential for performing many computations in parallel, by providing large numbers of processor cores. Computer systems consisting of various different platforms have great potential for performing tasks fast and efficiently. However, programming such systems is a great challenge.

The era of performance increase through increased clock speeds has come to an end and we have entered a period where performance increases can only come from increased numbers of heterogeneous computational cores and their effective exploitation by software. Because of the immense effort required to adapt existing parallel software to novel hardware architectures with present technology, there is a very real danger that future advances in hardware performance will have little impact on practical large-scale computing using legacy software.

The specific challenge that we want to address in this proposal is how to exploit the parallelism of a given computing platform, e.g. a multicore CPU, a graphics processor (GPU) or a Field-Programmable Gate Array (FPGA), in the best possible way, without having to change the original program. These different platforms have very different properties in terms of the available parallelism, depending on the nature and organisation of the processing cores and the memory. In particular FPGAs have great potential for parallelism but they are radically different in architecture from mainstream processors. This makes them very difficult to program.

The key problem here is how to transform a program so that it will best use the potential for parallelism provided by the computing platform, and crucially, how to do this so that the resulting program is guaranteed to have the same behaviour as the original program.
Our proposed approach is to use an advanced type system called Multi-Party Session Types to describe the communication between the tasks that make up a computation.
To use a rough analogy, the computation could for instance be viewed as a car assembly line, where every unit performs a particular task such as painting or fitting doors, wheels, the motor, etc. Depending on the organisation and composition of the factory, the order in which these operations are performed will determine the speed with which a car can be assembled. However, when reordering the operations, one must of course ensure that changing the order does not lead to incorrect assembly.

To return to the computational problem, by using the Multi-Party Session Types to describe the communication, we have a formal way of reasoning about the transformations. By developing a formal language for the transformations we can prove their correctness. This is the main novelty of the proposal: the formal system for type transformations. The actual transformations can be viewed as "programs" in this formal language. They will be informed by the properties of the computing platform. To provide this link between the transformation and the platform, we will also develop a formal description of parallel computing platforms.
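As a rough illustration of the idea (a toy sketch, not the formal system proposed here), a protocol can be modelled as a list of interactions between named participants, and a transformation as a reordering that is accepted only when it provably leaves each participant's local view unchanged. All names below (`project`, `swap_adjacent`, the assembly-line roles) are hypothetical, and real Multi-Party Session Types also cover choice and recursion, which this sketch omits.

```python
from typing import List, Set, Tuple

# One interaction in a global protocol: (sender, receiver, message label).
Interaction = Tuple[str, str, str]
Protocol = List[Interaction]


def roles(protocol: Protocol) -> Set[str]:
    """All participants that appear in the protocol."""
    return {role for sender, receiver, _ in protocol for role in (sender, receiver)}


def project(protocol: Protocol, role: str) -> List[Tuple[str, str, str]]:
    """Local view of one participant: its own sends and receives, in order."""
    local = []
    for sender, receiver, label in protocol:
        if role == sender:
            local.append(("send", receiver, label))
        elif role == receiver:
            local.append(("recv", sender, label))
    return local


def swap_adjacent(protocol: Protocol, i: int) -> Protocol:
    """Swap interactions i and i+1, but only if they share no participant."""
    first, second = protocol[i], protocol[i + 1]
    if {first[0], first[1]} & {second[0], second[1]}:
        raise ValueError("interactions share a participant; swap rejected")
    return protocol[:i] + [second, first] + protocol[i + 2:]


def same_local_behaviour(p: Protocol, q: Protocol) -> bool:
    """True if every participant has the same local view in both protocols."""
    return all(project(p, r) == project(q, r) for r in roles(p) | roles(q))


# Hypothetical assembly line, following the analogy in the text.
assembly_line: Protocol = [
    ("loader", "painter", "body"),
    ("stock", "fitter", "doors"),
    ("painter", "fitter", "painted_body"),
]

# Delivering the doors before loading the body is safe: no unit is in both steps.
reordered = swap_adjacent(assembly_line, 0)
print(same_local_behaviour(assembly_line, reordered))
```

Swapping the last two steps would be rejected, because the fitter takes part in both and would otherwise receive the painted body before the doors. The disjointness check is deliberately conservative: it accepts only reorderings for which preservation of every local view is immediate.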

By building these formal systems into a compiler we will be able to transform programs to run in the most efficient way on hybrid manycore platforms.
The main benefit from the proposed research is that the programmer will not need to have in-depth knowledge of the highly complex architecture of a hybrid manycore platform. This will be of great benefit to scientific computing in particular, because it also means that programs will not need to be rewritten to run with best performance on novel systems.

To demonstrate the effectiveness of our approach we aim to develop a proof-of-concept compiler which will transform programs so that they can run on FPGAs, because this type of computing platform is the most different from other platforms and hence the most challenging.

Planned Impact

Realising our vision will lead to a revolution in the programming of heterogeneous high-performance computing systems. Progress in this area is essential to maintain productivity on the next generation of heterogeneous parallel systems, in particular with the move towards exascale systems (1000x bigger than today's systems). To reduce power consumption, these systems will have to be heterogeneous. Achieving high performance on such systems with the existing programming tools and methodologies is proving increasingly difficult. Our proposed work will lead to highly novel compiler technologies, based on a sound formal foundation with guarantees for correctness and bounds for performance. It will allow scientists and other application developers to focus on implementing their ideas rather than on trying to achieve good performance.

In a nutshell, our compiler will automatically transform the program so that it will run optimally on any given heterogeneous parallel architecture. As a result, parallel systems and high-performance computing systems will become much more accessible, not only to scientists but also to industry. There is a clear need for more accessible high-performance computing (see e.g. http://www.supercomputingscotland.org/). Heterogeneous systems can provide supercomputing capabilities in a form factor suitable for SMEs and small academic research groups. Furthermore, supercomputing hardware changes very quickly, and our solution will make it easy to deploy applications on new platforms, where today this is a very complex and time consuming process.

The applications domain for our work is very broad. To illustrate the potential, we give a few examples based on our own cross-disciplinary research:
- Weather and Climate simulations: by increasing the performance, scientists can run simulations over larger areas, with higher resolution and for longer timescales. Each of these opens a wealth of new possibilities. For example, higher resolution is essential for accurately simulating and predicting severe weather events. Because of climate change, these events are becoming more frequent and more severe. Larger areas allow better long-range forecasts, and longer timescales allow more accurate climate simulations.
- Particle dispersion simulation: this type of simulation is used, for example, to predict the dispersion of volcanic ash or radioactive particles. Increased performance will allow more accurate prediction of the dispersion, which in the case of volcanic ash means reduced downtime for airplanes, and in the case of radioactive particles helps to protect the population.
- Railtrack simulation: simulation of tracks, in particular those for high-speed trains, is essential to ensure safety. Increased performance will allow more sophisticated simulation (in particular a more realistic model of the train itself). This is required to accurately model degradation of the tracks, which forces trains to run at reduced speeds and leads to extended closure of tracks for repairs.

In terms of impact on industry, the following sectors already use FPGA technology and could therefore benefit specifically from making FPGA platforms more accessible: financial computing (e.g. option pricing), biotechnology (esp. gene sequence matching) and drug companies (for molecular dynamics simulations).
Other sectors of industry that require supercomputing resources, and that will consequently benefit from the increased productivity offered by our solution, are renewable energy (through faster and more accurate weather predictions) and aerospace (for computational fluid dynamics simulations), to name a few.

Impact of our work will result primarily from adoption of our approach in industry and academia. The Pathways to Impact document explains how we will achieve this impact.
 
Description We carried out a study of the capability and achievable performance of FPGAs for HPC applications, aimed at an audience of non-computing scientists. Essentially, we show that FPGAs do indeed have great potential to make High-Performance Computing more efficient.
Furthermore, we have shown (see our papers) that a combination of type transformations and analytical cost modelling can indeed be used to optimise programs for FPGAs, and we have shown that our approach to optimisation is formally correct.
We have created compilers that can automatically convert legacy scientific code into OpenCL-accelerated code.
Exploitation Route As we show that FPGAs can indeed be used for HPC tasks, and have developed the techniques and tools to achieve optimal performance, this is of interest to the scientific community as well as the FPGA and HPC communities.
As you ask for sectors below, this is of course indirect: any sector needing weather forecasts, fluid dynamics modelling, etc. would potentially benefit.
Sectors Agriculture, Food and Drink; Construction; Energy; Environment; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Transport; Other

URL http://www.slideshare.net/WimVanderbauwhede/on-the-capability-and-achievable-performance-of-fpgas-for-hpc-applications
 
Description One of the compilers developed for this research has been open-sourced and has been used by people in academia as well as industry to convert legacy Fortran code. Asking for only non-academic impact is very restrictive here because it means that if e.g. an academic climate scientist uses my compiler, I should not report it, whereas if the user were a commercial weather forecaster, I could report it. That is quite arbitrary and does not do justice to the impact of the work.
First Year Of Impact 2018
Sector Other
Impact Types Economic

 
Description EPSRC Platform Grant
Amount £1,263,356 (GBP)
Funding ID EP/P010040/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 03/2017 
End 02/2022
 
Description EPSRC Programme Grant
Amount £4,981,302 (GBP)
Funding ID EP/N031768/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 05/2016 
End 11/2021
 
Description HiPEAC Collaboration Grants
Amount € 5,000 (EUR)
Organisation European Union 
Sector Public
Country European Union (EU)
Start 05/2017 
End 08/2017
 
Description Partnership with BIRL, LUMS, Pakistan 
Organisation Lahore University of Management Sciences
Country Pakistan 
Sector Academic/University 
PI Contribution Our expertise is in the area of High Performance Computing (HPC) using specialized devices like FPGAs and GPUs. We have worked considerably on accelerating applications from a wide variety of domains, and are developing an optimizing compiler to automate the process of accelerating such applications. Our contribution was to bring this expertise and insight to the team at BIRL, LUMS, who are developing simulation models for cancer systems biology and proteomics. They are creating an open-source platform that they intend to provide to the wider scientific community. Efficient implementation of their applications on HPC platforms is of great importance, and this is where we are playing a role.
Collaborator Contribution Our partners at the Biomedical Informatics Research Laboratory (BIRL), LUMS, Pakistan, have expertise in creating software for systems biology and proteomics, and already have a number of open-source tools available online. Their insight helps us understand the behaviour and computational requirements of their bioinformatics applications, which then informs our work on creating a generic compiler that targets scientific applications.
Impact I am a research co-investigator on a project funded in Pakistan by the National ICT R&D fund, as a direct result of this partnership.
Start Year 2016
 
Description Partnership with CHREC, University of Florida 
Organisation University of Florida
Country United States 
Sector Academic/University 
PI Contribution The TyTra design flow for FPGAs is a novel approach towards automating the compilation and tuning of scientific applications for High Performance Computing (HPC) FPGA platforms. It is being developed at Glasgow as part of this EPSRC grant. Before this collaboration and the related HiPEAC-funded visit, the work was focused on automating estimation of performance/resources and code-generation for single-device targets. The visit was used to investigate a crucial extension of the compiler which allows it to work on a cluster of FPGA devices. So our contribution to this partnership was our expertise as well as our compiler framework which we are developing as part of this project.
Collaborator Contribution The Novo-G# cluster at CHREC, University of Florida, along with the expertise of their team, offered a truly unique environment for my 3-month visit to CHREC. The FPGA cluster hosted at CHREC is probably the only one of its kind in academia anywhere in the world. Professor Lam and his research group have unique expertise in developing high-performance scientific computing applications for large FPGA clusters. So their contribution was in providing access to their unique hardware setup, their group's expertise and advice, and finally a workstation for my 3-month visit.
Impact We presented a paper at HiPEAC, Manchester, January 2018, that was directly related to work done at CHREC as part of this collaboration.
Start Year 2017
 
Title A Linear Decomposition of Multiparty Sessions for Safe Distributed Programming (Artifact) 
Description This artifact contains a version of the Scribble tool that, given a protocol specification with multiple participants, can generate Scala APIs for implementing each participant in a type-safe, protocol-abiding way. Crucially, the API generation leverages a decomposition of the multiparty protocol into type-safe peer-to-peer interactions between pairs of participants; this, in turn, allows the API internals to be implemented on top of the existing lchannels library for type-safe binary session programming. As a result, several technically challenging aspects in the implementation of multiparty sessions are solved "for free", at the underlying binary level. This includes distributed multiparty session delegation, which this artifact implements for the first time.
Type Of Technology Software 
Year Produced 2017 
 
Title AutoParallel-Fortran 
Description A domain specific, automatically parallelising source-to-source compiler for Fortran-95 that takes scientific Fortran as input and produces Fortran code parallelised using the OpenCL framework. 
Type Of Technology Software 
Year Produced 2016 
Impact We presented this work in the paper "Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation" at the 29th International Conference on Parallel Computational Fluid Dynamics. A further journal paper has been accepted for publication in the International Journal of Parallel Computing. 
URL https://github.com/wimvanderbauwhede/AutoParallel-Fortran
 
Title CAMP: Cost-Aware Multiparty Session Protocols (artifact) 
Description This is the artifact for the paper *CAMP: Cost-Aware Multiparty Session Protocols*.
The artifact comprises:
- A library for specifying cost-aware multiparty protocols.
- The raw data used for comparing the cost models with real execution costs.
- The cost-aware protocol specifications of the benchmarks that we studied.
The library for specifying cost-aware protocols also provides functions for extracting cost equations from them, and for estimating recursive protocol latencies (i.e. average cost per protocol iteration). We provide a script for extracting cost equations, and instantiating them using the parameters used in the paper.
 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/4046892
 
Title CAMP: Cost-Aware Multiparty Session Protocols (artifact) 
Description This is the artifact for the paper *CAMP: Cost-Aware Multiparty Session Protocols*.
The artifact comprises:
- A library for specifying cost-aware multiparty protocols.
- The raw data used for comparing the cost models with real execution costs.
- The cost-aware protocol specifications of the benchmarks that we studied.
The library for specifying cost-aware protocols also provides functions for extracting cost equations from them, and for estimating recursive protocol latencies (i.e. average cost per protocol iteration). We provide a script for extracting cost equations, and instantiating them using the parameters used in the paper.
 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/4046893
 
Title MP-STREAM memory performance benchmark 
Description As part of this ongoing project, we created MP-STREAM, an OpenCL-based synthetic micro-benchmark for measuring sustained memory bandwidth, optimized for FPGAs, but which can be used on multiple HPC platforms. Our main contribution was the introduction of various generic as well as device-specific parameters that can be tuned to measure their effect on memory bandwidth. We developed a build script that makes the benchmark easy to use on many different kinds of devices.
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact A research paper based on this benchmark has recently been accepted for publication at IPDPS, and will be available in IEEE Xplore. We have also shared this benchmark with industry (Xilinx, Dublin, Ireland), who used it on their tools and reported that it gave them very useful insights into those tools.
URL https://github.com/waqarnabi/mp-stream
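To make "sustained memory bandwidth" concrete, the sketch below times a plain buffer copy on the host and reports bytes moved per second, counting one read and one write per byte as the STREAM copy kernel does. It is only an assumption-laden illustration in Python: it does not use OpenCL, it is not the MP-STREAM code, and the function name and default sizes are hypothetical.

```python
import time


def copy_bandwidth_gbs(n_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Best observed copy bandwidth in GB/s over several repeats."""
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-size slice assignment: a bulk in-place copy
        elapsed = time.perf_counter() - start
        # Guard against a zero reading from a coarse timer.
        best = min(best, max(elapsed, 1e-9))
    bytes_moved = 2 * n_bytes  # every byte is read once and written once
    return bytes_moved / best / 1e9


print(f"copy bandwidth: {copy_bandwidth_gbs():.2f} GB/s")
```

Reporting the best of several repeats reduces the influence of one-off effects such as page faults on first touch. Varying `n_bytes` is the simplest example of a tunable parameter whose effect on bandwidth can be observed, since small buffers fit in cache while large ones do not.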
 
Title OpenCLIntegration 
Description An OpenCL wrapper class and a SCons build library to simplify integration of OpenCL code in C++, C, Fortran and Perl (c) Wim Vanderbauwhede 2010-2015 
Type Of Technology Software 
Year Produced 2010 
Open Source License? Yes  
Impact This software was essential for my work on acceleration of the Weather Research and Forecasting Model (WRF) and other work on OpenCL acceleration. 
URL https://github.com/wimvanderbauwhede/OpenCLIntegration
 
Title RefactorF4Acc 
Description An Automated Fortran Code Refactoring Tool To Facilitate Acceleration of Numerical Simulation 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact Without this software, I could not have done the research on OpenCL acceleration of the Weather Research and Forecasting model (WRF), nor this year's OTG work on acceleration of a Large Eddy Simulator and model coupling with WRF 
URL https://github.com/wimvanderbauwhede/RefactorF4Acc
 
Title TyBEC 
Description FPGA back-end for the TyTra flow. The tool takes code in TyTra-IR format and produces Verilog. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This tool generates the Verilog code for the FPGA design to be deployed. It is part of the TyTra toolchain to create high-performance FPGA code from legacy scientific Fortran 77 code. 
URL http://tytra.org.uk/
 
Title dot-parser 
Description A parser for DOT files in Rust 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact No such parser previously existed in Rust. This is used for the development of the Rumpsteak framework.
URL https://crates.io/crates/dot-parser
 
Description Glasgow Science Center 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact We were asked to be part of a book launch event at Glasgow Science Center, aimed at primary school students. My role was to talk about my profession and my project while there, so as to inspire primary school students to take up this profession later in life.
Year(s) Of Engagement Activity 2017