Innovative parallelism and programming for micro-core architectures

Lead Research Organisation: University of Edinburgh
Department Name: School of Informatics

Abstract

Modern central processing units (CPUs) have provided significant performance gains over previous generations but at increasingly unsustainable levels of power consumption. Micro-core architectures combine a large number of simple, low-power, low-memory cores on a single chip, providing significant parallel performance at very low power levels. However, micro-core architectures are difficult to program, and the immaturity of programming support is currently a significant barrier to adoption.

This research centres on the hypothesis that non-HPC programmers find the parallelisation techniques used on micro-core architectures difficult, and that their preferred choice of dynamic programming languages (e.g. Python) exacerbates the issue. The work focuses on programmability, in terms of both design and implementation. The programming language ePython has been used as a research vehicle for the work, with the intention that findings from the research questions will have wide relevance to mainstream technologies.

The research questions that this project is currently addressing are:

1. What is the most effective way to support programmer interaction with micro-core architectures?

2. Are the techniques in the current ePython implementation appropriate for other micro-core architectures?

3. What is the realistic performance that Python / High-Level Language programmers can expect on micro-core architectures?

ePython has been developed specifically to target micro-core architectures, providing a simple, yet rich, parallel programming environment. It has been successfully ported to a number of micro-core architectures: the Adapteva Epiphany III, Xilinx MicroBlaze and PicoRV32 RISC-V.

This work has resulted in two key achievements in leveraging extremely memory-constrained micro-core accelerators within the ePython programming language: the ability to manage arbitrarily large data and support for natively compiled codes of unbounded size. ePython now provides Python developers with performance approaching 90% of hand-coded C applications, together with dynamic code loading, within a minimum memory footprint of only 3KB.
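As an illustration of the first achievement, the sketch below streams a data set that is larger than a small local memory budget through fixed-size blocks. It is plain Python rather than ePython's own memory abstractions, and the names `LOCAL_BUDGET_BYTES`, `BYTES_PER_VALUE`, `blocks` and `blocked_sum` are hypothetical, chosen only for this example.

```python
from typing import Iterator, List

# Assumed figures for illustration only; real devices and runtimes differ.
LOCAL_BUDGET_BYTES = 4096   # hypothetical on-chip space available for data
BYTES_PER_VALUE = 8         # hypothetical size of one stored number


def blocks(data: List[float], max_values: int) -> Iterator[List[float]]:
    """Yield consecutive slices of data holding at most max_values items each."""
    for start in range(0, len(data), max_values):
        yield data[start:start + max_values]


def blocked_sum(data: List[float], budget_bytes: int = LOCAL_BUDGET_BYTES) -> float:
    """Sum data while only ever holding one budget-sized block at a time."""
    max_values = budget_bytes // BYTES_PER_VALUE
    if max_values < 1:
        raise ValueError("budget is too small to hold a single value")
    total = 0.0
    for block in blocks(data, max_values):
        # In a real system this block would be copied from host memory
        # into the core's local memory before being processed.
        total += sum(block)
    return total


if __name__ == "__main__":
    values = [float(i) for i in range(10_000)]   # 80,000 bytes at 8 bytes each
    print(blocked_sum(values))
```

The point of the design is that the working set per step is bounded by the budget, so the total data size can grow without changing the memory needed on the core.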

The restrictions imposed by the COVID-19 pandemic have had, and continue to have, a major impact on my family care duties and household income, requiring additional non-project work. Consequently, an application for a 6-month EPSRC PhD funding extension was submitted and granted, ensuring that the project will be successfully brought back on track.

Publications:

M. Jamieson and N. Brown, 'High level programming abstractions for leveraging hierarchical memories with micro-core architectures', Journal of Parallel and Distributed Computing, vol. 138, pp. 128-138, Apr. 2020, doi: 10.1016/j.jpdc.2019.11.011.

M. Jamieson, N. Brown, and S. Liu, 'Having your cake and eating it: Exploiting Python for programmer productivity and performance on micro-core architectures using ePython', Proceedings of the 19th Python in Science Conference (SciPy2020), Virtual Conference, pp. 107-115, 2020, doi: 10.25080/Majora-342d178e-00f.

Conference posters:

M. Jamieson and N. Brown, 'Eithne: A Framework for Benchmarking Micro-Core Accelerators', Poster presented at: SC19 Conference, Denver, Colorado, 2019 Nov 17-22. [Online]. Available: https://sc19.supercomputing.org/proceedings/tech_poster/tech_poster_pages/rpost186.html

M. Jamieson, N. Brown, and S. Liu, 'Having your Cake and Eating it: Exploiting Python for Programmer Productivity and Performance on Micro-core Architectures Using ePython', Poster presented at: 19th Python in Science Conference (SciPy2020), Virtual Conference, 2020 Jul 6-12. [Online]. Available: https://raw.githubusercontent.com/mesham/epython/master/docs/SciPy-20-landscape-v1d6.pdf

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/N509644/1 | | | 01/10/2016 | 30/09/2021 |
1929846 | Studentship | EP/N509644/1 | 01/09/2017 | 31/08/2021 | Maurice Jamieson
 
Description: The High-Performance Computing (HPC) community is acutely aware of the challenges that the end of Moore's Law and Dennard scaling pose for the implementation of exascale architectures, because traditional processor designs, such as x86, no longer deliver significant generational performance improvements. Power consumption and energy efficiency are also major concerns when scaling the core count of traditional CPU designs. Therefore, other technologies, such as micro-cores, are being considered by the community. Micro-core architectures look to address this issue by implementing a large number of simple cores running in parallel on a single chip and have been used in successful HPC architectures, such as the Sunway SW26010 of the Sunway TaihuLight (#3 June 2019 Top500) and the 2048-core PEZY-SC2 of the Shoubu system B (#1 June 2019 Green500). Micro-cores are also being used for new HPC architectures, for example the RISC-V based accelerator of the European Processor Initiative (EPI) for the next generation of European exascale machines. Whilst these technologies promise to serve as accelerators for scientific workloads, in both the embedded and HPC worlds, the reality of actually programming these CPUs is very challenging.

A version of the Python programming language, ePython, has been developed specifically to target micro-core architectures. It provides a simple, yet rich, parallel programming environment for Python programmers to develop parallel codes. The work funded by this award has resulted in two key achievements in leveraging extremely memory-constrained micro-core accelerators within the ePython programming language: the ability to manage programs with arbitrary data size and support for natively compiled codes of unbounded size running on the micro-cores. The addition of memory-hierarchy support enables the development of codes whose data requirements exceed the extremely limited on-chip memory (often only 32KB) of these devices. Furthermore, the memory abstractions in the ePython programming language provide a familiar model that lets Python programmers easily deploy their existing codes to these micro-core accelerators. The natively compiled code support within ePython provides Python developers not only with performance approaching native C applications but also with the ability to deploy code of arbitrary size through the implementation of dynamic loading. This method allows the natively compiled ePython codes to load library functions or kernels as required, caching them or garbage collecting them as appropriate. This approach to native compilation and memory hierarchies has enabled machine learning workloads, specifically the detection of lung cancer in 3D CT scans, to run on these micro-core architectures.
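The dynamic loading idea described above (load a kernel when it is first needed, keep it resident while there is room, and discard it when space runs out) can be sketched as a small cache with a byte budget. This is a hedged illustration in plain Python, not ePython's implementation: the `KernelCache` class, its `store` mapping and the kernel sizes are hypothetical, and least-recently-used eviction is used here as a simple stand-in for whatever caching and garbage collection policy the real runtime applies.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple

Kernel = Callable[..., object]


class KernelCache:
    """Keep loaded kernels within a fixed byte budget, evicting the least recently used."""

    def __init__(self, capacity_bytes: int, store: Dict[str, Tuple[int, Kernel]]):
        self.capacity_bytes = capacity_bytes
        self.store = store        # stands in for off-chip storage: name -> (size, function)
        self.resident: "OrderedDict[str, Tuple[int, Kernel]]" = OrderedDict()
        self.used_bytes = 0
        self.loads = 0            # counts how often a kernel had to be (re)loaded

    def get(self, name: str) -> Kernel:
        if name in self.resident:
            self.resident.move_to_end(name)      # mark as most recently used
            return self.resident[name][1]
        size, kernel = self.store[name]
        if size > self.capacity_bytes:
            raise MemoryError(f"kernel {name!r} cannot fit in the budget")
        # Evict least recently used kernels until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            _, (old_size, _) = self.resident.popitem(last=False)
            self.used_bytes -= old_size
        self.resident[name] = (size, kernel)
        self.used_bytes += size
        self.loads += 1
        return kernel


if __name__ == "__main__":
    store = {
        "add": (600, lambda a, b: a + b),    # sizes are made-up numbers
        "mul": (600, lambda a, b: a * b),
    }
    cache = KernelCache(capacity_bytes=1000, store=store)
    print(cache.get("add")(2, 3))   # loads "add"
    print(cache.get("mul")(2, 3))   # evicts "add" to make room for "mul"
    print(cache.loads)
```

Because only the kernels in current use occupy the budget, the total amount of code a program can call is limited by the backing store rather than by the on-chip memory.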
Exploitation Route: ePython can manage applications whose data and code are much larger than the limited on-chip memory provided by micro-core architectures, whilst also closing the performance gap with native code compiled for these platforms. This provides the computational science / scientific computing community with the opportunity to leverage these powerful, energy-efficient architectures for scientific computation and simulation, which were previously unavailable to programmers using the popular Python programming language. Furthermore, the techniques developed for ePython could be applied to other programming languages looking to leverage micro-core accelerators.
Sectors: Digital/Communication/Information Technologies (including Software)

 
Title: Eithne: A framework for benchmarking micro-core accelerators
Description: Eithne supports the benchmarking of micro-core architectures, whether physical chips or soft-cores running on FPGAs, by providing a framework that abstracts over the complex architectural differences. It enables a single benchmark codebase to be deployed to multiple devices by targeting the required devices at compile time. Running the benchmark suite is as simple as starting execution from the host, with the framework managing all communications / data transfer and kernel execution. The same interaction model is used for all devices to minimise the impact of communications bandwidth and latency across devices. A wide variety of host / device links are supported, from on-chip communications for Zynq FPGAs to on-board communications in the case of the Adapteva Epiphany. The host and device bandwidth can be measured if required, which can be especially useful, for instance, with soft-cores running on FPGAs.
Type Of Technology: Software
Year Produced: 2019
Open Source License?: Yes
Impact: At the time of creation, there were no available frameworks to enable the benchmarking of micro-core architectures using standard benchmarks such as LINPACK. The Eithne framework has not only provided this award project with the ability to benchmark competing micro-core architectures but has also provided the wider HPC community with a foundation for other micro-core benchmarks. Interest from the wider HPC community resulted in a research poster outlining the framework being accepted and presented at the SC19 HPC conference in Denver in 2019.
URL: https://sc19.supercomputing.org/proceedings/tech_poster/tech_poster_pages/rpost186.html
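The interaction model described for Eithne (the host starts execution while the framework handles data transfer and kernel execution, with the same steps for every device) can be illustrated with a small host-side harness. This is a hedged sketch in plain Python and does not reflect Eithne's actual API: the `Device` protocol, `LoopbackDevice`, `Timing` and `run_benchmark` are hypothetical names, and the device is chosen at run time here, whereas the description above says Eithne targets devices at compile time.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Protocol

KernelFn = Callable[[List[float]], List[float]]


class Device(Protocol):
    """Hypothetical common interface that every benchmarked device would expose."""
    def send(self, data: List[float]) -> None: ...
    def run(self, kernel: KernelFn) -> None: ...
    def receive(self) -> List[float]: ...


class LoopbackDevice:
    """Stand-in device that simply runs the kernel on the host."""
    def __init__(self) -> None:
        self._buffer: List[float] = []

    def send(self, data: List[float]) -> None:
        self._buffer = list(data)            # models the host-to-device copy

    def run(self, kernel: KernelFn) -> None:
        self._buffer = kernel(self._buffer)  # models kernel execution

    def receive(self) -> List[float]:
        return list(self._buffer)            # models the device-to-host copy


@dataclass
class Timing:
    transfer_seconds: float   # time spent in send and receive
    kernel_seconds: float     # time spent executing the kernel
    result: List[float]


def run_benchmark(device: Device, kernel: KernelFn, data: List[float]) -> Timing:
    """Apply the same send / run / receive sequence to any device and time each part."""
    t0 = time.perf_counter()
    device.send(data)
    t1 = time.perf_counter()
    device.run(kernel)
    t2 = time.perf_counter()
    result = device.receive()
    t3 = time.perf_counter()
    return Timing((t1 - t0) + (t3 - t2), t2 - t1, result)


if __name__ == "__main__":
    timing = run_benchmark(LoopbackDevice(), lambda xs: [2 * x for x in xs], [1.0, 2.0, 3.0])
    print(timing.result, timing.transfer_seconds, timing.kernel_seconds)
```

Timing the transfer and kernel phases separately is what allows link bandwidth to be reported independently of compute performance.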