Hardware Acceleration of Functional Languages (HAFLANG)

Lead Research Organisation: Heriot-Watt University

Department Name: S of Mathematical and Computer Sciences

Abstract

The performance of programming language implementations until 10 years ago relied on increasing clock frequencies on uni-core CPUs. The last decade has seen the rise of the multi-core era adding processing elements to CPUs, to enable general purpose parallel computing.

Due to a single connection from multiple cores on a CPU to main memory, general purpose languages with parallelism support are finding the limits of general purpose CPU architectures that have been extended with parallelism. The fabric on which we compute has changed fundamentally.

Driven by the needs of AI, Big Data and energy efficiency, industry is moving away from general purpose CPUs to efficient special purpose hardware e.g. Google's Tensorflow Processing Unit (TPU) in 2016, Huawei's Neural Processing Unit (NPU) in smartphones, and Graphcore's Intelligent Processing Unit (IPU) in 2017. This reflects a wider shift to special purpose hardware to improve execution efficiency.

Functional languages are gaining widespread use in industry due to reduced development time, better maintainability, code correctness with assistance of static type checkers, and ease of deterministic parallelism. Functional language implementations overwhelmingly target general purpose CPUs, and hence have limited control over cache behaviour, sharing, prefetching and garbage collection locality. As such, they are reaching their performance limits due to the trade-off between parallelism and memory contention. This project takes the view that rather than using compiler optimisations to squeeze small incremental performance improvements from CPUs, special purpose hardware on programmable FPGAs may instead be able to provide a step change improvement by moving these non-deterministic inefficiencies into hardware.

Graph reduction is a functional execution model that offers intriguing opportunities for developing radically different processor architectures. Early ideas stem back to the 1980s, well before the age of advanced Field Programmable Gate Array (FPGA) technology of the last 5-10 years.

We believe that a bespoke FPGA memory hierarchy for functional languages could minimise memory traffic, thus avoiding the costs of cache misses and memory access latencies that quickly become the bottleneck for medium and large sized functional programs. We believe that lowering key runtime system components (prefetching, garbage collection, parallelism) to hardware, with a domain specific instruction set for graph reduction, will significantly reduce runtimes.

We aim to inspire the computer architecture community to extend this project by developing accurate cost models for functional languages that target special purpose functional language hardware.

Our HAFLANG project will target the Xilinx Alveo U280 accelerator board, a state-of-the-art UltraScale+ FPGA-based platform as a research vehicle for developing the FPU. The HAFLANG compilation framework will be designed to be extensible, and hence make the FPU processor a target for other languages in future.

By developing a hardware accelerator, we believe it is possible to engineer a processor that (1) will execute programs with twice the throughput compared with GHC compiled Haskell executing on conventional mid-tier 4-16 core x86/x86-64 CPUs, and (2) consumes four times less energy than by executing programming languages on CPUs.

Funded Value:

£350,700

Funded Period:

Jul 22 - Jul 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/W009447/1

Principal Investigator:

Robert Stewart

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Electronic Devices & Subsys. (30%)

Fundamentals of Computing (70%)

Organisations

People	ORCID iD
Robert Stewart (Principal Investigator)	http://orcid.org/0000-0003-0365-693X

Publications

Author Name

Title Publication Date Published

10 25 50

Sestito C (2022) Accuracy Evaluation of Transposed Convolution-Based Quantized Neural Networks

Sestito C (2022) Design-Space Exploration of Quantized Transposed Convolutional Neural Networks for FPGA-based Systems-on-Chip

Engagement Activities


Description	Blog posts with informal dissemination of early ideas and results
Form Of Engagement Activity	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The postdoc wrote three blog posts on the project's website in March 2023. These describe, with accessible language, the early ideas and preliminary results for this project. https://haflang.github.io/posts/2023-03-07-memo_01.html https://haflang.github.io/posts/2023-03-08-memo_02.html https://haflang.github.io/posts/2023-03-13-memo_03.html
Year(s) Of Engagement Activity	2023
URL	https://haflang.github.io/posts/2023-03-07-memo_01.html


Description	Invited research seminar at University of Glasgow
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Other audiences
Results and Impact	The project's postdoc gave a research seminar in the computer science department at the University of Glasgow. Title " Introducing the HAFLANG Project: Hardware Acceleration of Functional LANGuages". There were 20 people in the room, and 15 people online at this hybrid event.
Year(s) Of Engagement Activity	2022
URL	https://www.dcs.gla.ac.uk/plug/

Abstract

Organisations

People

ORCID iD

Publications