Gather-Level Parallelism

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Modern big-data workloads run on processors that are ill-suited to extracting high performance from them. These workloads request data from DRAM main memory in unpredictable patterns, so existing hardware and software techniques cannot anticipate the high-latency memory accesses that will be needed. Today's processors are therefore constantly stalled waiting for data to return from memory, with low throughput and inefficient execution.

The untapped potential is significant. Many big-data workloads exhibit latent memory-level parallelism: complex sequences of operations can be reordered and overlapped to hide much of the memory latency and thus achieve high throughput. While compute-bound workloads have benefited greatly from special vector instructions, which perform many calculations simultaneously, the same cannot yet be said for memory-bound big-data workloads.
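
To make the contrast concrete, consider an indirect reduction, a pattern common in graph and database workloads. The sketch below is an illustration rather than part of the proposal; it assumes a C compiler targeting x86 AVX-512 (the intrinsics are standard, the example itself is hypothetical). Even when a vector gather issues sixteen loads with a single instruction, each lane can still miss all the way to DRAM, so the loop remains memory-bound:

    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Scalar irregular access: each data[idx[i]] may miss in cache,
     * and the core stalls on each DRAM access in turn. */
    int32_t sum_scalar(const int32_t *data, const int32_t *idx, size_t n)
    {
        int32_t sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += data[idx[i]];
        return sum;
    }

    /* AVX-512 gather: one instruction issues 16 loads, vectorising
     * the compute, but each lane can still miss all the way to DRAM,
     * so the loop stays memory-bound. (Tail elements omitted.) */
    int32_t sum_gather(const int32_t *data, const int32_t *idx, size_t n)
    {
        __m512i acc = _mm512_setzero_si512();
        for (size_t i = 0; i + 16 <= n; i += 16) {
            __m512i vidx = _mm512_loadu_si512((const void *)(idx + i));
            __m512i vals = _mm512_i32gather_epi32(vidx, data, 4);
            acc = _mm512_add_epi32(acc, vals);
        }
        return _mm512_reduce_add_epi32(acc);
    }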

This project seeks to marry compute parallelism with memory parallelism. We will redesign memory access methods with vectorisation in mind, allowing us to prefetch (that is, anticipate the demands of) many complex sequences of compute and memory access at once.
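
One established software technique in this direction is prefetching with a lookahead distance: while the processor works on element i, the fetch for element i + D is already in flight, overlapping many misses at once. A minimal sketch, assuming a GCC/Clang-style compiler with __builtin_prefetch (the lookahead value of 64 is illustrative and would be tuned per workload and machine):

    #include <stdint.h>
    #include <stddef.h>

    #define LOOKAHEAD 64  /* illustrative; tuned per workload and machine */

    /* While the core works on element i, the miss for element
     * i + LOOKAHEAD is already in flight, so many DRAM accesses
     * overlap instead of serialising. Note that the index load
     * idx[i + LOOKAHEAD] can itself miss; such dependent chains of
     * accesses are exactly what makes these workloads hard to prefetch. */
    int32_t sum_prefetch(const int32_t *data, const int32_t *idx, size_t n)
    {
        int32_t sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + LOOKAHEAD < n)
                __builtin_prefetch(&data[idx[i + LOOKAHEAD]], 0, 0);
            sum += data[idx[i]];
        }
        return sum;
    }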

We believe it is possible to build efficient software prefetching mechanisms that match the performance of dedicated hardware. Our new mechanisms will exploit a new concept called gather-level parallelism, utilising and redesigning vector techniques to achieve extreme memory-level parallelism without breaking the bank on programmable compute. To succeed, we will require changes throughout both the compiler and the underlying hardware architecture.
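
The abstract does not describe the mechanisms themselves, but as a rough illustration of the direction, one can imagine the vector unit batching the address generation for a whole block of irregular accesses and issuing prefetches for every lane. Everything below, including the function name and the block size of 16, is a hypothetical sketch, not the project's design:

    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical sketch: materialise 16 future indices with one
     * vector load, then issue a prefetch for every lane, batching
     * the address generation for a whole block of irregular accesses
     * rather than handling one element at a time. */
    static void prefetch_gather_block(const int32_t *data,
                                      const int32_t *idx, size_t i)
    {
        int32_t lanes[16];
        _mm512_storeu_si512((void *)lanes,
                            _mm512_loadu_si512((const void *)(idx + i)));
        for (int l = 0; l < 16; l++)
            __builtin_prefetch(&data[lanes[l]], 0, 0);
    }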

Gather-level parallelism stands to bring about a paradigm shift in how memory is accessed in today's systems. This is particularly pressing in an era where big-data workloads are already inefficient and are threatened further by the security mitigations needed to eliminate the recently discovered Spectre vulnerabilities in modern processors. By restricting the speculation that would otherwise hide memory latency, these mitigations worsen the latency problem and make high-performance memory techniques all the more urgent. The status quo, a forced choice between performance and security on big data that is often sensitive, cannot be allowed to continue.
