Asynchronous Task Parallelism for exascale and beyond

Lead Research Organisation: University of Bristol
Department Name: Computer Science


Supercomputers have historically seen a 1000x performance increase every 10 years but the push
the next generation of machines will be the most difficult yet. Targeted to arrive in 2022, machines
able to achieve 1 exaFLOPS (10^18 floating point operations per second) will mark the start of the
exascale era. Traditional sources of performance improvement have all been exhausted, making
increased parallelism the most viable way to achieve further computational power.
This increased parallelism will be brought about by both increasing the number of cores per node and
increasing the number of nodes in a machine. An exascale machine is expected to contain O(10^6)
nodes, each with O(10^3) cores per node, hence this increase in parallelism is expected to occur
both at the inter- and intra- node level.
Unfortunately, increasing parallelism alone will not result in being able to reach exascale due, in part,
to two factors. Firstly, current programming approaches make heavy use of synchronisation: at the
intra-node level threads are subject to the fork/join model and at the inter-node level the bulk
synchronous parallel (BSP) paradigm is commonly employed by using barriers. Secondly, it is widely
accepted that core speeds will no longer be consistent across either a single socket or an entire
node. This performance variability in hardware means that evenly dividing up work between threads
will lead to load imbalance, causing the entire computation to run at the speed of the slowest core.
To solve these issues, the high-performance computing community has started to consider
asynchronous task-parallelism as a solution. As a programming model, asynchronicity allows hiding
inter-node communication latency, whilst dependencies between tasks enforce program correctness
and allows for finer-grained synchronisation. Therefore, it is thought that the reduction in
synchronisation could dramatically improve performance in machines that utilise high degrees of
parallelism. Furthermore, to solve the problem of hardware performance variability, tasking would be
a sensible approach as it allows for implicitly handling load balance by executing work when needed
as opposed to current commonly-used methods of statically dividing up a problem domain a priori.
In addition to solving problems that will exist in upcoming supercomputers, tasking may also allow for
more irregular and complex scientific programs to be developed, that have previously performed
poorly or cannot be used at all due to the lack of support for a suitable programming model.
Much of this research is in the very early stages and there are still questions over the efficacy of
tasking on realistic scientific programs. In the early stages of my degree I will find, identify, and
potentially create programs that currently require tasking with the goal of forming a consensus in the
community on suitable applications which can be used to form benchmarks to compare and test
different tasking approaches. This will lead on to determining the features that are required and
features that are currently implemented well by existing frameworks, with the final aim of the work to
be able to advise and contribute towards a programming model for future supercomputers.


10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/N509619/1 01/10/2016 30/09/2021
1792447 Studentship EP/N509619/1 01/10/2016 31/03/2020 Patrick Atkinson Atkinson