Emergent Scheduling for Distributed Execution Frameworks

Lead Research Organisation: Lancaster University
Department Name: Computing & Communications

Abstract

Distributed Execution Frameworks such as Apache Spark offer a generalized platform for
scheduling and executing jobs across a large number of nodes processing terabytes of
data (1; 2). However, each framework suits only a subset of workloads because of the way
it schedules and executes jobs, which hinders performance when it is given a more
diverse range of workloads (3).
As a result, a range of different Distributed Execution Frameworks have been developed
(e.g. Ray and Apache Flink), the design of each being tuned to specific kinds of tasks.
Ray addresses the latency overhead that Spark's global scheduler imposes on fine-grained,
communication-intensive and latency-sensitive tasks (e.g. Reinforcement Learning
simulations), where the overhead of Spark's scheduler may be longer than the duration of
the task itself. Ray achieves this reduction in overhead by splitting the scheduler into
two levels: tasks are deployed to a worker first and passed to a global scheduler second,
sacrificing data locality in exchange for lower latency if a worker holding the required
data is full (3). Apache Flink, by contrast, provides stream-based computation with
in-memory, real-time processing of data, avoiding Spark's practice of scheduling
real-time stream events within artificial batches, which again adds latency to task
computation (4).
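The two-level idea described above can be illustrated with a minimal Python sketch. This is not Ray's implementation or API: the Node class, the submit and global_schedule functions, the slot-based capacity model and the least-loaded placement rule are all hypothetical names and assumptions chosen only to show the local-first, global-second decision and the resulting loss of data locality.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with a fixed number of task slots (hypothetical model)."""
    name: str
    capacity: int
    running: list = field(default_factory=list)
    local_data: set = field(default_factory=set)

    def has_capacity(self) -> bool:
        return len(self.running) < self.capacity


def global_schedule(task: str, nodes: list) -> Node:
    """Global level: pick the least-loaded node that still has a free slot."""
    candidates = [n for n in nodes if n.has_capacity()]
    if not candidates:
        raise RuntimeError("no node has a free slot")
    return min(candidates, key=lambda n: len(n.running))


def submit(task: str, origin: Node, nodes: list) -> Node:
    """Local level first: run on the submitting node if it has a free slot.

    Only when the local node is full is the task escalated to the global
    level, which may place it on a node that does not hold its data.
    """
    target = origin if origin.has_capacity() else global_schedule(task, nodes)
    target.running.append(task)
    return target


a = Node("a", capacity=1, local_data={"block-1"})
b = Node("b", capacity=2)
first = submit("t1", origin=a, nodes=[a, b])   # fits locally on a
second = submit("t2", origin=a, nodes=[a, b])  # a is full, so escalated to b
print(first.name, second.name)
```

In this toy run the second task lands on node b, which does not hold "block-1": the task starts without waiting for a slot on node a, at the cost of data locality.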
Therefore, the current research intends to investigate the application of Emergent
Scheduling within a component-based Distributed Execution Framework, in order to overcome
the restriction described above through a learning-agent-controlled approach to run-time
adaptation. The framework is implemented using the Dana programming language, which
allows components (behaviours) of the Emergent Distributed Execution Framework to be
swapped at run-time, changing specific behaviours in order to improve performance for a
given environment (i.e. workload type and Distributed Execution Framework).
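As a rough illustration of learning-agent-controlled component swapping, the sketch below uses an epsilon-greedy selector in Python. The abstract does not name the learning algorithm, and the project itself uses Dana rather than Python, so everything here is an assumption: the EpsilonGreedySelector class, the variant names "centralised" and "two_level", and the latency figures in simulated_reward are invented for the example.

```python
import random


class EpsilonGreedySelector:
    """Chooses among interchangeable component variants by observed reward."""

    def __init__(self, variants, epsilon=0.1, seed=0):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {v: 0 for v in self.variants}
        self.mean_reward = {v: 0.0 for v in self.variants}

    def choose(self):
        # Try every variant once before relying on the estimates.
        untried = [v for v in self.variants if self.counts[v] == 0]
        if untried:
            return untried[0]
        # With probability epsilon, explore a random variant.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        # Otherwise exploit the variant with the best mean reward so far.
        return max(self.variants, key=lambda v: self.mean_reward[v])

    def update(self, variant, reward):
        # Incremental mean of the reward seen for this variant.
        self.counts[variant] += 1
        n = self.counts[variant]
        self.mean_reward[variant] += (reward - self.mean_reward[variant]) / n


def simulated_reward(variant, workload):
    """Made-up reward (negative latency) per variant and workload type."""
    latency = {
        ("centralised", "batch"): 1.0,
        ("centralised", "fine_grained"): 5.0,
        ("two_level", "batch"): 2.0,
        ("two_level", "fine_grained"): 1.0,
    }
    return -latency[(variant, workload)]


selector = EpsilonGreedySelector(["centralised", "two_level"], epsilon=0.1)
for _ in range(200):
    variant = selector.choose()  # stands in for swapping the component
    selector.update(variant, simulated_reward(variant, "fine_grained"))

best = max(selector.variants, key=lambda v: selector.mean_reward[v])
print(best)
```

Under these invented rewards the selector settles on the two-level variant for the fine-grained workload, while still occasionally sampling the alternative so that it can react if the workload changes.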

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513076/1                                   01/10/2018  30/09/2023
2907321            Studentship   EP/R513076/1  01/10/2018  31/03/2022  Paul Dean