Algorithmic Support for Massive Scale Distributed Systems

Lead Research Organisation: University of Leeds

Department Name: Sch of Computing

Abstract

Resource scheduling in massive-scale distributed systems is the process of matching demand with supply. Demand is associated with requests for resources to execute workloads, such as jobs, tasks and applications. Typical resources in a distributed computing system include servers within a data centre cluster. A scheduler aims to achieve several goals, for example, to maximise system throughput, to minimise response time, to optimise energy usage, etc. These goals may conflict (e.g. throughput versus latency), and the scheduler needs to make a suitable compromise, depending on the user's needs and objectives.
In a data centre system with hundreds of thousands of distributed servers, its massive scale is characterised by a number of factors that contribute to the system complexity:
- the number of server nodes in the cluster, interconnections between resources and heterogeneity of resources (different types of CPUs, memories, local storages);
- the number of concurrent jobs in the system and their arrival rate;
- heterogeneity of jobs (different requirements of CPU, memory and local storage; different patterns of resource usage, long-running jobs vs short-alive jobs; urgent jobs vs jobs with loose deadlines).
The key requirement for the system is its scalability - the ability of the system to sustain the required throughput level (such as operations per second) while confining the perceptional response latencies to a level similar to a small or medium size system.
In our project, we aim to address the following challenges:
(a) scheduling at scale (to make prompt scheduling decisions at a rapid rate);
(b) resource utilisation at scale (to improve utilisation of resources while maintaining high quality of service);
(c) Quality-of-Service provision at scale (to satisfy requirements of diverse workloads).
Existing scheduling algorithms developed for practical systems are often designed largely based on empirical knowledge, experience, and best effort. Due to the lack of theoretical foundation, performance of those algorithms cannot be always guaranteed. On the other hand, scheduling algorithms proposed by the theoretical community are usually based on oversimplified abstract system models. Theoretically sound algorithms, with guaranteed accuracy and time complexity, are often impractical because system models do not reflect practical complexity of real systems, and even minor adjustments of system models towards real systems make algorithms no longer applicable.
In our project, theoretical and applied experts will consolidate efforts to conduct jointly an interdisciplinary study, overcoming the shortcomings of isolated research. Overall, our project is 1) methodologically driven, attempting to extend the applicability of the most powerful techniques of mathematical optimisation; 2) application driven, where the challenges of massive-scale distributed systems invoke new developments of scheduling methodology; and 3) practice driven, where the research direction is based on hands-on experience of distributed systems specialists.

Planned Impact

This project will address scientific and practical challenges in resource scheduling and decision making of Cloud-based cluster management systems by pulling together theoretical and applied research: 1) enabling scalable and prompt scheduling decisions in modern massive-scale distributed systems, 2) allowing intelligent and autonomous management by combining toolkits of optimisation algorithms and advanced machine learning techniques, and 3) enhancing the overall utilization and reducing the operational costs by utilizing the appropriate resource over-subscription, autonomous system failover and energy optimised models.
This project will proactively engage with academia, industry, and the general public in order to ensure that it delivers a substantial impact on UK's Digital Economy on the UK's society at large. All these will have significant impact on the society in the upcoming massive scale AI based and big data driven computing systems.
Academic impact will be delivered through
- publications in high impact journals and presentations at leading conferences,
- delivering a series of dissemination and training workshops organised at the University of Leeds, at the premises of our project partners, Alibaba Group (China), and Edgetic Ltd. (Leeds), and under the umbrella of the Data Centre Alliance,
- making available the source codes of our software prototypes,
- creating and maintaining the project web site where we will systematically present our findings, publish working papers, preprints, presentation slides, maintain the library of software products and benchmarks used in our experiments.
Industrial impact will be delivered through
- close collaboration with British and international Cloud Data Centre industries (Edgetic Ltd, Alibaba Group and the Data Centre Alliance which joins over 120 companies across Europe and over 20 companies in China),
- publishing peer-reviewed industrial case studies verified by our project's industrial partners,
- publishing project findings on the project web-site and through the partners' extensive professional networks, including the Data Central Portal maintained by the Data Centre Alliance,
- organising project-themed workshop and tutorials at international conferences and within professional institutions such as IEEE and the British Computer Society.
Economic and societal engagement will be achieved through
- knowledge transfer and commercialisation of research results,
- organising new skills training courses for Cloud and AI professionals,
- making steps towards creating a future Doctoral Training Centre in Big Data and Intelligent Manufacturing at the University of Leeds, a university's spin-out company, and enhancing the partnerships with UK's and international companies in the field.

Funded Value:

£1,010,658

Funded Period:

Jul 20 - Apr 24

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/T01461X/1

Principal Investigator:

Natalia Shakhlevich

Research Subject:

Mathematical sciences (100%)

Research Topic:

Mathematical Aspects of OR (100%)

Organisations

People	ORCID iD
Natalia Shakhlevich (Principal Investigator)	http://orcid.org/0000-0002-5225-4008
Thomas Erlebach (Co-Investigator)
Vitaly Strusevich (Co-Investigator)
Jie Xu (Co-Investigator)

Publications

Author Name

Title Publication Date Published

10 25 50

Carastan-Santos D (2021) Short-Term Ambient Temperature Forecasting for Smart Heaters

Erlebach T (2023) Parameterised temporal exploration problems in Journal of Computer and System Sciences

Erlebach T (2022) Exploration of k-edge-deficient temporal graphs in Acta Informatica

Fagnon V (2022) Euro-Par 2022: Parallel Processing - 28th International Conference on Parallel and Distributed Computing, Glasgow, UK, August 22-26, 2022, Proceedings

Hei Y (2021) Hawk: Rapid Android Malware Detection Through Heterogeneous Graph Attention Networks. in IEEE transactions on neural networks and learning systems

Hu C (2020) TOPOSCH: Latency-Aware Scheduling Based on Critical Path Analysis on Shared YARN Clusters

Kumar S (2023) DebtCom: Technical Debt-Aware Service Recomposition in SaaS Cloud in IEEE Transactions on Services Computing

Li T (2023) BisSiam: Bispectrum Siamese Network Based Contrastive Learning for UAV Anomaly Detection in IEEE Transactions on Knowledge and Data Engineering

Mommessin C (2023) Affinity-aware resource provisioning for long-running applications in shared clusters in Journal of Parallel and Distributed Computing

Peng H (2021) Streaming Social Event Detection and Evolution Discovery in Heterogeneous Information Networks in ACM Transactions on Knowledge Discovery from Data

Peng H (2021) Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks in ACM Transactions on Information Systems

Peng H (2022) Lime: Low-Cost and Incremental Learning for Dynamic Heterogeneous Information Networks in IEEE Transactions on Computers

Shioura A (2023) Preemptive scheduling of parallel jobs of two sizes with controllable processing times in Journal of Scheduling

Shioura A Preemptive Scheduling of Parallel Jobs of Two Sizes with Controllable Processing Times in Journal of Scheduling

Song Y (2022) Incentivizing Online Edge Caching via Auction - Based Subsidization

Song Y (2021) Joint optimization of cache placement and request routing in unreliable networks in Journal of Parallel and Distributed Computing

Wen Z (2022) Orchestrating Networked Machine Learning Applications Using Autosteer in IEEE Internet Computing

Yang R (2020) Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters in IEEE Transactions on Parallel and Distributed Systems

Yang Y (2023) RoSGAS : Adaptive Social Bot Detection with Reinforced Self-supervised GNN Architecture Search in ACM Transactions on the Web

Zhu J (2022) QoS-Aware Co-Scheduling for Distributed Long-Running Applications on Shared Clusters in IEEE Transactions on Parallel and Distributed Systems

Key Findings
Collaboration
Software and Technical Products
Engagement Activities


Description	Since the start of the project we have collected system requirements (based on Alibaba datasets and latest trends in modern large scale providers of computing resources), formulated formal models of the modern systems and made significant steps towards addressing the key challenges. Our main focus is on workload co-location, balancing cluster utilisation and applications' quality of service (QoS), taking into account an important type of workloads - cloud based Long Running Applications (LRAs). Microservice architecture advances the manifestation of distributed LRAs (DLRAs), comprising multiple interconnected microservices that are executed in long-lived distributed containers and serve massive user requests. This particularly boosts the requirement for strict QoS management when diverse workloads are mixed. Our research on workload co-location and co-scheduling by far indicates the effectiveness of holistic QoS management, including detection, containment, mitigation and prediction of the QoS violation in the event of the network uncertainties and latency propagation across dependent microservices. Further challenges include interference-aware job scheduling and fine-grained resource management in uncertain and extra-dynamic environments. In terms of content and cache management in edge computing environments, there exists a practical need for incentivizing content providers to cache contents at distributed network edges closer to end users. However, this is a particularly challenging problem due to system environments that are uncertain, content placements that couple adjacent time slots, and economic properties that are desired but hard to ensure. Our initial study indicates that devising online auction based mechanisms can incentivize content providers to cache contents continuously in distributed edge caches while enabling the edge network operators to determine bid selection, request dispatching, cache updating, and edge infrastructure provisioning to adapt to dynamic and uncertain environments. Our research on traffic management for data streaming applications in edge computing environments indicates the importance of handling IoT traffic with different levels of sensitivity and criticality by satisfying the application-specific latency constraints. This question arises in the practical deployment of edge computing, where user data can arrive at a much faster rate than that they can be processed by an edge node. Addressing this question is critical for meeting the latency requirement for latency-sensitive applications. It is imperative to dynamically allocate the uplink bandwidth according to the latency constraint of the application, by giving higher priority to latency-sensitive applications, based on the node's forwarding capacity and the requirements of input data streams. A holistic traffic orchestrator can be helpful to avoid oversubscribing edge nodes and improve the overall system throughput. In terms of the theoretical research, we have studied more narrow models and performed their in-depth analysis. Having established links to the enhanced versions of the classical scheduling problems with parallel machines and with the bin packing problem, we developed new models (with conflict and affinities) typical for modern massive scale distributed systems. The state-of-the-art knowledge on Vector Bin Packing has been systemised and enriched with further developments. The library Vectorpack has been created, made publicly available and successfully adopted for scheduling in distributed systems with conflicts and affinities.
Exploitation Route	We expect that our future findings will - enhance the body of research on algorithms and systems for optimising resource usage in large-scale computing systems, - serve as the basis for the development of such systems for actual resource providers.
Sectors	Digital/Communication/Information Technologies (including Software),Energy


Description	Alibaba Scheduling team and Alibaba Damo Academy
Organisation	Alibaba Group
Country	China
Sector	Private
PI Contribution	The members of the project team Dr. Renyu Yang and Prof. Jie Xu collaborate with Alibaba Scheduling team and Alibaba Damo Academy on the key problems associated with project: - computing resource over-selling, - QoS-aware workload colocation, - GPU parallelism and deep learning acceleration, - agent-based resource inference and performance prediction, - computing resources capacity planning.
Collaborator Contribution	- Shaping research problems and collaborative research - Providing servers for experimental work and assistance with computational experiments - Finalising research findings in journal and conference publications
Impact	Joint papers - "Performance-aware Speculative Resource Oversubscription for Large-scale Clusters" published in IEEE TPDS - "QoS-Aware Co-Scheduling for Distributed Long-Running Applications on Shared Clusters" submitted to IEEE TPDS - "STRONGHOLD: Fast and Affordable Billion-scale Deep Learning Model Training" (under submission) - "Perph: A Workload Co-location Agent with Online Performance Prediction and Resource Inference" published in IEEE/ACM CCGRID - "An Affinity-Aware Capacity Planning for Scheduling Long-Running Applications in Shared Cluster"
Start Year	2020


Description	Beihang University
Organisation	Beihang University
Department	School of Computer Science & Engineering
Country	China
Sector	Academic/University
PI Contribution	Our project team member, Dr. Renyu Yang and Prof. Jie Xu, constantly collaborates with the Cloud system team at Beihang (led by Prof. Chunming Hu, Prof. Tianyu Wo) and AI team (Prof. Jianxin Li and Dr. Hao Peng). The collaborative research spans over a broad range of problems related to our project: resource oversubscription, QoS-aware co-scheduling in shared clusters, cache management in edge computing, deep learning models for scheduling systems and fault tolerance, resource management optimisation for deep learning models
Collaborator Contribution	Joint research on the topics outlined in the previous field. Additionally, the Beihang team provide computing facilities (servers) and assists in performing computational experiments.
Impact	A number of papers which are included in the publication list, co-authored by the Beihang colleagues: Incentivizing Online Content Caching in Distributed Edge Networks via Auction-Based Subsidization. Published in IEEE SECON 2022 Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks. ACM Trans. Inf. Syst. 40(4): 69:1-69:46 (2022) Hawk: Reliable Android Malware Detection through Heterogeneous Graph Neural Network. IEEE Trans. on Neural Networks and Learning Systems (TNNLS) 2021 Joint Optimization of Cache Placement and Request Routing in Unreliable Networks. Journal of Parallel and Distributed Computing (JPDC), 2021 LIME: Low-Cost Incremental Learning for Dynamic Heterogeneous Information Networks. Transactions on Computers (TC), 2022 Streaming Social Event Detection and Evolution Discovery in Heterogeneous Information Networks, ACM TKDD, 2021 OWL: Detecting Anomalous Social Network Users by Exploring User Relevance, submitted to Information Sciences (under review) Janus: Latency-Aware Traffic Scheduling for Data Streaming in Edge Computing, submitted to IEEE Trans. on Services Computing (under review) Y. Song, L. Jiao, R. Yang, T. Wo, J. Xu. Incentivizing Online Edge Caching with Edge Provisioning via Auctions. Submitted to IEEE TMC 2023 (under review)
Start Year	2010


Description	Collaboration with Prof. Denis Trystram and Vincent Fagnon
Organisation	Laboratoire d'Informatique de Grenoble
Country	France
Sector	Academic/University
PI Contribution	The member of our team, Dr. Clement Mommessin, collaborates with Prof. Denis Trystram, Vincent Fagnon and Giorgio Lucarelli on mutli-agent scheduling
Collaborator Contribution	The team has successfully completed two pieces of work "Short-Term Ambient Temperature Forecasting for Smart Heaters" and "Two-Agent Scheduling with Resource Augmentation on Multiple Machines".
Impact	https://doi.org/10.1007/978-3-031-12597-3_16; https://doi.org/10.1109/ISCC53001.2021.9631550
Start Year	2021


Description	Collaboration with the Batsim team (the group around Millian Poquet)
Organisation	Inria Grenoble - Rhône-Alpes research centre
Country	France
Sector	Public
PI Contribution	The team-member of our project, Clement Mommessin, takes part at the final stages of the development of a simulator BatSim, a scientific simulator to analyze batch schedulers. On the 16th of February, Clement Mommessin made a presentation at the seminar organised by the DataMove (Data Aware Large Scale Computing) research team: "Affinity-Aware Capacity Planning for Scheduling Long-Running Applications in Shared Clusters"
Collaborator Contribution	The team works on a new version release of the Batsim simulator and preparing a paper to showcase this new release, including the detailed state of art of simulators, and experimental comparisons against competitors.
Impact	The two major outputs is (1) the simulator and (2) the paper like "Batsim: Infrastructure Simulator for Jobs and I/O Scheduling" is under preparation.
Start Year	2021


Description	Collaborative Research with Dr. Akiyoshi Shioura (Japan) and Prof. Vitaly A. Strusevich (Greenwich)
Organisation	Tokyo Institute of Technology
Country	Japan
Sector	Academic/University
PI Contribution	The collaboration is of interdisciplinary nature: the main area of expertise of the UK research team (Dr. Shakhlevich and Prof. Strusevich) is Scheduling Theory, while the Japanese collaborator Dr. Akiyoshi Shioura is internationally leading researcher in Submodular Optimisation. After the project was completed, Dr. Shioura keeps arranging annual visits to our team. We have productive meetings and continue working in-between the meetings via email. As an outcome, our collaborative research portfolio is expanding. Prof. Strusevich is retired from 1 January 2021, but he is actively involved in research. We have two papers completed, one has been recently accepted.
Collaborator Contribution	The advantages of the Submodular Optimisation techniques are not fully acknowledged by the general scheduling community and often overlooked. The contribution of Dr. Shioura had led to advancing Submodular Optimisation methods in application to scheduling, resulting in a new methodology for solving scheduling problems via Submodular Optimisation.
Impact	Preemptive Scheduling of Parallel Jobs of Two Sizes with Controllable Processing Times. Journal of Scheduling (accepted)
Start Year	2006


Description	Collaborative Research with Dr. Akiyoshi Shioura (Japan) and Prof. Vitaly A. Strusevich (Greenwich)
Organisation	University of Greenwich
Country	United Kingdom
Sector	Academic/University
PI Contribution	The collaboration is of interdisciplinary nature: the main area of expertise of the UK research team (Dr. Shakhlevich and Prof. Strusevich) is Scheduling Theory, while the Japanese collaborator Dr. Akiyoshi Shioura is internationally leading researcher in Submodular Optimisation. After the project was completed, Dr. Shioura keeps arranging annual visits to our team. We have productive meetings and continue working in-between the meetings via email. As an outcome, our collaborative research portfolio is expanding. Prof. Strusevich is retired from 1 January 2021, but he is actively involved in research. We have two papers completed, one has been recently accepted.
Collaborator Contribution	The advantages of the Submodular Optimisation techniques are not fully acknowledged by the general scheduling community and often overlooked. The contribution of Dr. Shioura had led to advancing Submodular Optimisation methods in application to scheduling, resulting in a new methodology for solving scheduling problems via Submodular Optimisation.
Impact	Preemptive Scheduling of Parallel Jobs of Two Sizes with Controllable Processing Times. Journal of Scheduling (accepted)
Start Year	2006


Description	Collaborative Research with Prof. Thomas Erlebach (Durham)
Organisation	Durham University
Country	United Kingdom
Sector	Academic/University
PI Contribution	Prof. Thomas Erlebach had joined the project team to replace Prof. Vitaly Strusevich (co-I, retired). Our joint work provides the theoretical foundation for algorithmic work of the team.
Collaborator Contribution	Complimentary study of theoretical aspects of the applied problems under study.
Impact	Research paper under preparation: "Classification and Evaluation of the Algorithms for Vector Bin Packing" (joint Clement Mommesin). https://doi.org/10.1007/s00236-022-00421-5
Start Year	2020


Description	Collaborative research with Prof. Alexander Kononov
Organisation	Russian Academy of Sciences
Department	Sobolev Institute of Mathematics
Country	Russian Federation
Sector	Academic/University
PI Contribution	Uncertainty plays an important role in real-world scheduling problems. Our joint work with Prof. Kononov is aimed at addressing the issues of robustness and resiliency in flow shop and open shop systems.
Collaborator Contribution	The work on flow-shop scheduling problems under uncertainty is completed. We are finalising our findings on the open shop problem under the uncertainty.
Impact	Work in progress; no output yet.
Start Year	2020


Description	Edgetic
Organisation	Edgetic Ltd
Country	United Kingdom
Sector	Private
PI Contribution	The members of the project team, Prof. Jie Xu and Dr. Renyu Yang, participate in early technical discussion with the Edgetic staff on container management and Kubernetes scheduling.
Collaborator Contribution	Shaping the problem on container management and Kubernetes scheduling and its preliminary analysis.
Impact	The work is ongoing. No output yet.
Start Year	2019


Description	Partnership with Alibaba Group
Organisation	Alibaba Group
Country	China
Sector	Private
PI Contribution	Work in progress; no output yet.
Collaborator Contribution	We are at the initial stage of collaboration, collecting requirements for the resource management system and developing a system model.
Impact	We are at the initial stage of collaboration. Once we achieve our first milestone in the development of the resource management system, testing, evaluation and fine-tuning will be performed jointly with our partner.
Start Year	2020


Description	Partnership with Edgetic Ltd
Organisation	Edgetic Ltd
Country	United Kingdom
Sector	Private
PI Contribution	Work in progress; no output yet.
Collaborator Contribution	We are at the initial stage of collaboration, collecting requirements for the resource management system and developing a system model.
Impact	We are at the initial stage of collaboration. Once we achieve our first milestone in the development of the resource management system, testing, evaluation and fine-tuning will be performed jointly with our partner.
Start Year	2020


Title	HAWK
Description	Open source framework for malware detection through heterogeneous graph attention networks
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	The accompanying paper has attracted 26 citations according to Googles Scholar. The software usage is harder to evaluate.
URL	https://github.com/RingBDStack/HAWK


Title	Vectorpack
Description	C++ library of optimisation algorithms for the Vector Bin Packing problem
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	The library is being used by the students of the University of Leeds working on their UG and PG projects. At least one student at the Poznan University of Technology works on the project linked to Vectorpack.
URL	https://github.com/Vectorpack/Vectorpack_cpp


Description	Collaboration meet-ups with industrial partners
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Prof. Jie Xu and Dr. Renyu Yang had technical discussions and meetings with a number of industrial partners with the aim of developing future opportunities for joint project proposals and collaborative work. The companies are Samsung SAIT (UK), Brandon Medical (UK), KPMG (UK), Kuaishou (China), Zhejiang Telecom (China), Digital Farm Initiative (UK-US)
Year(s) Of Engagement Activity	2021,2022


Description	Government engagement
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Policymakers/politicians
Results and Impact	Prof. Jie Xu is an UKCRC Executive Member for two terms, responsible for international activities. He also serves as the EDI representative (Equality Diversion and Inclusion).
Year(s) Of Engagement Activity	2020,2021,2022


Description	Invited talks and keynote speaches
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Talks by Prof. Jie Xu (keynote speech at ISKE 2021, Dec 2021), Dr Renyu Yang (invited talk to Newcastle University, Mar 2021; Beihang University, April 2021; Zhejiang University of Technology, July 2021)
Year(s) Of Engagement Activity	2021


Description	Reachout within Alan Turing Institute (Prof. Jie Xu and Dr. Renyu Yang)
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	Attended the workshop organised the Alan Turing Institute to develop new collaboration opportunities under the Turing Fellow programme and under the Turing Postdoc enrichment programme.
Year(s) Of Engagement Activity	2021,2022

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications