Quality of Service Provision for Grid Applications via Intelligent Scheduling

Lead Research Organisation: University of Leeds
Department Name: Sch of Computing

Abstract

Grid computing can be defined as coordinated resource sharing and problem solving in dynamic, multi-institutional collaborations. The success of a Grid infrastructure is based on a number of fundamental requirements, including the ability to provide dynamic and efficient services. Underpinning such a system is the need to ensure that the Grid infrastructure is delivering the required Quality of Service to its users. Quality of Service (QoS) is the ability of an application to have some level of assurance that users' requirements can be satisfied. It can be seen as an agreement between a user and a resource provider to execute an application within a guaranteed time frame at a pre-specified cost. As a rule, the higher the cost paid by the user, the smaller the execution time a resource provider can ensure. The novel contribution of the proposed project is to produce a new type of Grid resource broker with an advanced scheduling component aimed at optimising both resource usage costs and applications' execution times to enforce QoS. It should combine the features of two types of brokers: system-centric and user-centric providing a transparent means of meeting users' requirements and at the same time optimising the usage of Grid resources on the provider's end. This proposal is timely in that it addresses the need for continued development of infrastructure support for Grid computing. It responds to the increased attention of the Grid community to QoS provision and higher expectations of Grid users to receive adequate services at an agreed price payable for the agreed execution time. Our research will take advantage of the achievements in the classical scheduling theory and the newly emerged Grid scheduling research and will advance the frontiers of both areas. Grid applications give rise to new enhanced scheduling models. These enhanced models generally cannot be handled by the existing scheduling techniques developed mainly for manufacturing applications. They are characterised by complex additional constraints including those related to data storage and data transfer, co-ordinating the execution of linked tasks and arranging the required data interchange. Further challenges are related to the dynamic nature of Grid systems with the changing availability and quality of resources. The new aspects of QoS provision introduce additional complexity to scheduling. The project will draw on expertise of two established research groups at the University of Leeds: Algorithms and Complexity Group and Collaborative Architectures and Performance Group. The Algorithms and Complexity Group performs multidisciplinary research in algorithms, combinatorics and optimisation. Inter alia, the group develops and analyses advanced mathematical techniques for solving complex optimisation problems including those related to the areas of scheduling and optimal resource allocation. Research of the Collaborative Architectures and Performance Group focuses on Intelligent Infrastructures for large-scale applications. In particular, research of the group brings together e-science, Grid and adaptive computing systems research.
 
Description The main outcomes of our project are in the area of scheduling algorithms and in the design of an intelligent Grid resource broker.

In our Scheduling research, we
- analysed typical scheduling algorithms applied in distributed computing with respect to time and cost factors,
- transferred findings from Scheduling Theory to distributed computing to complement existing Grid scheduling algorithms;
- developed new algorithms to optimise time and cost, performed their analytical analysis and empirical evaluation;
- studied the effect of the immediate start requirement, typical for modern distributed systems.

Our results demonstrated that the new algorithms outperform those currently applied in distributed computing, producing schedules of lower cost in more than 85% of the instances; they are fast and easy to implement and can easily be embodied in a Grid broker.

In our research on the Grid broker, we
- developed an innovative architecture for a resource broker, that incorporates a negotiation mechanism for the quality of service provision,
- designed a new pricing model, that binds the price to the end user with the quality of service provided by the service provider,
- performed a simulation study of the pricing mechanism;
- formulated recommendations on selection strategies and on the system stability that may be incorporated in any design of economically orientated Grid.

The experiments with the Grid broker incorporated major scheduling algorithms for time/cost optimisation.

In addition to the Grid scheduling research, we examined two other important models of scheduling in distributed systems: the model with divisible loads and energy aware computing. For the first model we identified major flaws in the prior research in the area, and produced correct solutions to the bicriteria problem of time/cost optimisation. For the second model, we developed new heuristic algorithms for optimising the energy consumption cost and demonstrated their superiority via computational experiments.

In 2014-16, we formulated and studied a new scheduling model that includes "immediate job starting times", an important feature modern cloud computing systems. For that model, we developed a number of fast scheduling algorithms that find optimal solutions efficiently. The paper has been recently accepted for publication in the Journal of Scheduling.

Another outcome is a new tool (resource-boxing) which helps in bridging the gap between theoretical scheduling research and applied research in distributed computing. The conference presentation by Bernhard Primas, PhD student jointly supervised by myself and Prof. Jie Xu, the leader of the Distributed Systems and Services group at the University of Leeds, had won the Best Student Paper award at the prestigious international conference.

In our most recent study we explore how advanced optimisation techniques can improve performance of modern massive scale distributed systems.

The second research direction we currently pursue is Scheduling Divisible Loads with Time and Cost Constraints. The new research paper, prepared jointly with Maciej Drozdowski, is now published in Journal of Scheduling.

The most recent achievement is a new successful EPSRC funded grant to develop Algorithmic Support for Massive Scale Distributed Systems. The work is due to start on 1 May 2020.
Exploitation Route Development of Grid brokers

Cloud Schedulers

Scheduling and resource allocation algorithms for distributed computing

Energy minimisation in distributed systems
Sectors Digital/Communication/Information Technologies (including Software),Energy

 
Description The advancements achieved in scheduling algorithms and in the Grid broker design had led to formulating new results of theoretical and applied nature, important for both areas - Scheduling and Distributed Computing. The developed proof of the concept prototype demonstrated that the proposed methods lead to economically viable solutions, and can lead to achieving successfully the objectives of Grid users and resource providers. The conducted research serves the foundation for our current research in the area of Energy Aware Scheduling for distributed computing. Our most recent joint work with Bernhard Primas (PhD student), Prod. Jie Xu, Dr. Peter Garraghan and Dr. Karim Djemame, Prof. Vitaly Strusevich and Dr. Akiyoshi Shioura had resulted in the discovery of a new scheduling model which arises in modern distributed systems and the development of new efficient scheduling algorithms for it. The outcomes are now published in the Journal of Scheduling and INFORMS Journal on Computing. Another research direction aimed at bridging the gap between theoretical scheduling research and applied research in distributed computing had led to the development of a useful tool (resource-boxing tool) for modelling and experimental work. The presentation by Bernhard Primas of preliminary results at the international conference had received the "Best Student Paper Award". We have prepared recently a new EPSRC project proposal to advance scheduling research in application to modern massive scale distributing systems. The application was successful. The new project is due to start on 1 May 2020.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software),Energy
Impact Types Economic

 
Description Collaborative research with Maciej Drozdowski, Poznan University of Technology 
Organisation Poznan University of Technology
Country Poland 
Sector Academic/University 
PI Contribution We completed a study of a range of problems on scheduling divisible loads, providing their comprehensive analysis and complexity classification. We now exchange ideas on future research on scheduling for distributed computing. The main focus is on resource consolidation, optimising energy usage.
Collaborator Contribution Identifying open problems in the area of scheduling divisible loads and their collaborative study.
Impact Our joint paper "Scheduling Divisible Loads with Time and Cost Constraints" is published in Journal of Scheduling.
Start Year 2016
 
Description Collaborative research with the Distributed Systems and Services (DSS) group, School of Computing, University of Leeds 
Organisation University of Leeds
Department Leeds Institute of Health Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution We have extended our collaborative research with the Distributed Systems and Services (DSS) group, School of Computing, University of Leeds. We now have PhD student Bernhard Primas, who is jointly supervised with the leader of the of the DSS group Prof. Jie Xu. Our interdisciplinary research is aimed and bridging the gap between theoretical scheduling research and applied research of distributed systems.
Collaborator Contribution The members of the DSS group introduced new models for our theoretical study. The most prominent model incorporates immediate start condition for the jobs and speed scaling for processors.
Impact The paper published in INFORMS Journal on Computing 29 (4), pp. 724-736 (2017): Machine speed scaling by adapting methods for convex optimization with submodular constraints ? by Shioura, A., Shakhlevich, N.V., Strusevich V.A. The paper published in Journal of Scheduling 21(5): 505-516 (2018): "Speed scaling for scheduling problems with immediate start of jobs" Shioura, A., Shakhlevich, N.V., Strusevich, V.A., Primas, B, The conference paper with the members of the DSS team "Resource boxing: converting realistic cloud task utilization patterns for theoretical scheduling" Primas, B., Garraghan, P., Djemame, K., and Shakhlevich, N.V. in UCC'2016 Proceedings of the 9th International Conference on Utility and Cloud Computing, ACM, New York, 138-147, 2016, doi:10.1145/2996890.2996897 The paper had won the Best Student Paper Award
Start Year 2014
 
Description A plenary presentation Divisible Load Scheduling to Minimize the Computation Time and Cost; at the 10th Workshop on Models and Algorithms for Planning and Scheduling Problems (MAPSP'2011), Nymburk, Czech Republic, 2011 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact A prestigious award for a presentation to be selected as a plenary talk at the main forum of scheduling researchers.

Further possible extension of the work with collaborators from the Poznan University of Technology (Prof. Maciej Drozdowski)
Year(s) Of Engagement Activity 2011
URL http://kam.mff.cuni.cz/~mapsp/program1.html
 
Description Our joint research with the PhD student Bernhard Primas and the members of the DSS group was presented at the 9th International Conference on Utility and Cloud Computing (UCC'2016 ) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The conference paper
"Resource boxing: converting realistic cloud task utilization patterns for theoretical scheduling"
by Primas, B., Garraghan, P., Djemame, K., and Shakhlevich, N.V.
was presented by B. Primas and won the Best Student Paper Award.
doi:10.1145/2996890.2996897
Year(s) Of Engagement Activity 2016
URL http://computing.derby.ac.uk/ucc2016/