ABC: Adaptive Brokerage for the Cloud

Lead Research Organisation: University of St Andrews
Department Name: Computer Science


The answer to both these questions is usually 'No', even from those familiar with cloud computing. This uncertainty has three main causes:

1) A bewildering choice of service offerings by the many service providers (e.g. Amazon, Google, Microsoft, etc.).
2) Difficulty in comparison between offered services due to non-uniform description of specifications.
3) High time and monetary costs associated with continuous monitoring of the services on offer from various providers.

This project will introduce more certainty into the selection of cloud services through the introduction of a smart and continuously adaptive cloud broker. The broker will act as an intermediary between end users and cloud service providers in order to enhance service delivery and service value. This brokerage service will be designed to manage heterogeneous cloud offerings, including public, private or hybrid environments. Such brokerage will open up an entirely new multi-cloud marketplace, allowing applications to be simply deployed to the optimal provider and resource type, reducing complexity, vendor lock-in and computational running costs.

Our research will result in a number of fundamental contributions to the cloud computing field. First we will address the problem of how to define, schedule and enforce user-defined Service Level Objectives (SLOs): high-level intentions, which specify the desired end goal of a deployment for applications that span multiple cloud providers with complex inter-dependencies. This will allow users to focus on what (e.g., failure tolerance) needs to be achieved, rather than low-level specifics about how (e.g., deploy to Amazon compute optimised VM) applications are deployed. This automation will in turn help abstract many of the complexities associated with low-level configuration from the user.

Second, we will develop novel lightweight container-based benchmarking techniques, which can gather cloud-level performance metrics in near real-time in a multi-cloud environment. These techniques will be general in scope and give users a near real-time view of the 'weather', or current state, across a range of cloud providers.

Third, we will develop adaptive machine learning strategies for the autonomic and pro-active management of cloud-based applications. The application of machine learning will aid decisions about which providers and resource configurations meet the requirements specified in the SLO, how these trade off against cost, and when to redeploy to different providers or instance types based on active management.

We are confident that this project will address an urgent and fundamental question: how to leverage cloud infrastructure to quickly, cheaply and efficiently perform vital computational workloads. Solving this problem is crucial to the UK digital economy, which is increasingly reliant on the cloud. The developed smart brokerage framework will enable digital economy stakeholders to optimise their use of cloud resources. This is beneficial to all areas of business, including start-ups and micro-businesses who can benefit greatly from the flexibility created by platform independence and adaptive management strategies.

Planned Impact

This proposal carefully balances cutting-edge research of significant academic impact with a pathways-to-impact strategy targeting economic, societal and people impact. Our multi-faceted impact strategy contains the following key elements:

1) Collaborating with key companies from the digital economy sector, both to evaluate the cloud brokerage technology and to provide an important route to technology transfer (targeting economic impact). More specifically, we will collaborate with two contrasting companies, Adobe and Satalia, who bring use cases that together allow us to evaluate both the level of platform independence offered and the level of optimisation in deployment, management and execution across a range of multi-cloud environments. To complement this, we will also reach out to micro-businesses and SMEs through established networks in Scotland and the North-West of England (using SICSA and the Knowledge Business Centre at Lancaster University).

2) Seeking the formation of a startup company and investigating potentially novel business models as an intrinsic aspect of the science (targeting economic impact). We see real potential in developing a commercialisation strategy in parallel with the research programme, as there are currently no comprehensive cloud brokerage services that allow services to be traded as a commodity.

3) Working with partners from the science community again in terms of providing a comprehensive evaluation of the approach and providing an additional technology transfer pathway (targeting societal impact). The goals of this work are to i) evaluate our approach and its effectiveness in supporting the scientific community, and ii) subsequently transfer knowledge on the use of cloud computing in science and, more specifically, the use of cloud brokerage in providing platform independence and optimising the use of cloud resources according to experimental constraints. We will work closely with two contrasting partners, providing coverage of key issues in computational science, i.e. the Centre for Ecology and Hydrology and colleagues working on astronomy at St Andrews.

4) Operating a public engagement programme, employing a range of mechanisms including press releases, social media and podcasts to reach wide audiences (targeting societal impact). We will work through existing forums including the Edinburgh TechMeetUp, SICSA DemoFest and KBC Technology Showcases and also plan to use Compucast (a monthly computer science podcast) to reach a general technical and scientific audience.

5) Developing people with advanced skills in cloud computing/management, an area of acute shortage in terms of supporting the digital economy (targeting people impact). This proposal will help address this acute shortage by training post-docs with internationally leading skill-sets in this area, ready for positions of leadership and influence in academia or industry. This development of people in terms of skills and knowledge is an additional important impact of the project, and this will be enhanced through pro-actively seeking means of linking student project work to the programme of research.
Description The goal of this project is to develop a smart cloud brokerage framework that helps businesses, including start-ups and micro-businesses, to manage their cloud-based applications in an adaptive fashion.

Combined with an existing practical Infrastructure-as-Code (IaC) tool and an SLO language that we developed for specifying reliability requirements, our broker system can automate the provisioning and management process for business applications. During the lifecycle of the application, the broker monitors various metrics and takes a data-driven approach to scaling cloud resources adaptively and automatically. We have published papers on the individual components of the broker system, and we are currently integrating these components and implementing the remaining parts to complete the automated decision-making process.
Exploitation Route This key finding could primarily be used by software developers, businesses and scientists who want to optimise their own cloud deployments by taking advantage of our automated cloud broker software. The broker itself abstracts away many of the complexities of selecting and configuring cloud instances, allowing the end user to focus more on the important aspects of application development.
Sectors Creative Economy, Electronics

Description Economic impact: We have investigated the formation of a startup company which will build an automated marketplace around public cloud providers. With the help of the Knowledge Transfer Centre (KTC) in St Andrews we have had an initial business plan accepted, which defines the terms of the startup in relation to the University. This has now been agreed at University level and is ready should we decide to commercialise. We will continue to work on our minimum viable product in order to test our overall research hypothesis.

Societal and people impact: We ran the 5th iteration of the successful CrossCloud workshop in Porto. CrossCloud was colocated with the flagship EuroSys conference and the proceedings were published by the ACM. The workshop was sponsored by the ABC project and was well attended by a range of practitioners. We also ran the 6th iteration of the CrossCloud workshop in Cyprus, colocated with the CCGrid 2019 conference.

Researcher development: This project has supported the career development of three Research Assistants: Yuhui Lin, Sheriffo Ceesay and Ryan Wilson.
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic

Description Impact Acceleration Award (IAA): Knowledge Exchange in High Performance Computing: A Collaborative Visit to the Barcelona Supercomputing Centre
Amount £5,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 05/2019 
End 03/2021
Title A Generic model for MapReduce applications performance approximation 
Description We develop a parametric model to estimate the execution time of a given MapReduce application for a given cluster of nodes. Models can be used to infer the performance of unseen applications and approximate their performance when an arbitrary dataset is used as input. This parametric model is capable of predicting the execution time of a big data application running on a YARN cluster. We applied it to model the execution time of MapReduce (YARN) applications tested on different clusters and data sizes. Moreover, this model gives an insight into the performance characteristics of the generic phases of MapReduce, as well as an understanding of how MapReduce applications of different design patterns perform. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact We have also developed a proof-of-concept application based on this model, which is available at 
Title A generic Infrastructure-as-code base framework for multi-clouds VM evaluation 
Description We develop a user-oriented framework designed for automated VM instance evaluation using Infrastructure-as-Code (IaC). IaC is a recent trend in cloud management that treats cloud resources as software and manages them with machine-readable scripts. This framework is generic and highly configurable. It can reduce the human effort involved in VM evaluation, such as sampling VM performance data by benchmarking. Users can define a search strategy with different objectives, e.g. cost-value or performance, and configure candidate VM options from different cloud providers. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact We use this framework to set up two proof-of-concept systems: an exhaustive sampling system and a parallel Bayesian optimization VM searching system. The exhaustive sampling system can automate VM provisioning, benchmark tool deployment and execution, and feedback logging. The parallel Bayesian optimization VM searching system explores the advantages of using parallel Bayesian optimization to search for optimal VMs across multiple cloud providers. This tool will be part of the toolchain delivered by the ABC project. 
Title A multi-cloud pricing scheme comparing tool 
Description This tool automates the process of requesting VM products and their pricing information from the cloud providers' pricing APIs. It pulls data from each API and then reformats it into a JSON file with a consistent schema. Currently, this tool supports Amazon AWS, Google Cloud and Digital Ocean. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? No  
Impact This tool facilitates the comparison of VM products offered by multiple cloud providers. We use this tool to gather a general profile of VM products from Amazon. This profile contributes to the subsequent VM benchmarking research in this project. This tool will be part of the toolchain delivered by the ABC project. 
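The reformatting step described in this record can be illustrated with a minimal Python sketch. It is not the project's tool: the provider names, raw field names, prices and target schema below are all invented for illustration and do not reflect any real pricing API. It only shows the idea of mapping differently shaped provider records onto one schema so that offers can be compared directly.

```python
import json

# Made-up raw offers. Field names, units and prices are hypothetical and do
# not correspond to any real provider's pricing API.
RAW_OFFERS = {
    "provider_a": [
        {"instanceType": "a.small", "vcpu": 2, "memoryGiB": 4, "usdPerHour": 0.04},
        {"instanceType": "a.large", "vcpu": 8, "memoryGiB": 32, "usdPerHour": 0.30},
    ],
    "provider_b": [
        {"name": "b-standard", "cpus": 4, "memMB": 8192, "centsPerHour": 12},
    ],
}


def from_provider_a(record):
    """Map a provider_a record onto the common (hypothetical) schema."""
    return {
        "provider": "provider_a",
        "name": record["instanceType"],
        "vcpus": record["vcpu"],
        "memory_gib": float(record["memoryGiB"]),
        "usd_per_hour": record["usdPerHour"],
    }


def from_provider_b(record):
    """Map a provider_b record, converting MB to GiB and cents to dollars."""
    return {
        "provider": "provider_b",
        "name": record["name"],
        "vcpus": record["cpus"],
        "memory_gib": record["memMB"] / 1024,
        "usd_per_hour": record["centsPerHour"] / 100,
    }


CONVERTERS = {"provider_a": from_provider_a, "provider_b": from_provider_b}


def build_catalogue(raw_offers):
    """Normalise every offer and sort the result by hourly price."""
    rows = [
        CONVERTERS[provider](record)
        for provider, records in raw_offers.items()
        for record in records
    ]
    return sorted(rows, key=lambda row: row["usd_per_hour"])


catalogue = build_catalogue(RAW_OFFERS)
print(json.dumps(catalogue, indent=2))
```

Keeping one converter per provider means that supporting a further provider only requires one more mapping function, under the assumption that its records can be expressed in the same schema.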
Title A semi-supervised Non-negative Matrix Factorization based VM performance modelling tool 
Description Selecting a suitable VM instance type for an application can be a difficult task because of the number of options and the variety of application requirements. Recent research takes a data-driven approach to model VM performance, but this requires carefully choosing a small set of relevant benchmarks as input. We propose a semi-supervised matrix-factorization-based latent variable approach to predict the performance of an unknown new application. This method can take a large set of benchmarks as input for VM performance modelling, and it uses the model, together with the measured performance of the new application on some of the target VMs, to predict its performance on the remaining VMs.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact As an initial evaluation, we ran a prototype with 373 micro-benchmarks to predict the scores of a general-purpose benchmark, i.e. Geekbench, for 37 AWS EC2 VMs. The results show that our method is effective in this setting, with RMSE and STD being (6.7, 4.5) when sampling Geekbench on 5 VMs, and (10.0, 2.8) when sampling on 10. 
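The general idea of predicting a new application's scores on unsampled VMs from a benchmark matrix can be sketched as follows. This is a simplified stand-in, not the project's semi-supervised method: it uses masked (weighted) non-negative matrix factorisation with multiplicative updates, a common matrix-completion variant, in pure Python. The VM speeds, benchmark weights and the function name masked_nmf are invented for illustration, and the toy data is exactly rank 1 so that the hidden values are recoverable.

```python
import random


def masked_nmf(X, mask, rank, iters=200, seed=0, eps=1e-12):
    """Fit X ~= W @ H on observed entries only (mask == 1), with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def product():
        return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(m)]
                for i in range(n)]

    for _ in range(iters):
        P = product()  # current reconstruction, used for the W update
        for i in range(n):
            for k in range(rank):
                num = sum(mask[i][j] * X[i][j] * H[k][j] for j in range(m))
                den = sum(mask[i][j] * P[i][j] * H[k][j] for j in range(m)) + eps
                W[i][k] *= num / den
        P = product()  # refreshed reconstruction, used for the H update
        for k in range(rank):
            for j in range(m):
                num = sum(mask[i][j] * X[i][j] * W[i][k] for i in range(n))
                den = sum(mask[i][j] * P[i][j] * W[i][k] for i in range(n)) + eps
                H[k][j] *= num / den
    return W, H


# Toy data (invented): score = VM speed * workload weight, so the matrix is rank 1.
speeds = [1.0, 2.0, 3.0, 4.0]        # four hypothetical VMs
bench_weights = [10.0, 20.0, 5.0]    # three hypothetical benchmarks
app_weight = 7.0                     # the "new application" (last column)

X = [[s * w for w in bench_weights] + [s * app_weight] for s in speeds]
mask = [[1, 1, 1, 1] for _ in speeds]
unsampled_vms = (1, 3)               # the application was not run on these VMs
for vm in unsampled_vms:
    mask[vm][3] = 0
    X[vm][3] = 0.0                   # hidden value, ignored because mask is 0

W, H = masked_nmf(X, mask, rank=1)
predicted = {vm: W[vm][0] * H[0][3] for vm in unsampled_vms}
print(predicted)  # should come out close to the hidden values 14 and 28
```

On real benchmark data the matrix is not exactly low rank, so the rank, iteration count and any regularisation would need to be chosen by validation; the figures reported in this record come from the project's own method, not from this sketch.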
Title An Infrastructure-as-Code Based Framework for Evaluating VM Instances from Multiple Clouds 
Description To choose an optimal VM, cloud users often need to step through the process of evaluating the performance of VMs by benchmarking or by running a black-box search technique such as Bayesian optimisation. To facilitate the process, we develop a generic and highly configurable Framework with Infrastructure-as-Code (IaC) support For VM Evaluation (FIFE). FIFE abstracts the process as a searcher, selector, deployer and interpreter. It allows users to specify the target VM sets and evaluation objectives in JSON to automate the process. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact FIFE has a modular design. It is highly configurable, allowing users to set up systems that suit the target application and their objectives, e.g. searching for an optimal VM or sampling VMs by benchmarking. We used FIFE to set up a Bayesian optimization system to illustrate the process of building a system with FIFE. An evaluation was carried out to assess the search performance of the system over a search space spanning multiple cloud providers and with parallel search. 
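The four-part decomposition named in this record (searcher, selector, deployer, interpreter) driven by a JSON specification can be sketched in Python as below. This is not FIFE's actual API: the JSON keys, the role assigned to each component, the objective name cost_per_score and the canned benchmark outputs are all assumptions made for illustration. The searcher here is a plain exhaustive search, and the deployer returns canned text instead of provisioning anything.

```python
import json

# Hypothetical JSON specification of candidate VMs and the objective.
CONFIG = json.loads("""
{
  "objective": "cost_per_score",
  "candidates": [
    {"provider": "provider_a", "vm": "small",  "usd_per_hour": 0.05},
    {"provider": "provider_a", "vm": "large",  "usd_per_hour": 0.20},
    {"provider": "provider_b", "vm": "medium", "usd_per_hour": 0.10}
  ]
}
""")

# Canned benchmark output standing in for a real deployment (invented values).
FAKE_RESULTS = {
    ("provider_a", "small"): "score=100",
    ("provider_a", "large"): "score=500",
    ("provider_b", "medium"): "score=150",
}


def searcher(candidates, history):
    """Exhaustive search: propose every candidate not yet evaluated."""
    seen = {(h["provider"], h["vm"]) for h in history}
    return [c for c in candidates if (c["provider"], c["vm"]) not in seen]


def selector(proposals, batch_size=2):
    """Pick the next batch to evaluate (here simply the first few proposals)."""
    return proposals[:batch_size]


def deployer(candidate):
    """Stand-in for provisioning a VM and running a benchmark on it."""
    return FAKE_RESULTS[(candidate["provider"], candidate["vm"])]


def interpreter(candidate, raw_output, objective):
    """Parse raw benchmark output into a value to minimise."""
    score = float(raw_output.split("=")[1])
    value = candidate["usd_per_hour"] / score if objective == "cost_per_score" else -score
    return {**candidate, "score": score, "value": value}


def evaluate(config):
    """Run search rounds until nothing is left, then return the best candidate."""
    history = []
    while True:
        batch = selector(searcher(config["candidates"], history))
        if not batch:
            break
        for candidate in batch:
            raw = deployer(candidate)
            history.append(interpreter(candidate, raw, config["objective"]))
    return min(history, key=lambda h: h["value"]), history


best, history = evaluate(CONFIG)
print(best["provider"], best["vm"])
```

Because each stage is a separate function, the exhaustive searcher could in principle be swapped for a model-based one, such as the Bayesian optimisation system mentioned above, without touching the deployer or interpreter.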
Description Google Visiting Researcher 
Organisation Google
Department Research at Google
Country United States 
Sector Private 
PI Contribution Professor Adam Barker was appointed Senior Visiting Research Scientist at Google through the Visiting Faculty programme, where he is spending a period of research leave from the University of St Andrews. "The Google Visiting Faculty program aims to identify and support world-class, full-time faculty pursuing research in areas of mutual interest. Each year, through the Google Visiting Faculty Program, over 25 academics visit Google from universities all over the world." Applicants must be nominated internally by senior staff within Google. This has allowed Adam to take a second period of research leave (June 2019 - Jan 2020) to work (at no cost to St Andrews) on cloud computing research at Google's Sunnyvale campus in California. Previously, Adam spent a period of leave (January 2016 - September 2016) at Google as a Visiting Scientist in Mountain View.
Collaborator Contribution Google's contribution is primarily knowledge exchange and experience gained by Professor Adam Barker. By working on real-world production systems within Google, Adam is able to steer the direction of the ABC project more effectively towards real problems and solutions.
Impact Google data center trace release (not yet public); EuroSys 2020 paper (not yet public)
Start Year 2019