IMC2: Instrumentation Measurement and Control for the Cloud

Lead Research Organisation: University of Glasgow
Department Name: School of Computing Science

Abstract

The Internet landscape is changing rapidly, from a decentralised paradigm where distinct services were offered by different providers in a fully distributed way, to a unified ICT environment where data, storage, and processing resources are co-located in the Cloud and offered alongside connectivity. Although Cloud services and the underlying communication infrastructures are built on top of commodity Internet mechanisms (transport protocols, IP switching, multipath routing, etc.), it is becoming apparent that the performance-agnostic and slow-converging operational assumptions of today's data communications are challenged by the new unified technological and business model. Massive overprovisioning of fully distributed resources that are managed over distinct and often long timescales (e.g., traffic aggregates over backbone networks) is not sustainable in an environment where connectivity and system resources need to be managed by a single unified ICT provider over a centralised infrastructure and over very short timescales. Cloud providers need to maximise return-on-investment from their infrastructures through rapid provisioning and elastic resource management, offering predictable services while operating at higher utilisation thresholds.

In order to achieve these goals, in this project we will design and develop an always-on Instrumentation, Measurement, and Control (IMC) framework that will dynamically and adaptively provision resources in a unified manner and over short timescales. Evidence has shown that the distinct control loops typically employed to manage different resources over different timescales can themselves degrade performance in unified Cloud environments. For example, network-agnostic placement and migration of virtual machines can itself cause congestion in the underlying Data Centre topology. We will therefore revisit the one-dimensional, static or pseudo-random control loops that are typically employed over Cloud topologies, and develop an adaptive closed-loop system that will manage both server and network resources synergistically, over short timescales and based on temporal topology-wide performance. In doing so, we will exploit often controversial concepts such as non-shortest path routing for improving load balancing while meeting flow completion deadlines, and network-aware dynamic virtual machine migration, to demonstrate the feasibility and the benefits of combinatorial resource provisioning in achieving global performance optimisation and in increasing the usable capacity of future networks and services. One of the key aims of the proposed research is to investigate and demonstrate the applicability of measurement-based processes to control and admit resources in a unified manner and at appropriately short timescales. Through the necessary system and network node instrumentation, we will devise a logically-centralised measurement and control closed-loop architecture that will be an integral part of the underlying infrastructure's data forwarding operation.
The long-term impact of such an endeavour will be to revisit the currently disjoint data and control planes in packet communications, and to transform next generation networked infrastructures from performance-agnostic to adaptive and self-managed, through synergy across the different layers and planes of the architecture.
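
As an illustration only, the idea of network-aware virtual machine migration inside a measurement-driven closed loop can be sketched as follows. This is not the project's algorithm: the cost weights (0 for the same host, 1 for the same rack, 3 across racks), the greedy single-move policy, and all names (Flow, hop_cost, network_cost, best_migration, control_step) are assumptions made for this sketch. Measured flow rates stand in for the instrumentation output, and each control step migrates the one virtual machine whose move most reduces the traffic carried over the upper layers of the topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A measured traffic flow between two virtual machines (hypothetical type)."""
    src_vm: str
    dst_vm: str
    rate_mbps: float


def hop_cost(host_a, host_b, rack_of):
    """Assumed cost weights: 0 on the same host, 1 within a rack, 3 across racks."""
    if host_a == host_b:
        return 0
    if rack_of[host_a] == rack_of[host_b]:
        return 1
    return 3


def network_cost(placement, flows, rack_of):
    """Total rate-weighted cost of all flows under a given VM-to-host placement."""
    return sum(
        f.rate_mbps * hop_cost(placement[f.src_vm], placement[f.dst_vm], rack_of)
        for f in flows
    )


def best_migration(placement, flows, rack_of, free_slots):
    """Return (vm, target_host, saving) for the single most beneficial move, or None."""
    base = network_cost(placement, flows, rack_of)
    best = None
    for vm, current in placement.items():
        for host, slots in free_slots.items():
            if host == current or slots <= 0:
                continue  # skip the current host and hosts without spare capacity
            trial = dict(placement)
            trial[vm] = host
            saving = base - network_cost(trial, flows, rack_of)
            if saving > 0 and (best is None or saving > best[2]):
                best = (vm, host, saving)
    return best


def control_step(placement, flows, rack_of, free_slots):
    """One measure-decide-act iteration; updates placement and free_slots in place."""
    move = best_migration(placement, flows, rack_of, free_slots)
    if move is None:
        return None  # no move reduces the cost, so the loop leaves the placement alone
    vm, host, _ = move
    free_slots[placement[vm]] = free_slots.get(placement[vm], 0) + 1
    free_slots[host] -= 1
    placement[vm] = host
    return move


if __name__ == "__main__":
    rack_of = {"h1": "r1", "h2": "r1", "h3": "r2"}
    placement = {"a": "h1", "b": "h3", "c": "h2"}
    flows = [Flow("a", "b", 100.0), Flow("a", "c", 10.0)]
    free_slots = {"h1": 1, "h2": 0, "h3": 1}
    print(control_step(placement, flows, rack_of, free_slots))
    print(placement)
```

In this toy setting the heavy flow between "a" and "b" initially crosses racks, so the first control step moves "b" next to "a"; a second step finds no further beneficial move. A real controller would also have to account for migration cost, server load, and the rate at which measurements go stale.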

The proposed research will be carried out at the University of Glasgow, and experiments will be conducted over a purpose-built programmable Cloud services testbed infrastructure, partly supported by EPSRC's first grant scheme and partly through a generous contribution from the host institution. The research will be conducted in close collaboration with Onyx Group, Microsoft Research and JANET(UK).

Planned Impact

Besides the documented academic impact, this project has excellent potential for both direct and indirect commercial exploitation, partly evidenced by the accompanying letters of support and the close engagement with industry and R&D. Directly, the results of this work will have immediate and long-term applicability for the following beneficiaries:

Cloud service providers and Data Centre operators - Through the always-on combinatorial optimisation of server and network resource usage, results from this research will have an immediate impact in increasing the usable capacity of Cloud Data Centres, and therefore significantly improve return-on-investment for infrastructure and service providers. This will enable providers to absorb short-term increases and fluctuations in traffic demands without investing in excess capacity, or penalising performance and violating service level agreements. The immense importance of this is evident when one considers the drop in revenues that can be caused by even a marginal increase in service response time (e.g., Amazon's EC2 1% drop in sales resulting from 100 ms additional latency).
We will liaise closely with Onyx Group, a national Data Centre operator and service provider with significant regional presence in Glasgow, which is supporting this work and will provide us with operational data and actual topology/provisioning characteristics of private Cloud environments. This will ensure that results arising from this work will be relevant and directly applicable to the growing business community of Cloud service providers in the immediate 1-5 year horizon. With the increasing number of private Cloud companies, this research has excellent potential to have significant impact on the UK's economic competitiveness.
In addition, consultation with Microsoft Research will ensure that the research will also remain relevant and influence the future design of global, public Cloud providers that play a key role in shaping the global service provisioning trends in the long term.

Unified ICT and ISP providers - With bandwidth becoming a commodity and profit margins narrowing, traditional ISPs are currently investing in unified ICT services on top of connectivity. Through direct liaison with JANET(UK), the UK's national research and education network provider, results from this work will enable ISPs to exploit programmable and self-managed service provisioning, while rolling out and expanding next generation unified ICT infrastructures for the next 5-10 years.

Equipment and software vendors - Results from this work will shed new light on the instrumentation capabilities of network and server architectures with always-on measurement and control functionality. Software prototypes resulting from this project will be of immediate benefit to equipment and software vendors, especially the virtualisation, OpenFlow, and programmable switch manufacturer industries (e.g., NetFPGA, Solarflare).

End users and corporate customers - The proposed research will improve the performance, dependability and predictability of outsourced ICT, benefiting both end users and corporate customers of Cloud services, such as the private sector and the government. Facilitating the cost-effective and widespread adoption of such services will have an immediate and long-term indirect societal and economic impact.

The RA working on this project will develop research and development skills in cutting-edge networking and system technologies, while working on a timely research topic with growing international popularity.
Indirectly, this research will have a long-term impact on the design of future unified distributed systems architectures that will integrate always-on measurement and control functionality, enabling truly extensible, dynamically adaptive and self-managed services. This will lead to novel business models regarding the manufacturing, usage and charging for ICT services in the future.
 
Description We have developed a converged resource management and control framework for Cloud Computing environments. This includes mechanisms to manage bandwidth, processing, and storage resources in a synergistic manner that optimises overall resource usage and improves application performance and infrastructure-wide usable capacity.
We have discovered that we can improve application performance by up to 95%, while increasing the capacity headroom of networked infrastructures by up to 20% through improving resource utilisation.
Exploitation Route Our findings can be adopted by Cloud and Data Centre providers to increase return-on-investment on their infrastructures and generate more revenue through value-added services.
They can also provide significant influence on novel programmable technologies developed and deployed by (network) equipment vendors and application providers.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description We have published key findings in international conferences and journals, with a few additional publications pending. We are liaising with our business partners (Cloud/Data Centre providers) and communicating the technological and business implications of our findings to them with a view towards adoption of our mechanisms over operational infrastructures. We have also engaged with the wider research policy community in the UK, and used our current findings as evidence of promise for future research and experimentation on programmable mechanisms for the provisioning of next generation networked environments.
First Year Of Impact 2014
Sector Digital/Communication/Information Technologies (including Software)
 
Description Speaker at the EPSRC Communications Network
Geographic Reach National 
Policy Influence Type Participation in advisory committee
Impact Shaping the future research agenda for communications networking
 
Description Facilitator at EPSRC ComNet2 launch event
Geographic Reach National 
Policy Influence Type Participation in a national consultation
 
Description A Situation-Aware Information Infrastructure
Amount £948,000 (GBP)
Funding ID EP/L026015/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 02/2015 
End 08/2017
 
Description University of Glasgow small equipment grant
Amount £20,000 (GBP)
Organisation University of Glasgow 
Sector Academic/University
Country United Kingdom
Start 12/2013 
End 01/2014
 
Title Programmable Switch (Openflow) Fabric 
Description Programmable switch software for flexible packet matching and flow admission control 
Type Of Material Technology assay or reagent 
Year Produced 2015 
Provided To Others? Yes  
Impact The work breaks new ground in enabling a truly programmable dataplane for next generation networks that can facilitate fast service deployment on the datapath of communications networks (e.g., for security, admission control services) 
URL https://netlab.dcs.gla.ac.uk/projects/openflow-bpf
 
Description Glasgow - JANET(UK) 
Organisation JANET UK
Country United Kingdom 
Sector Public 
PI Contribution Informed Janet, as a national network and application service provider, of dynamic, performance-based provisioning of service infrastructures
Collaborator Contribution Communicated the requirements and characteristics of nation-wide, mission-critical network and service provisioning environments.
Impact Articles reporting on optimisation in network-wide resource utilisation, and on application performance enhancement over Cloud computing and service provisioning environments.
Start Year 2014
 
Description Glasgow - MSR 
Organisation Microsoft Research
Department Microsoft Research Cambridge
Country United Kingdom 
Sector Private 
PI Contribution Informing partner of research outcomes in the areas of Cloud Data Centre provisioning and optimisation
Collaborator Contribution Informing partner of research outcomes in the areas of Cloud Data Centre provisioning and optimisation
Impact Articles reporting on optimisation in network-wide resource utilisation, and on application performance enhancement over Cloud Data Centres.
Start Year 2014
 
Description Glasgow - Onyx 
Organisation Onyx Environmental Group Plc
Country United Kingdom 
Sector Private 
PI Contribution Informed Onyx, as a Data Centre and Cloud service provider, of dynamic, performance-based provisioning of Cloud infrastructures
Collaborator Contribution Communicated the requirements and characteristics of operational Data Centre and Cloud Computing environments.
Impact Articles reporting on optimisation in network-wide resource utilisation, and on application performance enhancement over Cloud Data Centres.
Start Year 2014
 
Title Glasgow Network Function Virtualisation 
Description A method (and software infrastructure) to flexibly deploy network functions over virtualised and collocation Cloud computing environments 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact New functionality for the Docker (container virtualisation) software stack 
URL https://netlab.dcs.gla.ac.uk/projects/glasgow-network-functions
 
Description Invited Talk at Solarflare Industry Summit, London (17 September 2014) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The talk sparked questions and discussion afterwards regarding how industry should approach hardware reconfigurability and service programmability.

I was asked back to repeat my talk at future industry summits in the US.
Year(s) Of Engagement Activity 2014
 
Description Invited talk at National Centre for Scientific Research (NCSR) Demokritos, Athens, GR, 18/12/2015 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact About 50 researchers and professors attended this invited talk to hear about the outcomes and scientific results of the IMC^2 project on the adaptive provisioning of Cloud data centre infrastructures.
Year(s) Of Engagement Activity 2015