Networks as a Service

Lead Research Organisation: University of Nottingham
Department Name: School of Computer Science

Abstract

Cloud computing has significantly changed the IT landscape. Today it is possible for small companies or even single individuals to access virtually unlimited resources in large data centres (DCs) for running computationally demanding tasks. This has triggered the rise of "big data" applications, which operate on large amounts of data. These include traditional batch-oriented applications, such as data mining, data indexing, log collection and analysis, and scientific applications, as well as real-time stream processing, web search and advertising.

To support big data applications, parallel processing systems, such as MapReduce, adopt a partition/aggregate model: a large input data set is distributed over many servers, and each server processes a share of the data. Locally generated intermediate results must then be aggregated to obtain the final result.
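The partition/aggregate model described above can be sketched in a few lines. This is an illustrative toy (the "servers" are plain functions and the task is a word count), not MapReduce itself:

```python
# Toy sketch of the partition/aggregate model: the input data set is split
# across servers, each server processes its share locally, and the locally
# generated intermediate results are aggregated into the final result.
from collections import Counter

def partition(data, n_servers):
    """Distribute the input data set over n_servers in round-robin shares."""
    return [data[i::n_servers] for i in range(n_servers)]

def process(share):
    """Each server processes its share locally (here, a word count)."""
    return Counter(word for line in share for word in line.split())

def aggregate(partials):
    """Combine the intermediate results to obtain the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["the cat sat", "the dog sat", "the cat ran"]
partials = [process(share) for share in partition(lines, n_servers=2)]
result = aggregate(partials)
print(result["the"])  # 3
```

In a real deployment the aggregation step is where the heavy cross-server network traffic arises, which motivates the contention problem discussed next.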

An open challenge of the partition/aggregate model is that it results in high contention for network resources in DCs when a large amount of data traffic is exchanged between servers. Facebook reports that, for 26% of processing tasks, network transfers are responsible for more than 50% of the execution time. This is consistent with other studies, showing that the network is often the bottleneck in big data applications.

Improving the performance of such network-bound applications in DCs has attracted much interest from the research community. A class of solutions focuses on reducing bandwidth usage by employing overlay networks to distribute data and to perform partial aggregation. However, this requires applications to reverse-engineer the physical network topology to optimise the layout of overlay networks. Even with perfect knowledge of the physical topology, there are still fundamental inefficiencies: e.g. any logical topology with a server fan-out higher than one cannot be mapped optimally to the physical network if servers have only a single network interface.
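The single-interface inefficiency can be made concrete with a back-of-the-envelope model (the function and its parameters are illustrative assumptions, not measurements): a parent with one network interface must serialise its sends, so an overlay level with fan-out k takes roughly k transfer slots instead of one.

```python
# Hypothetical cost model for mapping an overlay aggregation tree onto
# servers with a limited number of network interfaces (NICs): sends from
# one parent serialise on its NICs, so fan-out > nics cannot map optimally.
import math

def level_time(fan_out, nics=1, transfer_time=1.0):
    """Time for one overlay tree level, in units of a single transfer."""
    return math.ceil(fan_out / nics) * transfer_time

print(level_time(fan_out=1))  # 1.0 -- fan-out one maps optimally
print(level_time(fan_out=4))  # 4.0 -- four sends serialise on one NIC
```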

Other proposals increase network bandwidth through more complex topologies or higher-capacity networks. New topologies and network over-provisioning, however, increase the DC operational and capital expenditures (up to 5 times according to some estimates), which directly impacts tenant costs. For example, Amazon AWS recently introduced Cluster Compute instances with full-bisection 10 Gbps bandwidth, at an hourly cost 16 times the default.

In contrast, we argue that the problem can be solved more effectively by providing DC tenants with efficient, easy and safe control of network operations. Instead of over-provisioning, we focus on optimising network traffic by exploiting application-specific knowledge. We term this approach "network-as-a-service" (NaaS) because it allows tenants to customise the service that they receive from the network.
NaaS-enabled tenants can deploy custom routing protocols, including multicast services or anycast/incast protocols, as well as more sophisticated mechanisms, such as content-based routing and content-centric networking.

By modifying the content of packets on-path, they can efficiently implement advanced, application-specific network services, such as in-network data aggregation and smart caching. Parallel processing systems such as MapReduce would greatly benefit because data can be aggregated on-path, thus reducing execution times. Key-value stores (e.g. memcached) can improve their performance by caching popular keys within the network, which decreases latency and bandwidth usage compared to end-host-only deployments.
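The benefit of on-path aggregation can be illustrated with a toy model. Here a switch-like element either forwards every partial result unchanged or, when NaaS-style processing is enabled, combines them before forwarding; the `Switch` class and `aggregate` combiner are purely illustrative, not a real NaaS API:

```python
# Toy comparison of end-host-only forwarding vs. in-network aggregation:
# with on-path aggregation, the uplink towards the receiver carries one
# combined record instead of one packet per sender.

def aggregate(values):
    # application-specific combiner, e.g. summing partial counts
    return sum(values)

class Switch:
    """An illustrative in-network element that can aggregate on-path."""
    def __init__(self, naas_enabled):
        self.naas_enabled = naas_enabled

    def forward(self, packets):
        # each packet carries one partial value from a sender
        if self.naas_enabled:
            return [aggregate(packets)]  # a single packet leaves the switch
        return list(packets)             # every packet traverses the uplink

partials = [5, 7, 3, 9]  # partial results from four servers under one switch

baseline = Switch(naas_enabled=False).forward(partials)
naas = Switch(naas_enabled=True).forward(partials)

assert sum(baseline) == naas[0] == 24  # same final result either way
print(len(baseline), len(naas))        # 4 packets vs 1 packet on the uplink
```

The final result is identical in both cases; only the volume of traffic crossing the network changes, which is the source of the reduced execution times claimed above.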

The NaaS model has the potential to revolutionise current cloud computing offerings by increasing the performance of tenants' applications through efficient in-network processing, while reducing development complexity. It aims to combine distributed computation and network communication in a single, coherent abstraction, providing a significant step towards the vision of "the DC is the computer".

Publications


Related Projects

Project Reference  Relationship  Related To    Start       End         Award Value
EP/K031724/1                                   01/01/2014  31/12/2014  £553,973
EP/K031724/2       Transfer      EP/K031724/1  19/01/2015  18/01/2018  £479,669
 
Title: Irmin backend for Jitsu
Description: Addition of a new mechanism to the Jitsu open-source project, allowing unikernels to be managed via the Irmin git-like database and to target the XAPI backend
Type of Technology: New/Improved Technique/Technology
Year Produced: 2015
Impact: n/a
URL: https://github.com/mirage/jitsu/pull/16
 
Title: Mirage OpenFlow
Description: OpenFlow stack utilising the Frenetic implementation, suitable for use with Mirage. Includes an implementation of an OpenFlow software switch.
Type of Technology: Software
Year Produced: 2014
Open Source License?: Yes
Impact: This is being used by the project in future developments
URL: https://github.com/NaaS/ocaml-openflow