SYNC: Synergistic Network Policy Management for Cloud Data Centres

Lead Research Organisation: Liverpool John Moores University
Department Name: Computing and Mathematical Sciences

Abstract

All computer networks, including cloud data centre networks, are governed by high-level policies derived from network-wide requirements, such as "file servers should only be accessible by internal IP (Internet Protocol) addresses". Upon deployment, an individual policy is realised as a composition (or chain) of packet-processing rules that are placed in a specified sequence of network function boxes within the network.
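To make the idea of a policy realised as a chain of rules concrete, the following is a minimal sketch of the file-server example. The address prefixes, the rule names, and the two-rule chain are illustrative assumptions, not part of the SYNC design.

```python
from ipaddress import ip_address, ip_network

# Hypothetical address plan: both prefixes are assumptions for illustration.
INTERNAL_NET = ip_network("10.0.0.0/8")
FILE_SERVERS = ip_network("10.1.2.0/24")


def monitor_rule(src, dst):
    """Observe the packet only; a monitor never changes the verdict."""
    return "forward"


def firewall_rule(src, dst):
    """Drop packets sent to a file server from a non-internal source."""
    if ip_address(dst) in FILE_SERVERS and ip_address(src) not in INTERNAL_NET:
        return "drop"
    return "forward"


# The high-level policy becomes an ordered chain of packet-processing rules.
POLICY_CHAIN = [monitor_rule, firewall_rule]


def apply_chain(chain, src, dst):
    """Pass a packet through each rule in sequence; stop at the first drop."""
    for rule in chain:
        if rule(src, dst) == "drop":
            return "drop"
    return "forward"
```

In this sketch, an internal host such as 10.3.4.5 reaches the file server 10.1.2.7, while an external source such as 203.0.113.9 is dropped by the second rule in the chain.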

Traditionally, implementing network policy is an error-prone manual configuration process. Emerging technologies such as Software-Defined Networking (SDN) and Network Function Virtualisation (NFV) have largely eliminated the need for manual configuration through software automation. Nevertheless, the use of SDN and NFV has resulted in a greater number of independent network nodes that dynamically generate and implement policy rules respectively, making correct policy implementation a hard problem to solve. Worse still, this problem is amplified by dynamic virtual machine (VM) consolidation in cloud data centres, since migrating VMs means that the "specified sequence" must also be updated across the network. Imperfect policy implementation leads to policy violations, which account for 78% of data centre downtime at a cost of $5,600 (£3,758) per minute.

This demonstrates the necessity of synergistic placement of network policy rules and application VMs, as captured in the SYNC project hypothesis: Infrastructural configuration and utilisation, as well as application performance, of cloud data centres are largely dictated by the placement of network policy rules and application virtual machines. This is in contrast to existing approaches, which either consider only static rule placement or perform dynamic placement without taking the application VMs into account.

In this project, we propose the development of SYNC, a synergistic network policy management framework that will leverage synergy amongst a) policy rules, b) applications and c) underlying temporal network state to achieve network-wide performance optimisation. In order to realise SYNC, the following research and development tasks will take place:
i). The high-level network policy expressions will be decomposed into a minimum set of network-wide consistent chains of rules, which are in turn implemented in network function boxes at different network locations, e.g. middleboxes, network switches, and end hosts.
ii). The underlying network state will be exploited to (re)arrange application virtual machines and rules so that the network-wide impact of pairwise traffic patterns is minimised.
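As a rough illustration of task ii), the sketch below scores a VM placement by the network-wide cost of pairwise traffic. The hop-count model, the host and rack names, and the traffic rates are all hypothetical, and the sketch covers only the VM side of the problem, not rule placement.

```python
def hops(host_a, host_b, rack_of):
    """Assumed two-tier topology: 0 hops on one host, 2 within a rack, 4 across racks."""
    if host_a == host_b:
        return 0
    return 2 if rack_of[host_a] == rack_of[host_b] else 4


def network_cost(placement, traffic, rack_of):
    """Sum of (traffic rate x hop count) over all communicating VM pairs."""
    return sum(
        rate * hops(placement[u], placement[v], rack_of)
        for (u, v), rate in traffic.items()
    )


# Hypothetical inputs: three hosts in two racks, one heavy and one light VM pair.
rack_of = {"h1": "r1", "h2": "r1", "h3": "r2"}
traffic = {("vm1", "vm2"): 100, ("vm1", "vm3"): 5}

before = {"vm1": "h1", "vm2": "h3", "vm3": "h2"}  # heavy pair split across racks
after = {"vm1": "h1", "vm2": "h2", "vm3": "h3"}   # heavy pair moved into one rack

cost_before = network_cost(before, traffic, rack_of)  # 100*4 + 5*2 = 410
cost_after = network_cost(after, traffic, rack_of)    # 100*2 + 5*4 = 220
```

Moving the heavily communicating pair into the same rack lowers the cost in this toy model, which is the kind of effect a traffic-aware rearrangement aims for.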

The key challenge in this innovation will be the scale of the underlying infrastructure, which can have up to a million VMs and millions of rules. We will construct appropriate models and efficient algorithms, combined with SDN and NFV, to overcome this challenge.

We intend to publish our research outcomes in prestigious journals and conferences, provide open access to the research data, and commercialise our intellectual property.

Planned Impact

EPSRC's Towards an Intelligent Information Infrastructure (TI3) emphasises smarter computing architectures, sustainable networks and secure storage solutions in order to resolve telecommunications bottlenecks in the future digital society. The research described here will contribute to the development of intelligent communications infrastructures.

Short-term benefits within the next 3-10 years include direct interest from:
1). UK data centres: Data centres are now a critical part of the ICT infrastructure, vital to economic growth and productivity. The UK is a major user and operator of data centres and is home to 239 data centres (http://www.datacentermap.com/united-kingdom/). The largest data centre in Europe is based in Newport, Wales. Also, Microsoft and Amazon are reportedly planning to build data centres in the UK (http://goo.gl/H6iUvO). Our SYNC framework will have an immediate impact on improving the reliability and performance of these cloud infrastructures, which users will increasingly access in a fully transparent manner. Improving service delivery of data centre networks will have a significant impact on a) the accessibility of the global infrastructure, since it will underpin the development of novel applications, and b) charging/pricing models for Cloud Computing.

2). Private sector IT companies: The research described in this proposal contributes directly to the Digital economy theme (ICT Network and Distributed Systems; Cloud Computing) of the EPSRC portfolio. The work is primarily aligned with the IT as a Utility challenge area. We will enable technology that gives UK private sector companies a competitive advantage in the ever-growing area of cloud computing. Small and medium enterprises, such as our project partner BrightOffice, can benefit directly by embedding our research in their products, or by building on our research methodologies during their research and development cycles.

3). Our researchers and project students: Postdoctoral researchers and project students directly involved in this project will learn transferable skills, gain experience, and forge collaborations in preparation for the job market. To further generate a pipeline of researchers and professional engineers with a strong background in data centre networking technologies, we will engage undergraduate and masters level students with the project.

Long-term benefits within the next 10-50 years include direct interest from:
1). Government and public sectors: Many government and public-sector bodies use private clouds and will rely on highly available information networks. Our techniques will give greater confidence in the network resiliency and auditability of established networking requirements in shared public cloud infrastructure. This will enable companies and government to move more of their critical infrastructure onto the cloud to exploit economies of scale. For example, the NHS could speed up modernisation by hosting computation and huge volumes of data in one or more of the UK's data centres.

2). The UK economy: Our research will have direct impact on new economic models for sustainable IT. Alongside mega data centres, another branch of cloud development is moving towards a more distributed but federated model. Being able to provide flexibility in the management of network policies/functions in multi-sovereignty federated cloud settings will be equally, if not more, important. Advances building upon our research will improve both the performance and the lifetime of mission-critical data centre network and compute infrastructures. Hence the UK economy will benefit from lowered cloud computing costs and increased adoption.

3). UK citizens: Resilient cloud infrastructures will underpin future development of the Internet of Things, smart cities, e-governance, etc., which promote digital living.
 
Description We have surveyed a wide range of common network functions (NFs, e.g. firewalls, load balancers, etc.) and service chains and performed extensive experimental analysis to understand their common behaviours and properties. Most of these NFs perform limited types of processing on packets, e.g., watching flows without modifying them, or changing packet headers and/or payloads. For example, in the simplest case, a flow monitor (FlowMon) obtains operational visibility into the network to characterise network and application performance, and it never modifies packets or flows. Some NFs, e.g., IDS, check packet headers and payloads and raise alerts to the system administrator. Some NFs (such as firewalls and IPS) do not change packet headers or payloads, but use packet header information to decide whether to drop or forward a packet. Some NFs (such as NAT and LB) may check IP/port fields in packet headers and rewrite these fields. Others (such as traffic shapers) do not modify packet headers or payloads, but may perform traffic shaping tasks such as active queue management or rate limiting.
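The behaviours surveyed above can be captured in a small table of per-NF action profiles. The following is a minimal sketch; the field names and the `is_read_only` helper are hypothetical, and the assumption that FlowMon reads only headers is an illustrative simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NFProfile:
    """What an NF may do to a packet (field names are illustrative)."""
    reads_header: bool = False
    reads_payload: bool = False
    writes_header: bool = False
    writes_payload: bool = False
    may_drop: bool = False
    shapes_traffic: bool = False


# Profiles follow the description in the text; FlowMon reading only
# headers is an assumption made for this sketch.
PROFILES = {
    "FlowMon": NFProfile(reads_header=True),
    "IDS": NFProfile(reads_header=True, reads_payload=True),
    "Firewall": NFProfile(reads_header=True, may_drop=True),
    "IPS": NFProfile(reads_header=True, may_drop=True),
    "NAT": NFProfile(reads_header=True, writes_header=True),
    "LB": NFProfile(reads_header=True, writes_header=True),
    "Shaper": NFProfile(shapes_traffic=True),
}


def is_read_only(name):
    """True if the NF neither rewrites, drops, nor shapes traffic."""
    p = PROFILES[name]
    return not (p.writes_header or p.writes_payload or p.may_drop or p.shapes_traffic)
```

Under this classification, FlowMon and IDS are read-only, whereas NAT, LB, firewalls, IPS and shapers are not.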

For a service chain, certain ordering requirements between NFs arise naturally from the functions applied. For instance, for a service chain applied to North-South traffic in data centres, a Web Optimisation Control (WOC) is not effective on VPN (virtual private network) traffic, requiring VPN termination prior to WOC. For another service chain with IDS and FlowMon, since IDS never changes the packet content, FlowMon can be applied to the traffic either after or before IDS. Hence, our key finding is that if the order of some NFs in a service chain is allowed to be re-organised, there could be more opportunities to improve performance by reducing the length of the service chain path. We have exploited this finding and devised and implemented an efficient optimisation scheme that can significantly reduce network latency within data centres whilst strictly adhering to data centre network policy requirements.
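The reordering idea can be sketched with a toy example. This is not the project's optimisation scheme: it is an exhaustive search over a hypothetical one-dimensional topology, where the NF positions and endpoints are invented, only the VPN-before-WOC constraint comes from the text, and all other NFs are assumed freely reorderable.

```python
from itertools import permutations


def path_length(order, position, src, dst):
    """Total hops from src through each NF in `order` to dst (1-D positions)."""
    stops = [src] + [position[nf] for nf in order] + [dst]
    return sum(abs(a - b) for a, b in zip(stops, stops[1:]))


def best_order(nfs, must_precede, position, src, dst):
    """Try every order that respects each (before, after) constraint."""
    best = None
    for order in permutations(nfs):
        if any(order.index(a) > order.index(b) for a, b in must_precede):
            continue  # violates an ordering requirement
        cost = path_length(order, position, src, dst)
        if best is None or cost < best[0]:
            best = (cost, list(order))
    return best


# Hypothetical NF locations along a line between source 0 and destination 6.
position = {"VPN": 1, "WOC": 5, "IDS": 4, "FlowMon": 2}
must_precede = [("VPN", "WOC")]  # VPN termination must come before WOC
original = ["VPN", "WOC", "IDS", "FlowMon"]

original_cost = path_length(original, position, 0, 6)          # 12 hops
optimised = best_order(original, must_precede, position, 0, 6)  # (6, [...])
```

In this toy setting, reordering to VPN, FlowMon, IDS, WOC halves the path from 12 hops to 6 while still terminating the VPN before WOC. Exhaustive search is only feasible for short chains, which is why an efficient scheme matters at data centre scale.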

Taking into account the heterogeneity of network function boxes in data centres (i.e. they can be either software or hardware based, and can be placed at various locations in a network), we have devised algorithms for:
1. Dynamic latency-aware service chain composition: We dynamically create virtual machines that match the computation resource properties of virtual network functions. This means computation-demanding network functions get VMs that have more CPU allocated, whereas memory-demanding network functions get more RAM.
2. Heterogeneous network function chaining: Our experimental results demonstrate that our algorithm can achieve the same optimality as branch-and-bound optimisation (brute force algorithm) but is three orders of magnitude more efficient.
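The resource-matching idea in item 1 can be illustrated with a minimal sketch. The flavour names, their sizes, and the 0.2 threshold are hypothetical choices made for illustration, not values from the project.

```python
# Hypothetical VM flavours as (vCPUs, RAM in GiB).
FLAVOURS = {
    "cpu_heavy": (8, 4),
    "mem_heavy": (2, 16),
    "balanced": (4, 8),
}


def pick_flavour(cpu_demand, mem_demand, margin=0.2):
    """Pick a flavour from normalised demands in [0, 1].

    The NF gets a CPU-rich VM when CPU demand clearly dominates, a
    RAM-rich VM when memory demand clearly dominates, and a balanced
    VM otherwise. `margin` is an assumed tuning parameter.
    """
    if cpu_demand - mem_demand > margin:
        return "cpu_heavy"
    if mem_demand - cpu_demand > margin:
        return "mem_heavy"
    return "balanced"
```

For example, a network function with normalised demands (0.9, 0.3) would be given the CPU-rich flavour, and one with (0.2, 0.8) the RAM-rich flavour.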
Exploitation Route Non-academic routes: (1) Our key findings are particularly useful for data centre operators, who may apply them in their production environments to improve resource utilisation and network resilience, hence improving their return on investment. (2) Alternatively, part of our software code (currently open source) can be licensed to other open source / commercial software. Academic routes: (1) We have published two papers on our key findings in leading networking conferences (IEEE/IFIP IM 2017 and IEEE CCNC 2018), and two journal submissions are under review. Our findings have created a new research area that will attract follow-up work by other researchers. (2) We are also organising an international workshop (TOPIC 2018) in this area in conjunction with ACM PODC 2018. We are going to present our key findings there in order to gather wider interest from the international research community.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://sync-project.com/
 
Description As part of a UK industry-led consortium, we are applying our research outcomes to cross-cloud and edge resource management in an Innovate UK funded project.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Title An NS-3 simulator for simulating policy-based resource management in cloud data centres
Description This simulator is used to underpin research carried out in the SYNC project. We chose ns-3 due to its popularity and efficiency for simulating computer networks. On top of the existing ns-3 network simulator, we added the capability to generate and simulate network policies and middleboxes.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact The research tool underpins some research outputs (in top venues) generated from this project. 
URL https://bitbucket.org/posco/sync-simulation
 
Description Giving a Talk for University of Macau 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Over 50 PhD students and academics attended this talk, which sparked questions and discussion afterwards. A potential collaboration with the University of Macau was explored.
Year(s) Of Engagement Activity 2019
URL https://www.um.edu.mo/news-centre/news-and-events/event-calendar/print-event-calendar-item/48788/
 
Description Theory and Practice for Integrated Cloud, Fog and Edge Computing Paradigms - TOPIC 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact We organised the TOPIC workshop in conjunction with ACM PODC 2018. The workshop attracted 30 international attendees. The theme of the workshop was to promote the SYNC project's underlying research hypothesis: only by jointly managing network policy can network resources be effectively provisioned.
Year(s) Of Engagement Activity 2018
URL https://synnetsys.github.io/topic2018/