COMIT: Active Content Management at Internet Scale

Lead Research Organisation: University College London
Department Name: Electronic and Electrical Engineering

Abstract

The Internet is currently passively pushing bits between end-host machines, be it servers, fixed or mobile user devices, or sensors. The network does not "understand" what is being transferred, i.e., it is not content-aware. This agnostic mode of operation affects several of its key functionalities, for example, efficient content distribution and content-aware traffic engineering. As a result, the network is not able to cope well with the exponentially increasing amounts of multimedia content access which constitute the major mode of use in recent years. Fixed and mobile network providers keep continuously upgrading their infrastructures but the situation has become unsustainable due to their eroding profit margins. There is an urgent need to rethink traffic management under the umbrella of active content management, rather than passive content transfer, allowing ISPs to control traffic better and achieve a sustainable model for the long-term evolution of their networks. New approaches that maximise traffic localisation are essential for long-term global network sustainability.

In this context, Information-Centric Networking (ICN) has emerged as an alternative to the current host-to-host communication paradigm. It proposes direct communication between user applications and the content itself, putting the actual information or content at the forefront and disregarding location. In ICN, the network transfers individual, identifiable content chunks, instead of data containers, i.e. packets, with opaque data. Content is identified by name and each packet carries part of a content chunk; the chunk can be retrieved from the hosting server or from an in-network router cache, given that in-network caching is a key aspect of the ICN paradigm. Popular content tends to stay longer in network caches and "anycast routing" based on content names retrieves the copy closest to the user. This dramatically increases traffic localisation, avoids flash crowd effects and gives network providers control over the information transferred, allowing them to engineer their networks based on the actual demand for named content.
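The caching behaviour described above can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions: the least-recently-used (LRU) replacement policy, the class `NamedContentCache`, the function `fetch` and the example content names are all hypothetical and are not taken from the project. It shows a single router cache keyed by content name, where a frequently requested name survives eviction while a less popular one is dropped.

```python
from collections import OrderedDict


class NamedContentCache:
    """Toy LRU cache keyed by content name (hypothetical helper, not from the project)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, name):
        """Return the cached chunk and mark it as recently used, or None on a miss."""
        if name not in self._store:
            return None
        self._store.move_to_end(name)
        return self._store[name]

    def put(self, name, chunk):
        """Insert a chunk, evicting the least recently used one if the cache is full."""
        self._store[name] = chunk
        self._store.move_to_end(name)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)


def fetch(name, cache, origin):
    """Serve a request from the router cache if possible, otherwise from the origin."""
    chunk = cache.get(name)
    if chunk is not None:
        return chunk, "cache"
    chunk = origin[name]      # assumed origin server, modelled as a plain dict
    cache.put(name, chunk)    # cache the chunk on its way back to the user
    return chunk, "origin"


# Hypothetical content names; "/video/a" is requested more often than the others.
origin = {"/video/a": b"A", "/video/b": b"B", "/video/c": b"C"}
cache = NamedContentCache(capacity=2)
requests = ["/video/a", "/video/b", "/video/a", "/video/c", "/video/a", "/video/b"]
sources = [fetch(name, cache, origin)[1] for name in requests]
```

In this run the popular name "/video/a" is served from the cache after its first request, whereas "/video/b" is evicted and has to be fetched from the origin again.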

Despite the considerable amount of effort that has been invested to date by the research community in location-independent routing based on content names, a widely acceptable and scalable solution is yet to be found. Any naming scheme would have to be able to accommodate 10**12 or more objects and content resolution and routing based solely on content names raises serious scalability concerns. In addition, the current IP-based Internet represents a massive infrastructure that cannot be easily replaced by a new, clean slate design. Having this in mind, and given our considerable research experience in the ICN area, we believe it is possible to achieve the ICN benefits of traffic localisation and sustainable network evolution without radical ICN approaches but by introducing, in an evolutionary manner, a "content layer" in the Internet architecture which will operate above the current network layer and below the transport layer, i.e. layer 3.5.

This layer will intercept communication, will produce unique location-independent names for requested content and will store the latter within the network according to sophisticated caching policies. Content will be accessed in an anycast fashion using an ICN style of operation but overlaid over IP, exploiting the existence of scalable IP-based routing, maintaining full backwards compatibility and protecting current investment. In addition, congestion control will be dealt with on a hop-by-hop rather than an end-to-end basis within the content layer, maintaining at the same time compatibility with current end-to-end operation while maximising the use of available network resources, increasing user quality of experience and paving the way for future Internet applications with stringent real-time requirements.

Planned Impact

The emergence of Content Distribution Networks (CDNs) and of relevant market players served well the purpose of bringing (static) content closer to the user. However, modern applications (as well as anticipated highly-interactive future forms of communication) do not simply call for storage-localisation of static content only, but instead for more efficient communication platforms that will be able to support high-volumes of real-time streaming in conjunction with interactive traffic and other demanding applications.

At present, ISPs operate as passive "bit pushers", struggling to keep their networks running by throwing more and more capacity at the problem or by deploying their own CDN infrastructures. Backwards compatibility requirements and homogeneity among technologies used in the Internet have led to the ossification of the architecture and have set a ceiling on the innovation allowed. Given the observed exponential traffic increase, the situation is simply going to become unsustainable. Mobile data traffic is expected to increase 26-fold by 2015, according to Cisco's predictions, not only through fixed and mobile usage replacing TV and radio but also through other, yet unforeseen, applications. Bottlenecks are already appearing in the backhauls of cellular networks (e.g., the O2 London outage in 2009) due to the skyrocketing mobile data traffic.

There is an urgent need to rethink traffic management under the umbrella of active content management, rather than passive content transfer, allowing ISPs to control traffic better and achieve a sustainable model for the long-term evolution of their networks.

Timescale of projected impact: Our research is targeting the period of the next 5-20 years. Within this period, we envision an explosion in the number of Internet endpoints, with everything getting connected to the Internet and resulting in an Internet of Things. Those "things" will include a variety of devices, from home appliances to vehicles to small healthcare devices to help the elderly. Given the increasingly stringent requirements of applications, traffic will need to be identified, classified and potentially prioritised in a well-engineered network. The proposed approach and the related protocols will pave the way for the transition to content-aware traffic engineering for ISPs, which is now becoming a necessity rather than an option in order to guarantee sustainable network evolution. This project will build the enabling technology and hence the path to the establishment of such infrastructures and to the expected revolution of the related applications.

Below, we provide a list of who we expect our research to benefit and how.

* ISPs: The active content management approach proposed here has the potential to give a more active role to ISPs by giving them the opportunity to actively manage the content they are transferring, rather than simply adjusting storage and bandwidth resources (and therefore, investment) according to expected demand.

* Academic community: The academic community has been looking into ways of bringing the ICN benefits to the attention of the industrial players. However, given that no viable pathway has been found yet, the ICN research field is running the risk of being considered unsuccessful and, eventually, becoming obsolete. The proposed viable approach for realising the benefits of ICN, and of content-aware networking in general, will pave the way to new related fields of research.

* Society: This shift to active content management will open up the road to innovation in terms of application design. The Internet architecture will become more sustainable and able to foster new application platforms and therefore, will contribute to job creation and economic prosperity. Implementation, marketing and operation of real-time, interactive applications both in the fixed and in the mobile world will not be a taboo discussion among Internet researchers and engineers but will become a reality.
 
Description The research focus of the Information-Centric Networking (ICN) community has lately shifted from the architectural design to the technical and engineering problems that need to be solved in order for the architecture to start seeing deployment. As such, and in order to be in the forefront of developments in the area, we have shifted our attention to the hottest open questions. That said, we have made the following contributions in the area within the lifetime of the project.

1) We have adapted our previously proposed hash-routing scheme (ACM ICN 2013, 158 citations to date) to fit into realistic environments and large ISPs. The initial version of the mechanism, presented in the ACM Sigcomm ICN Workshop 2013, did not take into account the size of an ISP network. We have extended our initial proposal to fit into big ISP networks comprising hundreds of nodes. Our newly proposed node-clustering algorithm has been evaluated both with simulations and analytically (Elsevier Computer Networks 2016 journal paper).

2) In the same line of work, we have identified the problem of load-imbalance between caches of the same rack, servers of a server farm, or caches of an operational network. Although the problem is well-known, we have pointed to potential solutions and have mathematically proved the bounds of imbalance as well as the degree to which imbalance can be mitigated using a number of techniques. Our work has been accepted in the highly-selective IEEE INFOCOM'2016 conference (18.2% acceptance rate) and a journal version was published in IEEE Transactions on Networking in 2020.

3) Given the sophistication of network routers under an ICN forwarding and caching paradigm, researchers have investigated the energy efficiency of such a network. There is a trade-off between caching, which reduces the amount of traffic travelling upwards in the network, and the energy needed to perform caching operations. We have implemented a software router and have measured the energy consumption under realistic workloads. We have built a model for the energy consumption of an ICN router, which we hope will be useful for the community and will be used in similar studies (IEEE JSAC 2016 journal publication).

4) In line with our project target of a deployable ICN architecture in operator networks, we have extended our previously proposed hash-routing architecture for operator-managed content caching. We have mathematically assessed its behaviour and proved theorems that precisely describe the performance of the network in terms of latency and cache-hit rate. The work was published in IEEE Journal of Network and Service Management in 2020.

5) The Information- or Content-Centric Networking paradigm was originally proposed as a paradigm shift to transform the Internet from a communications system to a native content distribution network. However, given the scalability challenges (among others) of routing and forwarding in ICN architectural proposals, most architectures have focused on scaling content resolution towards the main origin (or CDN surrogate server). As a result, all current architectural proposals, including the most prevalent one (i.e., NDN), focus on the optimisation of "how to route requests towards the core of the network". We argue that this goes against the original vision of a "native content distribution network" and prevents requests from discovering nearby content, unless the content is on the shortest path to the core of the network.

With a view to improving the content discovery capabilities of the NDN architecture, we have proposed an enhancement to the routing fabric, which keeps track of successful (i.e., served) content requests in a separate routing table, called "Downstream FIB" (D-FIB). D-FIB effectively acts as a FIB table that points to downstream nodes that have recently received (and therefore, cached) requested content. Our paper was published in the 22nd IEEE LANMAN 2016 symposium, where it also received the Best Paper Award.

6) As the topic of in-network caching in ICN gained momentum, our team has identified the need for a simple, open-source simulation tool to evaluate the performance of in-network caching algorithms. Our simulation tool "Icarus" has been designed and implemented within the context of the COMIT project and has been made open-source to the community. It has attracted wide attention and has become one of the reference simulation tools in the area (Simutools'2014 conference paper that has received 125 citations to date - see also Narrative Impact Software and Technical Products).

7) As mentioned earlier, the resolution, routing and forwarding scalability of "route-by-name" techniques at the inter-domain level is one of the most challenging issues to be dealt with in the ICN area. Given that the research community has not yet reached a conclusion on this topic and is still investigating the validity of different options, we have carried out a detailed analysis of the scalability properties of two key architectures, namely DONA (proposed in 2007 by T. Koponen et al.) and CURLING (proposed by our team in 2011). Based on real topologies and traffic traces, we found that the improved properties of CURLING make it a viable choice, in terms of scalability, for ICN networks (IFIP Networking'2015 conference paper).

8) One of the main objectives of the COMIT project has been to redefine the transport-layer protocol properties for Information-Centric Networking environments. As a first step towards this direction, we have carried out an analysis of a large data set to identify the rate regulation and limitation factors of TCP. Our results show that more than half of the traffic we analysed is throttled by constraints beyond network capacity (Elsevier Computer Networks 2017 journal paper). These findings have in turn influenced the design of our In-Network Resource Pooling protocol.

9) The placement of content in in-network content caches has a direct impact on the performance of the cache system. Carefully placed content can significantly increase cache hit performance, but at the same time might increase signalling and communication overhead, in order to make informed decisions on both content placement and content retrieval. We have investigated the tradeoff between cache hit performance increase and management overhead for ISP-operated content distribution networks [9]. We have proposed novel domain clustering techniques in order to limit management overhead but at the same time provide the same level of quality of service in terms of content delivery time. Our results demonstrate that, by splitting domains into smaller clusters, we manage to limit the overhead required to manage such a cache system without a decrease in cache hits (IEEE JSAC 2016 journal paper).

10) The ICN paradigm is advocating the use of explicitly named content to foster resolution and routing, but also in-network caching and forwarding. The components, structure and design of namespaces provide an opportunity to add (in a sense encode) unique features directly in content names. We have done an extensive analysis of naming structures and found that exposure of information through content names can provide a handy tool to include important information directly in the name, which the network can make use of in order to make informed decisions (IEEE Q-ICN 2014 workshop paper).
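
Items 1), 2) and 4) above refer to hash-routing and to load imbalance between caches. As a rough illustration only, the Python sketch below assumes the simplest possible variant: a content name is hashed and the hash, taken modulo the number of caches, selects the cache responsible for that name. The function `responsible_cache`, the node labels and the content names are hypothetical, and the sketch does not reproduce the node-clustering algorithm or the analytical bounds of the cited papers. The final ratio is one simple way to quantify imbalance, namely the load of the busiest cache divided by the mean load.

```python
import hashlib
from collections import Counter


def responsible_cache(content_name, cache_nodes):
    """Map a content name to one cache node via a hash of the name.

    Assumption: plain modulo hashing over a fixed list of caches.
    """
    digest = hashlib.sha256(content_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(cache_nodes)
    return cache_nodes[index]


# Hypothetical cache nodes of one domain and a synthetic set of content names.
cache_nodes = ["cache-0", "cache-1", "cache-2", "cache-3"]
names = [f"/example/object/{i}" for i in range(1000)]

# Number of distinct names each cache is responsible for.
load = Counter(responsible_cache(name, cache_nodes) for name in names)

# Imbalance as the ratio of the busiest cache's load to the mean load (1.0 = perfectly even).
imbalance = max(load.values()) / (len(names) / len(cache_nodes))
```

Because every router applies the same hash function, any of them can determine the responsible cache for a name without exchanging state. The sketch counts distinct names only; skewed request popularity, which it does not model, would typically make the per-cache request load less even.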
Exploitation Route The models developed in 2) and 3) above have been used by the research community as they give precise bounds on the performance of caching systems.

The "Icarus" simulator is being used by many researchers and research labs worldwide and has become one of the mainstream simulators in the area. We have now created a mailing list for the Icarus simulator, which we have advertised to the community. The mailing list now has many members and acts as a reference point for researchers to post queries and information about the simulator. The Icarus simulator has become one of the reference simulators in the community.

On the standardisation front, our work has been presented in several meetings of the Internet Research Task Force (IRTF) Research Group on Information-Centric Networks (ICNRG), where we received valuable feedback from the community. Dr Ioannis Psaras has been contributing to one of the first documents produced by the group, the "ICN Research Challenges" document, which, after several revisions, has now become RFC 7927.

It should be finally stated that this work has led to an industrial project with Cisco on scalable name-based forwarding in Hybrid ICN (hICN) which started in 2020 and we are looking to transfer some of our findings into Cisco hICN products.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://www.ee.ucl.ac.uk/comit-project/
 
Description Our model for identifying load-imbalance between caches of the same rack, servers of a server farm, or caches of an operational network has been used widely by the research community as it gives precise bounds on the performance of caching systems. The relevant model and algorithms are being used today by Content Distribution Network (CDN) providers. Our "Icarus" simulator has been extensively used by other members of the community and relevant labs in order to evaluate in-network caching algorithms. The interest in our simulator is growing and we keep updating the software given it has become the dominant tool of the community for evaluating caching approaches. On the standardisation front, our work has been presented in many meetings of the Internet Research Task Force (IRTF) Information-Centric Networking Research Group (ICNRG). Dr Psaras has also contributed to one of the first documents produced by the group, the "ICN Research Challenges" document, which after several revisions became RFC 7927. Dr Lorenzo Saino, a PhD student trained by the project, was headhunted and is now Director of Engineering at Fastly, a CDN company specialising in real-time content (www.fastly.com). He has reported that they have already made use of the results and findings of the project and in particular of those in our IEEE INFOCOM'2016 paper (see also subsection 2 in Key Findings). This work is closely related to Dr Saino's new role at Fastly and aspects of the approach have been deployed in the company's caching systems, see https://www.fastly.com/products/load-balancing/ . It should be finally stated that this work has led to an industrial project with Cisco on scalable name-based forwarding in Hybrid ICN (hICN) which started in 2021 and we are in the process of transferring some of our findings to Cisco hICN products.
First Year Of Impact 2016
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal,Economic

 
Description ICN2020: Advancing ICN Towards Real-World Deployment Through Research, Innovative Applications, And Global Scale Experimentation
Amount € 1,300,000 (EUR)
Funding ID 723014 
Organisation European Commission 
Department Horizon 2020
Sector Public
Country European Union (EU)
Start 07/2016 
End 06/2019
 
Description INSP: The business and technical case for In-Network Service Providers
Amount £972,417 (GBP)
Funding ID EP/M003787/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2015 
End 03/2020
 
Description INTENT: Information-Centric Network Management and Traffic Engineering
Amount € 221,606 (EUR)
Funding ID 628360 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 01/2015 
End 12/2016
 
Title Icarus: a Caching Simulator for Information Centric Networking 
Description As the topic of in-network caching, in Information-Centric Networking (ICN) and more generally, gained momentum, our team has identified the need for a simple, open-source simulation tool to evaluate the performance of in-network caching algorithms. Our simulation tool "Icarus" has been designed and implemented within the context of the COMIT project and has been made open-source to the community. It has attracted wide attention (the relevant conference paper has been highly cited) and has become one of the reference simulation tools in the area. We have created a mailing list for the Icarus simulator, which we have advertised to the community. The mailing list has many members and acts as a reference point for researchers to post queries and information about the simulator. The Icarus simulator has become one of the reference simulators in the community. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The Icarus simulator has become one of the reference simulators in the community, enhancing further the academic reputation of our group as leaders in the ICN and content-caching area. 
URL https://icarus-sim.github.io/
 
Description Internet Engineering Task Force (IETF) and Internet Research Task Force - Information-Centric Networking Research Group (ICNRG) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The Internet Research Task Force (IRTF) is the research "arm" of the Internet Engineering Task Force (IETF), the main standardisation body of Internet protocols and related activities. The Information-Centric Networking Research Group (ICNRG) has been the main group promoting activities related to the ICN paradigm. The PI has been actively engaged with the group since its early days. He has contributed to one of the first RFCs of the group, "RFC 7927: ICN Research Challenges". The group meets four times a year. We have been active in most of the meetings of the group and have presented our results, which have triggered extensive discussions in the community. This way, we promoted our work and created visibility around our group and the activities in the project. We will be hosting the next interim meeting of the group during the London IETF in March 2018. The interim meeting of the ICNRG will take place at UCL.
Year(s) Of Engagement Activity 2016,2017,2018
URL https://trac.ietf.org/trac/irtf/wiki/icnrg
 
Description RFC 7927 - Internet Engineering Task Force (IETF) and Internet Research Task Force - Information-Centric Networking Research Group (ICNRG) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The Internet Research Task Force (IRTF) is the research "arm" of the Internet Engineering Task Force (IETF), the main standardisation body of Internet protocols and related activities. The Information-Centric Networking Research Group (ICNRG) has been the main group promoting activities related to the ICN paradigm. The PI has been actively engaged with the group since its early days. He has contributed to one of the first RFCs of the group, "RFC 7927: ICN Research Challenges", and regularly presents the progress of this project to the ICNRG group.
Year(s) Of Engagement Activity 2016,2017,2018,2019,2020
URL https://trac.tools.ietf.org/html/rfc7927