CD-GAIN: Content Delivery Using Graph-based Analysis of Interest Networks

Lead Research Organisation: King's College London
Department Name: Institute of Telecommunications

Abstract

Recent years have seen a sea change in the Internet traffic mix, with the Web moving from primarily text-based content towards rich media such as video and audio streaming. This has imposed significant additional costs on both content providers and Internet Service Providers (ISPs).

This project proposes a novel approach that saves costs by identifying and actively enhancing ISP-local content availability. The core idea is that requests for content items can be served from ISP-local copies, where one is available. ISP-local access creates a synergy, with content providers saving on streaming costs, ISPs decreasing their cross-border transit traffic, and users obtaining local copies, isolating them from effects beyond the ISP such as packet losses, route failures or congestion in the network core.

Interest-based social networks, which are increasingly available on many content provider sites, provide an ideal framework to engineer the availability of ISP-local copies. Using the interest-based network, communities of users who are interested in similar items can be identified within each ISP. Such communities can serve as repositories of ISP-local copies of items they are interested in, and can support different access rates by pushing additional copies of items among themselves as required. Because members of the community have a high affinity for items they are responsible for, they might have a local copy available, having accessed the item in the past. Alternatively, they might be likely to access the item in the future, so if a copy is pushed to them, the expected overhead of the push could be balanced against the future access. Methods will be developed to identify high-quality communities useful for sharing content. These include weighting or ranking links based on shared content consumption, predicting additional ISP-local links that have not yet been self-identified by users, and adapting community detection methods to create multi-resolution communities with "core" and "peripheral" members. Such communities can support fewer or additional ISP-local copies, as required, to sustain different rates of simultaneous content access from the ISP's userbase.
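
As an illustration of weighting links by shared content consumption, the following is a minimal sketch, not the project's method. It assumes Jaccard similarity as the weight, and the input dictionaries `history` and `isp_of` as well as all user and item names are hypothetical. Only pairs of users within the same ISP are scored, so a high weight on a pair with no declared link could also be read as a candidate for a predicted ISP-local link.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of consumed item IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_isp_local_links(history, isp_of, min_weight=0.0):
    """Return {(u, v): weight} for user pairs that share an ISP.

    history: dict user -> set of item IDs consumed (hypothetical input)
    isp_of:  dict user -> ISP label (hypothetical input)
    """
    weights = {}
    for u, v in combinations(sorted(history), 2):
        if isp_of[u] != isp_of[v]:
            continue  # only ISP-local links are of interest here
        w = jaccard(history[u], history[v])
        if w > min_weight:
            weights[(u, v)] = w
    return weights


# Toy data: dave shares alice's tastes but sits in a different ISP.
history = {
    "alice": {"ep1", "ep2", "ep3"},
    "bob": {"ep2", "ep3"},
    "carol": {"ep3", "ep9"},
    "dave": {"ep1", "ep2", "ep3"},
}
isp_of = {"alice": "ISP-A", "bob": "ISP-A", "carol": "ISP-A", "dave": "ISP-B"}
links = weight_isp_local_links(history, isp_of)
```

Raising `min_weight` keeps only the strongest links, which is one simple way to separate a tight "core" from a looser "periphery"; a real system would more likely feed the weights into a community detection algorithm.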

The project will take a data-driven approach, using extensive real-world traces from leading content providers both to derive new patterns in the interest network that can be exploited for content delivery, and also to evaluate the benefits of proposed content delivery architectures under realistic workloads. In analysing the traces, the project will develop new characterisations of interest-based social networks using content consumption histories and chart ISP-level content availability from a content provider's viewpoint, both of which would be of independent academic and commercial value.

Planned Impact

The project will directly benefit two sectors of the UK economy: the content provider industry and Internet Service Providers. In addition, "Big Data" experience gained by the Research Associate working on large real-world data creates a skill set useful in many scenarios.

1. The content provider industry will be a primary beneficiary of this project. The benefits are twofold: improved analytics, and lower operational costs for content delivery.
1.a - Analytics: By expanding our understanding of content consumption patterns, the project will enable new analytics to be developed, which can be used to increase relevance of content to consumers. Further, insights from the social network could also lead to new ways for enhancing user engagement and retention.
1.b - Lower cost content delivery: The ISP-local content reuse architecture, a main output of this project, will improve scalability of operations and decrease streaming bandwidth costs (and/or the carbon footprint of content delivery) for content providers by converting direct access from the content provider servers to peer-to-peer traffic between users.

2. The ISP industry will benefit indirectly, but massively, if the ISP-local content reuse architecture proposed in this project gets deployed by major content providers. In addition, our studies of ISP-local content availability and connectivity patterns of users could lead to new economic models for ISPs: For instance, studying the cross-ISP roaming behaviour of users (e.g. users having a mobile phone with connectivity via both 3G and fixed line ISPs, or a laptop connected to one ISP at work and another at home) may lead to new pricing models for such users, as they now have a choice of more than one ISP to access content from. As another example, a large potential for directly sharing content among users may indicate the need for new peering arrangements between consumer facing ISPs.

3. Lastly, the Research Associate working on the large data sets of the project will gain experience in handling "Big Data", which is becoming an increasingly valuable skill, and can be applied in various employment sectors.

Publications

 
Description Our contributions are in two areas: content delivery and social network/community engagement around pieces of content that are interesting to communities of users.

1. Content Delivery: Working with traces of nearly 2 billion accesses over a year to BBC iPlayer, we developed three novel ways of delivering content for catch-up TV players like BBC iPlayer: using peer-assistance, using a broadcast medium such as Digital Terrestrial Television to speculatively offload content, and predictively preloading content when "good" connectivity is available.

1a. Peer-assisted content delivery: We have demonstrated, using extensive traces of user access patterns from BBC iPlayer, that peer-to-peer swarming can be effective at decreasing the costs of content delivery by up to 80%, even when traffic is highly localised and peers are matched within a local scope such as a single Internet Service Provider. We have also shown that this translates to a corresponding decrease in the energy footprint of content streaming. [Paper published in IEEE INFOCOM; use case and application of using analytics incorporated into a draft IEEE standard on content delivery]

1b. Speculative Content Offloading and Recording Engine (SCORE): We developed a mechanism for creating a personal offline channel for users by speculatively recording, on a set-top box, content they are likely to watch. This resulted in a publication in WWW 2013, followed by a journal paper in IEEE/ACM Transactions on Networking. This has been incorporated as a use case in the IEEE 1903.1 draft standard on content delivery.

1c. Predictive preloading for mobile phones: Accessing long-form rich media content such as films and TV shows is expensive on cellular networks. We have developed a method for preloading content that iPlayer users are likely to watch during work commutes, by considering nearly 450 different factors, ranging from the type of the show and what interests the user to the UI on iPlayer. Our method saves, on average, over 70% of users' cellular data usage. This has been accepted for publication in IEEE Journal on Selected Areas in Communications.

2. Using data from Last.fm and Pinterest.com, we have demonstrated the importance of social networks for "niche interest" content [Paper in AAAI ICWSM 2013]. We have also uncovered a process that we term "social bootstrapping", which shows how copying links from a mature social network such as Facebook can be highly beneficial to bootstrap an engaged community on less developed networks such as Pinterest or Last.fm [Paper in WWW 2014].

3. We have also analysed our anonymised dataset of nationwide accesses to BBC iPlayer to understand and highlight the importance of quality of connectivity to the adoption of a high bandwidth application such as catch-up TV streaming. Our results clearly show that better broadband speeds are important for national infrastructure; that 3G accesses are primarily used during commute times; that data caps and product bundle discounts can affect user choice of connectivity provider as well as the volume and means of access. [2nd Paper in IEEE INFOCOM 2015]
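
The idea behind the peer-assisted delivery result in item 1a can be illustrated with a toy trace replay. This is a simplified sketch under loudly stated assumptions, not the analysis used in the published work: the trace format `(user, isp, item)` is hypothetical, every user is assumed to keep each item after fetching it, and upload capacity, peer availability and simultaneous swarming are ignored.

```python
from collections import defaultdict


def peer_offload_fraction(trace):
    """Fraction of requests that an ISP-local peer could have served.

    trace: iterable of (user, isp, item) tuples in time order
           (hypothetical format, not the real iPlayer trace schema).
    Assumption: a user who has fetched an item keeps it and can serve
    it to any later requester inside the same ISP.
    """
    holders = defaultdict(set)  # (isp, item) -> users holding a copy
    total = from_peers = 0
    for user, isp, item in trace:
        total += 1
        others = holders[(isp, item)] - {user}
        if others:
            from_peers += 1  # some other ISP-local user already has it
        holders[(isp, item)].add(user)
    return from_peers / total if total else 0.0


trace = [
    ("u1", "ISP-A", "show1"),  # first access: served by the content provider
    ("u2", "ISP-A", "show1"),  # u1 holds an ISP-local copy
    ("u3", "ISP-B", "show1"),  # no copy inside ISP-B yet
    ("u4", "ISP-A", "show1"),  # u1 and u2 hold ISP-local copies
    ("u3", "ISP-B", "show1"),  # repeat access: no other peer in ISP-B
]
fraction = peer_offload_fraction(trace)
```

On this toy trace two of the five requests find a copy at another user in the same ISP. The figure says nothing about real workloads; it only shows how restricting peers to a single ISP enters the calculation.
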
Exploitation Route 1. Our findings on content delivery can be taken forward by content delivery networks (e.g., Akamai) and content providers (e.g., BBC or ITV or BSkyB) to implement highly localised content delivery services.
2. Our findings on social networks and community engagement can be useful to many websites which incorporate a social networking component, and wish to have a committed community of users.
3. We hope our findings on factors affecting iPlayer usage will be taken up by regulators and other authorities to drive forward a coherent connectivity strategy for the UK. It can also inform the pricing strategy of Internet Service Providers and 3G/4G cellular operators.
Sectors Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Financial Services, and Management Consultancy

URL http://www.inf.kcl.ac.uk/staff/nrs/projects/cd-gain/
 
Description Our contributions are threefold. First, we have analysed the energy and traffic costs of different means of delivering content on the Internet. Second, we have developed ways of offloading content, making use of "good" connectivity when it is available. Third, we have analysed social networks formed around interests of users (so-called "interest-based social networks").
1. Our findings on content delivery are informing how the BBC thinks about iPlayer and distribution of "catch-up TV" content. They have requested our input for further investigations into specific deployment scenarios they are considering, with caches deep in an ISP's network. We are also influencing the wider community: I was recently invited to give a keynote lecture at the IEEE International Symposium on Computers and Communications about this work.
2. We have been active members of the working group for the IEEE 1903.1 standard on Next Generation Service Overlay Networks for Content Delivery. Our contribution on incorporating analytics support into content delivery, and use cases from our research illustrating these issues, have been accepted into the draft of the standard.
3. Pinterest.com, a rapidly growing interest-based social network, have said our research is "highly interesting" to them.
4. Datasets we have released have been requested by over 60 researchers at top universities such as Carnegie Mellon, Caltech, University of Southern California, Peking University, IIIT Delhi, Singapore Management University etc.
5. We have given well-received demos at Mobile World Congress on mobile edge computing and its importance to 5G.
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal,Economic

 
Title Pinterest anonymised dataset 
Description An anonymized version of the Pinterest dataset used in our WWW14 and ICWSM13 papers is being made available to the research community. If you are interested in using this data, please send us an email at netsys@kcl-dot-AC-dot-uk to get the link where you can download the data. Note that sending the email indicates that you accept our terms and conditions in the following section. Please indicate in the email which of the following parts of the dataset you need.
Pinterest network: A snowball sampled social graph of Pinterest, crawled in Apr 2013.
Facebook network: The Facebook social graphs of users who appear in the Pinterest activities dataset below.
Fb-copied network: The subset of Pinterest network that only contains links common to both Pinterest and Facebook.
User information: The basic statistics (such as the number of pins, likes, followees and followers) of Pinterest users.
Pinterest activities: Repin and like activity in Pinterest during 03-21 Jan, 2013.
Pinterest pins: Basic information (e.g. the source of image) and statistics (e.g. number of repins and likes) of 3.36 million images published during 03-21 Jan, 2013.
You can find the format of the dataset at http://www.inf.kcl.ac.uk/staff/nrs/projects/cd-gain/pinterest-data-format.txt
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact This has been requested by over 20 researchers from top Computer Science Departments such as Carnegie Mellon, Caltech, Singapore Management University, University of Southern California, Peking University, IIIT Delhi etc. 
URL http://www.inf.kcl.ac.uk/staff/nrs/projects/cd-gain/dataset.html
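
The Fb-copied network described above is, conceptually, the intersection of two edge sets. The following minimal sketch shows that operation on toy data. It does not parse the released files (their actual format is documented at the format URL given in the description), and it assumes that user IDs are already aligned across the two networks and that links are treated as undirected; both are assumptions made for illustration, and the function names are hypothetical.

```python
def undirected(edges):
    """Normalise edges so (a, b) and (b, a) compare equal; drop self-loops."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def copied_links(pinterest_edges, facebook_edges):
    """Links present in both networks, assuming aligned user IDs."""
    return undirected(pinterest_edges) & undirected(facebook_edges)


# Toy edge lists with made-up user IDs.
pinterest = [("a", "b"), ("b", "c"), ("c", "d")]
facebook = [("b", "a"), ("c", "d"), ("d", "e")]
common = copied_links(pinterest, facebook)
```

Here `common` contains the links a-b and c-d, which appear in both toy networks regardless of the direction in which each edge was listed.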
 
Description Collaboration with Prof. Krishna Gummadi 
Organisation Max Planck Society
Country Germany 
Sector Charity/Non Profit 
PI Contribution Contributed a dataset and to the idea of evaluating cross-domain trust transfer
Collaborator Contribution Initiated the idea of cross-domain trust transfer
Impact Paper in WWW 2016
Start Year 2014
 
Description Collaboration with Prof. Mia Cha 
Organisation Korea Advanced Institute of Science and Technology (KAIST)
Country Korea, Republic of 
Sector Academic/University 
PI Contribution Collaborated to create a multi layer network of Pinterest and Facebook users, and initiated the concept of social bootstrapping
Collaborator Contribution Collaborated to create a multi layer network of Pinterest and Facebook users, and initiated the concept of social bootstrapping
Impact WWW 2014 paper on social bootstrapping
Start Year 2014
 
Description Contribution to IEEE 1903.1 Standard 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact ~15 people attended the main 3-day workshop of the standardisation working group, conducted at KCL. Several others received the information by email, or by dialling in to a phone line that was kept open for the duration of the workshop.

Contributions incorporated into the draft standard.
Year(s) Of Engagement Activity 2014
 
Description TV interview about user generated video on the tenth anniversary of YouTube 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Interviewed by Al Jazeera TV on the tenth anniversary of YouTube
Year(s) Of Engagement Activity 2015
URL https://www.facebook.com/aljazeera/videos/10153184081623690
 
Description Talk at UK Network Operators Forum (UKNOF) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Postdoc Dmytro Karamshuk attended and presented our work at the UK Network Operators Forum in 2014 and 2015.
Year(s) Of Engagement Activity 2014,2015
 
Description Talk to Cambridge Wireless about user analytics in edge caching 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Talk at the Cambridge Wireless event on Mobile Edge Computing, titled "The role of user analytics in edge caching: Lessons from BBC iPlayer": http://www.cambridgewireless.co.uk/Presentation/VN-03.11.15-Nishanth_Sastry-KCL.pdf
Year(s) Of Engagement Activity 2015
URL http://www.cambridgewireless.co.uk/crmapp/EventResource.aspx?objid=47940
 
Description press release 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Worked with the KCL press release team to make sure the release was technically accurate.

Featured on the KCL news page.
Year(s) Of Engagement Activity 2014
URL http://www.kcl.ac.uk/newsevents/news/newsrecords/2014/April/Social-media-bootstrapping-key-for-growt...
 
Description press release by KCL on our work 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Worked with the KCL team to make the press release technically accurate.

Featured on the KCL home page.
Year(s) Of Engagement Activity 2013
URL http://www.kcl.ac.uk/newsevents/news/newsrecords/2013/06-June/Intelligent-and-green-iPlayer-records-...