DISSP: Dependable Internet-Scale Stream Processing

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Real-time stream data has begun to play an increasingly important role on the Internet. One of the causes for this is the proliferation of geographically-distributed stream data sources such as sensor networks, scientific instruments, pervasive computing environments and web feeds connected to the Internet. Potentially millions of users world-wide want to take advantage of the availability of this data. Therefore they require a convenient way to process real-time stream data at a global scale through applications that perform Internet-scale stream processing (ISSP). Analogous to how search engines make static web data useful to users, an ISSP system reliably collects, filters and processes stream data from potentially thousands of data sources on behalf of many users.The research focus of this proposal is to address a major challenge facing all ISSP applications: achieving robustness in the presence of failures. Failures are a fact of life in such systems because of their scale, heterogeneity and reliance on best-effort Internet infrastructure. This conflicts with the requirements of many users that demand a dependable ISSP (DISSP) service --- a service that should continue functioning even during Internet path outages, processing host failures and resource shortages. For example, consider a scientific study that detects and analyses transient events in the sky by correlating high-bandwidth, real-time image streams from dozens of radio telescopes world-wide. After the failure of the network connection to one of the telescopes, a scientist would expect the system to continue operating, perhaps with reduced resolution. Similarly, the failure of a host that processes the sky images should not cause a service interruption, although it may lead to a decrease in detection confidence of anomalies. Unfortunately, mechanisms for building DISSP systems are lacking. Conventional techniques for reliable data processing cannot be applied to a DISSP system because of its requirements of global scalability, of short recovery times in a real-time setting and of resource efficiency due to a shared infrastructure.This proposal addresses these challenges so that ISSP systems needed for interconnecting tomorrow's pervasive sensor systems and global scientific experiments can become a reality. We intend to develop new techniques for building dependable ISSP systems, taking their unique features in terms of scale, failure model and data quality into account. In particular, we will devise approaches that gracefully degrade result quality in response to resource shortage after failure. While result quality is reduced, the system will provide constant feedback to users on the achieved level of service. Feedback will be expressed in a domain-specific way, e.g., by notifying a scientific user about the reduction in detection confidence of events of interest. This feedback will also drive an adaptive fault-tolerance mechanism, allowing the DISSP system to strategise about resource allocation in order to minimise the reduction in service quality of a maximum number of users.

Publications

10 25 50
 
Description The focus of the DISSP project is to investigate new scalable and reliable software platforms for processing large amounts of streaming data in real-time. The research work resulted in the development of two architectures: (a) the DISSP system can process data in resource-constrained environments, guaranteeing the fairness and quality of processing; and (b) the SEEP system, which is a data-parallel Big Data processing platform that provides a novel stateful dataflow model, which is more expressive than existing Big Data processing models.
Exploitation Route Both the DISSP and SEEP systems have been released as open-source software. The SEEP data processing platform is available on the public GitHub repository and has already gained a number of external users, including IBM and several start-ups. The developed stream processing technology has applications in any domain that has to analyse Big Data as part of its operation.
Sectors Digital/Communication/Information Technologies (including Software),Financial Services, and Management Consultancy,Retail,Security and Diplomacy,Transport

URL http://lsds.doc.ic.ac.uk/projects/DISSP
 
Description The SEEP platform was uses by IBM to create a port of the system to mobile phones with the goal of creating sensing applications on mobile devices. In addition, there was considerable commercial interest in the SEEP platform, and we are exploring the exploitation of SEEP with our industrial partners (in particular BAE Systems). A follow-on project with BP explores applications in the oil and gas domain; a project with Morgan Stanley and VMware investigates applications related to the management of large IT infrastructures. In addition, we have extended SEEP to support machine learning workloads, and received further industrial regarding this.
First Year Of Impact 2012
Sector Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Energy,Financial Services, and Management Consultancy,Other
Impact Types Economic

 
Description BP Complex Event Processing
Amount £295,000 (GBP)
Organisation BP (British Petroleum) 
Sector Private
Country United Kingdom
Start 11/2013 
End 10/2015
 
Description Custom and Multicore Technologies for Regular Matching Over Streams
Amount £94,621 (GBP)
Funding ID CASE Award 
Organisation BAE Systems 
Sector Academic/University
Country United Kingdom
Start 10/2011 
End 03/2015
 
Description Data Stream Processing in Hybrid Coalition Networks
Amount £128,000 (GBP)
Organisation IBM 
Department IBM UK Ltd
Sector Private
Country United Kingdom
Start 11/2013 
End 04/2015
 
Description HARNESS: Hardware- and Network-Enhanced Software Systems for Cloud Computing
Amount € 750,000 (EUR)
Funding ID 318521 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 10/2012 
End 09/2015
 
Description SEEP: Scalable and Elastic Stream Processing
Amount £94,621 (GBP)
Funding ID CASE Award 
Organisation BAE Systems 
Sector Academic/University
Country United Kingdom
Start 10/2011 
End 10/2014
 
Title SEEP stream processing platform 
Description SEEP is a scalable stream processing platform that can be used for real-time Big Data processing in cloud environments. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The SEEP platform was used for a range of projects by collaborators eg from IBM. 
URL http://github.com/lsds/Seep/