Deep Learning from Crawled Spatio-Temporal Representations of Video (DECSTER)

Lead Research Organisation: University College London
Department Name: Electronic and Electrical Engineering

Abstract

Video has been one of the most pervasive forms of online media for some time, and several forecasts predict that video traffic will dominate IP networks within the next five years. Yet video remains one of the least manageable elements of the big data ecosystem. This project argues that this difficulty stems primarily from the fact that virtually all advanced computer vision and machine learning algorithms treat video as a stream of frames of picture elements (pixels). They do so even though pixel-domain representations are notoriously difficult to manage in machine learning systems, mainly because of their high volume, the high redundancy between successive frames, and artifacts stemming from camera calibration under varying illumination.

We propose to abandon pixel representations and instead consider spatio-temporal activity information that can be extracted directly from compressed video bitstreams or from neuromorphic vision sensing (NVS) hardware. The first key outcome of the project will be the design of deep neural networks (DNNs) that ingest such activity information to deliver state-of-the-art classification, action recognition and retrieval results on large video datasets. This will be achieved at record-breaking speed and with accuracy comparable to the best DNN designs that use pixel-domain video representations and/or optical flow calculations. The second key outcome will be the design and prototyping of a crawler-based bitstream parsing and analysis service, in which part of the parsing and processing is carried out by a bitstream crawler running on a remote repository, while the back-end processing is carried out by high-performance servers in the cloud. For the first time, this will allow large compressed video libraries and NVS repositories to be parsed continuously by new and improved versions of crawlers, in order to derive continuously improved semantics or to track changes and new content elements, much as search-engine bots continuously crawl web content. These outcomes will pave the way for exabyte-scale video datasets to be newly discovered and analysed on commodity hardware.
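The continuous-crawling idea can be illustrated with a minimal sketch of the bookkeeping a crawler might do to decide what to (re)parse on each pass. It assumes, purely for illustration, that the remote repository can list each item together with a content hash, and that the crawler keeps an index recording which crawler version last parsed each item. `ItemRecord`, `plan_crawl` and `update_index` are hypothetical names, not part of the project's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRecord:
    """What a hypothetical crawler index remembers about one repository item."""
    content_hash: str
    crawler_version: int


def plan_crawl(repository, index, current_version):
    """Return the ids of items that need (re)parsing, in sorted order.

    repository: dict of item id -> current content hash (hypothetical listing)
    index: dict of item id -> ItemRecord from earlier crawls
    current_version: integer version of the crawler being deployed now
    """
    to_parse = []
    for item_id, content_hash in sorted(repository.items()):
        record = index.get(item_id)
        if record is None:
            to_parse.append(item_id)  # new content element
        elif record.content_hash != content_hash:
            to_parse.append(item_id)  # content changed since the last crawl
        elif record.crawler_version < current_version:
            to_parse.append(item_id)  # an improved crawler may extract better semantics
    return to_parse


def update_index(index, repository, parsed_ids, current_version):
    """Record that the given items have now been parsed by current_version."""
    for item_id in parsed_ids:
        index[item_id] = ItemRecord(repository[item_id], current_version)
```

Keying re-parsing on both the content hash and the crawler version covers the two cases described above: new or changed content elements, and unchanged content that an improved crawler may describe better. After `update_index` runs, a repeat pass with the same crawler version returns an empty list.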

Planned Impact

Industrial stakeholders and the general public will benefit from the results of this research through advanced video classification and retrieval services operating at scales and resource levels that are impossible to achieve with conventional pixel-based video analysis systems. The project outcomes may therefore enable a wide range of new and emerging consumer video and Internet-of-Things (IoT) applications, helping to meet public expectations for the future of advanced visual computing systems. The role of industry, and in particular of our industrial partners, will be of paramount importance here, given the significance of the widespread adoption of new media processing technologies in numerous vertical sectors such as advertising, surveillance and recommendation services. Disseminating our research outputs to standardisation bodies, such as the ongoing work of ISO/IEC MPEG on CDVA (Compact Descriptors for Video Analysis) and the ISO ISAN extensions, will facilitate this impact.

Our industrial partners are in a leading position to exploit the research outcomes within their products and services (e.g., Soundmouse for the creative industries sector and Yamaha Motor for smart vehicles), and the planned interactions with them will substantially facilitate this. Overall, as detailed in the Impact document, this encompasses three large areas: creative content production and management systems, cloud computing services for media processing, and IoT-oriented vehicle and surveillance systems.

Publications

Description In our first two publications, we have shown that, with minimal loss of accuracy relative to the state of the art in video classification, action/object localisation and recognition, and video retrieval, processing throughput can be increased up to 100-fold. In addition, data transfer requirements can be reduced to as little as 3 kilobits per second.
Exploitation Route Our industrial partners are in a leading position to exploit the research outcomes within their products and services (e.g., Soundmouse for the creative industries sector and Yamaha Motor for smart vehicles), and the planned interactions with them will substantially facilitate this. Overall, as detailed in the Impact document, this encompasses three large areas: creative content production and management systems, cloud computing services for media processing, and IoT-oriented vehicle and surveillance systems.
Sectors Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software)

URL https://github.com/rate-accuracy-mvcnn/main
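Assuming, as its name suggests, that MV-CNN operates on motion vectors parsed from the compressed bitstream, the following minimal sketch shows one generic way such per-block vectors could be arranged into a fixed-size array that a CNN can ingest. It is not the code in the linked repository: the `(frame, row, col, dx, dy)` tuple layout, the function name and the `max_disp` normalisation are all hypothetical choices made for illustration.

```python
def motion_vectors_to_tensor(vectors, num_frames, rows, cols, max_disp=16.0):
    """Rasterise per-block motion vectors into a [frame][channel][row][col] grid.

    vectors: iterable of (frame, row, col, dx, dy) tuples, one per coded block.
             This tuple layout is a hypothetical stand-in for whatever a
             bitstream parser actually emits.
    Channel 0 holds dx and channel 1 holds dy, each divided by max_disp and
    clipped to [-1, 1]. Blocks without a vector (e.g. intra-coded) stay 0.0.
    """
    # Zero-initialised grid: one 2-channel rows x cols map per frame.
    tensor = [[[[0.0] * cols for _ in range(rows)] for _ in range(2)]
              for _ in range(num_frames)]
    for frame, row, col, dx, dy in vectors:
        for channel, value in ((0, dx), (1, dy)):
            scaled = value / max_disp
            tensor[frame][channel][row][col] = max(-1.0, min(1.0, scaled))
    return tensor
```

With, say, 16x16-pixel blocks, an H x W frame becomes a 2 x (H/16) x (W/16) map rather than a 3 x H x W pixel array. This illustrates why such inputs are far smaller than decoded frames and need no full decoding or optical flow computation.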
 
Description Open-source software has been released (the PIX2NVS and MV-CNN projects on GitHub) and is now in use for industrial and academic R&D.
First Year Of Impact 2019
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic
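PIX2NVS converts pixel-domain video frames into neuromorphic vision streams. The sketch below is not PIX2NVS's code or parameterisation. It is a minimal illustration of the common event-sensor model, in which a pixel emits an ON or OFF event when its log intensity moves by a threshold from the level at which it last fired. The function name, the `threshold` and `eps` defaults, and the input format (a list of equal-sized 2D lists of grayscale intensities) are assumptions.

```python
import math


def frames_to_events(frames, threshold=0.15, eps=1.0):
    """Emulate event-camera output from a sequence of grayscale frames.

    frames: list of 2D lists of non-negative intensities, all the same size.
    Returns (t, row, col, polarity) tuples, polarity +1 (ON) or -1 (OFF).
    Simplification: at most one event per pixel per frame, even if the
    change spans several multiples of the threshold.
    """
    if not frames:
        return []
    # Per-pixel reference log intensity, initialised from the first frame.
    reference = [[math.log(v + eps) for v in row] for row in frames[0]]
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                level = math.log(value + eps)  # eps avoids log(0)
                delta = level - reference[r][c]
                if abs(delta) >= threshold:
                    events.append((t, r, c, 1 if delta > 0 else -1))
                    reference[r][c] = level  # reset reference after firing
    return events
```

Static regions produce no events at all, which is why event streams are sparse compared with frame sequences; only changing pixels contribute data.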