Deep Learning from Crawled Spatio-Temporal Representations of Video (DECSTER)

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
 
Description The work has focused on Deep Learning methods for action recognition and action localisation. In particular, we have concentrated on fine-grained recognition and have developed baselines for action localisation, as outlined in the original project description.

A key finding underlying all related publications is that fine-grained temporal analysis, i.e., analysis at increased temporal resolutions, is important for improved performance. We have developed methods for action recognition and action retrieval that rely on mechanisms for feature extraction at high temporal resolution, and on mechanisms for temporal alignment for estimating similarities/distances between videos. We have shown that this leads to improved performance in comparison to coarse video-level representations. This has been extended to fine-grained (temporal) localisation of actions in long, untrimmed image sequences.
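As an illustration of temporal alignment for video distance estimation (a minimal sketch, not the published method): given per-frame feature vectors for two videos, dynamic time warping aligns the two frame sequences monotonically and accumulates the frame-to-frame cost along the best alignment.

```python
import numpy as np

def dtw_distance(x, y):
    """Align two frame-feature sequences with dynamic time warping
    and return the accumulated alignment cost.

    x: (n, d) array of per-frame features for video 1
    y: (m, d) array of per-frame features for video 2
    """
    n, m = len(x), len(y)
    # Pairwise frame-to-frame Euclidean distances.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulate costs over monotone alignments.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # stay on a frame of y
                acc[i, j - 1],      # stay on a frame of x
                acc[i - 1, j - 1],  # advance both
            )
    return acc[n, m]
```

Because the alignment can repeat frames, two videos of the same action performed at different speeds can still achieve a low distance, which is the property that frame-level (rather than clip-level) representations make possible.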

A second key finding is that, by using a framework called knowledge distillation, in which networks are used to train each other, it is possible to achieve different trade-offs between accuracy, speed and storage requirements.
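To illustrate the core of knowledge distillation (a generic Hinton-style sketch, not the specific models developed in the project): a small, fast student network is trained to match the softened class distribution of a large teacher, with a temperature T controlling how much of the teacher's "dark knowledge" about non-target classes is exposed.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy of the student against the teacher's softened
    class distribution, scaled by T^2 as in Hinton-style distillation."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    return -(T ** 2) * np.mean(np.sum(p_teacher * log_p_student, axis=-1))
```

Because the student can be much smaller than the teacher, varying the student architecture (and the weight of this loss against the usual supervised loss) is what yields the different accuracy/speed/storage trade-offs mentioned above.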

In a parallel direction, our work on video summarisation has shown the limitations of current evaluation protocols and how variations of deep learning methods can keep improving the state of the art.
Exploitation Route We have developed methods for video recognition, action localisation and video summarisation that have been published.
We also provide code and datasets in the public domain. These can be used by others to benchmark their methods, to train their models, and to improve on the methods that we have developed.
Sectors Creative Economy, Healthcare, Culture, Heritage, Museums and Collections

 
Description AI4Media
Amount € 12,000,000 (EUR)
Funding ID 951911 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 09/2019 
End 09/2023
 
Title ViSiL code and models 
Description This repository contains the Tensorflow implementation of the paper "ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning", ICCV, 2019, a method for video retrieval. It provides code for the calculation of similarities between the query and database videos given by the user. Also, it contains an evaluation script to reproduce the results of the paper. The video similarity calculation is achieved by applying a frame-to-frame function that respects the spatial within-frame structure of videos and a learned video-to-video similarity function that also considers the temporal structure of videos. 
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? Yes  
Impact ViSiL has drawn attention since it was made publicly available, with 25 forks (people/groups that have started building upon it) and 114 GitHub stars. Various researchers have also contributed pull requests, for instance to port the framework to PyTorch. 
URL https://github.com/MKLab-ITI/visil
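As a simplified illustration of the frame-to-frame similarity idea behind ViSiL (a non-learned sketch; the actual method applies a learned video-to-video function over the similarity matrix and respects within-frame spatial structure): cosine similarities between all frame pairs form a matrix, which a Chamfer-style aggregation reduces to a single score.

```python
import numpy as np

def chamfer_similarity(q, d):
    """Simplified video similarity: build a cosine frame-to-frame
    similarity matrix, then aggregate with Chamfer similarity
    (mean over query frames of the best-matching database frame).

    q: (n, dim) per-frame features of the query video
    d: (m, dim) per-frame features of the database video
    """
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sim = q @ d.T                 # (n, m) frame-to-frame similarities
    return sim.max(axis=1).mean() # Chamfer aggregation
```

Replacing this fixed aggregation with a learned function over the similarity matrix is what allows the published method to also model the temporal structure of the two videos.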
 
Description Collaboration with Institute of Telematics and Informatics 
Organisation Centre for Research and Technology Hellas (CERTH)
Country Greece 
Sector Academic/University 
PI Contribution QMUL has a long-standing collaboration with the Institute of Telematics and Informatics, Centre for Research and Technology Hellas (CERTH-ITI). During the period of the DECSTER project, Georgios Kordopatis-Zilos has been performing research on video retrieval under the supervision of Ioannis Patras. The work has been aligned with the aims of the DECSTER project so as to address action recognition, and more specifically action retrieval; it has resulted in two publications in selective Computer Vision and Multimedia Analysis venues.
Collaborator Contribution The Informatics and Telematics Institute (ITI-CERTH, Greece) funded the salary and fees of the researcher, and provided equipment, travel costs and co-supervision of the research.
Impact The collaboration is long-standing, dating back to 2009. Within the DECSTER project, in the period until 01-2020, there have been two publications aligned with the goals of the project.
Start Year 2018
 
Title Few-Shot Action Localization without Knowing Boundaries 
Description The repository contains the implementation of "Few-Shot Action Localization without Knowing Boundaries" (2021 International Conference on Multimedia Retrieval) and provides the training and evaluation code for reproducing the reported results. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact The code has been very recently released. 
URL https://github.com/June01/WFSAL-icmr21
 
Title Performance over Random -- repository 
Description The repository contains the implementation of "Performance over Random: A Robust Evaluation Protocol for Video Summarization Methods" (28th ACM International Conference on Multimedia (MM '20)) and can be used for evaluating the summaries of a video summarization method using the PoR evaluation protocol. 
Type Of Technology Software 
Year Produced 2020 
Impact The repository enables others to evaluate the summaries produced by video summarization methods using the PoR evaluation protocol.