Adaptive cognition for automated sports video annotation (ACASVA)

Lead Research Organisation: University of Surrey
Department Name: Centre for Vision, Speech and Signal Processing (CVSSP)

Abstract

The development of a machine that can autonomously understand and interpret patterns of real-world events remains a challenging goal in AI. Humans are able to achieve this by developing sophisticated internal representational structures for objects and events and the grammars that connect them. ACASVA aims to investigate the interaction between visual and linguistic grammars in learning by developing grammars in a scenario where the number of different events is constrained, by a set of rules, to be small: a sport. We will analyse video footage of a game (e.g. tennis) and use computer vision techniques to progressively understand it as a sequence of (possibly overlapping) events, and build a grammar of events. We will do a similar audio/linguistic analysis on the commentary on the game. Both of these grammars will be used to build a representational structure for understanding the game. Visual representations are additionally constrained by the inference of game rules so that object-classification mechanisms are preferentially tuned to game-relevant entities like 'player' rather than game-irrelevant entities like 'crowd-member'. We will also investigate how the two modes, sight and sound, can influence each other in the learning process; interpretation of the video is affected by the linguistic grammar and vice versa. Furthermore, this coupling of modes will lead to improved recognition of both audio and video events when the grammars from the video mode are used to influence the audio recognition, and vice versa. The psychological component of ACASVA correspondingly attempts to learn how these capabilities are developed in humans: how visual grammars are organized and employed in the learning problem, how these grammars are modified by prior linguistic knowledge of the domain, how visual grammars map onto linguistic grammars, and how game rule-inferences influence lower-level visual learning (determined via gaze-behaviour).
These results will feed back into the machine-learning problem and vice versa, as well as providing a performance benchmark for the system. Potential beneficiaries of ACASVA (in addition to the knowledge beneficiaries within the fields of science and engineering) include the broadcasting and on-line video search industries.

Publications

Arashloo S (2014) Dynamic Texture Recognition Using Multiscale Binarized Statistical Image Features in IEEE Transactions on Multimedia

Chan CH (2013) Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors in IEEE Transactions on Pattern Analysis and Machine Intelligence

De Campos T (2012) Images as sets of locally weighted features in Computer Vision and Image Understanding

FarajiDavar N (2014) Transductive Transfer Machine in Proceedings ACCV 2014

FarajiDavar N (2014) Adaptive Transductive Transfer Machine in Proceedings British Machine Vision Conference

 
Description EPSRC Programme Grant
Amount £6,104,265 (GBP)
Funding ID EP/N007743/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2016 
End 12/2020
 
Description MURI
Amount £8,000,000 (GBP)
Funding ID EP/R018456/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2018 
End 12/2022
 
Description Platform Grant
Amount £1,539,000 (GBP)
Funding ID EP/P022529/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2017 
End 06/2022
 
Description Signal processing for the networked battlespace
Amount £3,800,000 (GBP)
Funding ID EP/K014307/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2013 
End 03/2018
 
Title ACASVA Actions Dataset 
Description Player action recognition is one of the challenges in the ACASVA project. The goal is to classify each action sample into one of three classes: Non-Hit, Hit and Serve. Following de Campos et al. [3], we used HOG3D descriptors extracted on player bounding boxes. Two different sets of feature extraction parameters were used: the 960D parameters (4x4x3x20) optimised for the KTH dataset and the 300D parameters (2x2x5x5x3) optimised for the Hollywood dataset. Each file contains HOG3D data extracted
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact The data set was used by peer groups in evaluation studies 
URL http://www.cvssp.org/acasva/
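
The descriptor sizes quoted in the dataset description follow directly from the parameter tuples: each dimensionality is the product of its factors. The short Python sketch below checks this arithmetic and validates a feature vector's length before it is passed to a classifier. The names HOG3D_LAYOUTS, ACTION_CLASSES, descriptor_length and check_sample are hypothetical and are not part of the dataset or its tools; the meaning of each individual factor is not stated in this record, so the factors are treated as opaque numbers.

```python
from math import prod

# Parameter tuples as quoted in the dataset description. What each factor
# stands for is NOT given in this record, so they are only multiplied here.
HOG3D_LAYOUTS = {
    "kth_960d": (4, 4, 3, 20),
    "hollywood_300d": (2, 2, 5, 5, 3),
}

# The three action classes named in the dataset description.
ACTION_CLASSES = ("Non-Hit", "Hit", "Serve")


def descriptor_length(layout):
    """Return the descriptor dimensionality implied by a parameter tuple."""
    return prod(layout)


def check_sample(vector, layout_name):
    """Raise ValueError if a feature vector does not match the named layout."""
    expected = descriptor_length(HOG3D_LAYOUTS[layout_name])
    if len(vector) != expected:
        raise ValueError(
            f"expected {expected} values for {layout_name}, got {len(vector)}"
        )
    return expected


if __name__ == "__main__":
    for name, layout in HOG3D_LAYOUTS.items():
        print(name, descriptor_length(layout))
```

A check of this kind can catch a mismatch between the two parameter sets early, for example when a 300D file is loaded into a pipeline configured for the 960D features.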
 
Description MILES 
Organisation University of Surrey
Country United Kingdom 
Sector Academic/University 
PI Contribution An internal inter-departmental collaboration was initiated with the Department of Computing and the School of Psychology, and a small feasibility study fund was awarded by the MILES (Models and Mathematics in Life and Social Sciences) project (12/2012-12/2013).
Start Year 2011
 
Description ACASVA Webpage 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact http://cvssp.org/acasva/

Further enquiries about the research done
Year(s) Of Engagement Activity 2009, 2010, 2011, 2012, 2013, 2014