Adaptive cognition for automated sports video annotation (ACASVA)

Lead Research Organisation: University of Surrey
Department Name: Centre for Vision, Speech and Signal Processing (CVSSP)

Abstract

The development of a machine that can autonomously understand and interpret patterns of real-world events remains a challenging goal in AI. Humans achieve this by developing sophisticated internal representational structures for objects and events, and the grammars that connect them. ACASVA aims to investigate how visual and linguistic grammars interact during learning by developing grammars in a domain where a set of rules keeps the number of distinct events small: a sport. We will analyse video footage of a game (e.g. tennis), use computer vision techniques to progressively interpret it as a sequence of (possibly overlapping) events, and build a grammar of those events. We will carry out a similar audio/linguistic analysis of the commentary on the game. Both grammars will be used to build a representational structure for understanding the game. Visual representations are additionally constrained by the inference of game rules, so that object-classification mechanisms are preferentially tuned to game-relevant entities such as 'player' rather than game-irrelevant entities such as 'crowd-member'. We will also investigate how the two modes, sight and sound, influence each other during learning: interpretation of the video is affected by the linguistic grammar, and vice versa. This coupling of modes will also improve recognition of both audio and video events, as grammars from the video mode are used to influence audio recognition, and vice versa. The psychological component of ACASVA correspondingly investigates how these capabilities develop in humans: how visual grammars are organised and used in learning, how they are modified by prior linguistic knowledge of the domain, how visual grammars map onto linguistic grammars, and how inferences about game rules influence lower-level visual learning (measured via gaze behaviour).
These results will feed back into the machine-learning problem, and vice versa, as well as providing a performance benchmark for the system. Potential beneficiaries of ACASVA (in addition to knowledge beneficiaries within science and engineering) include the broadcasting and online video search industries.
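The abstract does not specify how the event grammar is represented. As a purely illustrative sketch, assuming a tennis point can be reduced to a handful of hypothetical event labels (`serve`, `bounce_in`, `bounce_out`, `net`, `hit`) and ignoring lets, second serves and scoring, the Python below encodes a toy finite-state grammar of one point. It also shows one way such a grammar could constrain recognition in the other mode: scores from a detector (for example, an audio classifier) are only considered for events the grammar allows next. All names here (`TRANSITIONS`, `run`, `constrained_choice`, the state labels) are assumptions for illustration, not part of the project.

```python
# Toy finite-state grammar for a single tennis point (hypothetical and
# simplified: no lets, second serves or scoring). Keys are
# (current_state, event); values are the next state.
TRANSITIONS = {
    ("START", "serve"): "SERVED",
    ("SERVED", "bounce_in"): "BOUNCED",
    ("SERVED", "bounce_out"): "END",   # fault ends this toy point
    ("SERVED", "net"): "END",
    ("BOUNCED", "hit"): "HIT",
    ("BOUNCED", "bounce_in"): "END",   # second bounce: point over
    ("HIT", "hit"): "HIT",             # volley
    ("HIT", "bounce_in"): "BOUNCED",
    ("HIT", "bounce_out"): "END",
    ("HIT", "net"): "END",
}


def run(events, state="START"):
    """Consume events from `state`; return the final state, or None if an
    event is not allowed by the grammar."""
    for event in events:
        state = TRANSITIONS.get((state, event))
        if state is None:
            return None
    return state


def is_complete_point(events):
    """True if the event sequence forms a whole point under the toy grammar."""
    return run(events) == "END"


def allowed_next(state):
    """Events the grammar permits from `state`, in sorted order."""
    return sorted(event for (s, event) in TRANSITIONS if s == state)


def constrained_choice(history, scores):
    """Pick the highest-scoring event that the grammar allows after `history`.

    scores maps event labels to detector scores (e.g. from an audio
    classifier). Returns None if the history is invalid, the point is
    already over, or no allowed event has a score.
    """
    state = run(history)
    if state is None or state == "END":
        return None
    options = [event for event in allowed_next(state) if event in scores]
    if not options:
        return None
    return max(options, key=scores.get)
```

For instance, `constrained_choice(["serve"], {"hit": 0.9, "bounce_in": 0.6})` returns `"bounce_in"`: although the detector scores `hit` higher, the toy grammar does not allow a return before the serve has bounced.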

Publications

Arashloo S (2014) Dynamic Texture Recognition Using Multiscale Binarized Statistical Image Features in IEEE Transactions on Multimedia

Chan CH (2013) Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors in IEEE Transactions on Pattern Analysis and Machine Intelligence

De Campos T (2012) Images as sets of locally weighted features in Computer Vision and Image Understanding

Feng Z (2015) Random Cascaded-Regression Copse for Robust Facial Landmark Detection in IEEE Signal Processing Letters

Feng ZH (2015) Cascaded Collaborative Regression for Robust Facial Landmark Detection Trained Using a Mixture of Synthetic and Real Images With Dynamic Weighting in IEEE Transactions on Image Processing

Huang Q (2011) Inferring the Structure of a Tennis Game Using Audio Information in IEEE Transactions on Audio, Speech, and Language Processing

Kittler J (2014) Domain Anomaly Detection in Machine Perception: A System Architecture and Taxonomy in IEEE Transactions on Pattern Analysis and Machine Intelligence

Mendez-Vázquez H (2013) Photometric Normalization for Face Recognition Using Local Discrete Cosine Transform in International Journal of Pattern Recognition and Artificial Intelligence

Osman M (2012) The role of reward in dynamic decision making in Frontiers in Neuroscience

Osman M (2012) Prediction and control in a dynamic environment in Frontiers in Psychology

Poh N (2010) Addressing Missing Values in Kernel-Based Multimodal Biometric Fusion Using Neutral Point Substitution in IEEE Transactions on Information Forensics and Security

Sidiropoulos P (2012) Differential Edit Distance: A Metric for Scene Segmentation Evaluation in IEEE Transactions on Circuits and Systems for Video Technology

Sánchez J (2012) Modeling the spatial layout of images beyond spatial pyramids in Pattern Recognition Letters

Taya S (2010) Cast shadow can modulate the judged final position of a moving target in Attention, Perception, & Psychophysics

Windridge D (2013) A Framework for Hierarchical Perception-Action Learning Utilizing Fuzzy Reasoning in IEEE Transactions on Cybernetics

Windridge D (2013) Characterizing Driver Intention via Hierarchical Perception-Action Modeling in IEEE Transactions on Human-Machine Systems

 
Further Funding

Description EPSRC Programme Grant
Amount £6,104,265 (GBP)
Funding ID EP/N007743/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2016 
End 12/2020
 
Description MURI
Amount £8,000,000 (GBP)
Funding ID EP/R018456/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2018 
End 12/2022
 
Description Platform Grant
Amount £1,539,000 (GBP)
Funding ID EP/P022529/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2017 
End 06/2022
 
Description Signal processing for the networked battlespace
Amount £3,800,000 (GBP)
Funding ID EP/K014307/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2013 
End 03/2018
 
Research Databases and Models

Title ACASVA Actions Dataset
Description Recognising players' actions is one of the challenges in the ACASVA project. The goal is to classify each action sample into one of three classes: Non-Hit, Hit and Serve. Following de Campos et al. [3], we used HOG3D descriptors extracted on player bounding boxes. Two different sets of feature-extraction parameters were used: the 960D parameters (4x4x3x20), optimised for the KTH dataset, and the 300D parameters (2x2x5x5x3), optimised for the Hollywood dataset. Each file contains HOG3D data extracted ...
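The dataset description does not say which classifier was applied to the HOG3D features, nor how the files are laid out. As a minimal sketch, assuming each sample is already available as a plain list of floats with one of the three labels, the Python below checks that the quoted parameter tuples multiply out to the stated 960D and 300D descriptor lengths, and uses a nearest-centroid classifier as a simple, hypothetical stand-in for the Non-Hit/Hit/Serve problem. The names `PARAM_SETS`, `descriptor_dim`, `fit_centroids` and `predict` are illustrative assumptions, not part of the dataset.

```python
from math import dist, prod

# Parameter tuples as quoted in the dataset description; their product
# equals the stated descriptor length (960D and 300D).
PARAM_SETS = {
    "KTH": (4, 4, 3, 20),
    "Hollywood": (2, 2, 5, 5, 3),
}

# The three action classes used in the ACASVA Actions Dataset.
CLASSES = ("Non-Hit", "Hit", "Serve")


def descriptor_dim(factors):
    """Descriptor length implied by a HOG3D parameter tuple."""
    return prod(factors)


def fit_centroids(samples):
    """Compute the mean feature vector of each class.

    samples: iterable of (feature_vector, label) pairs; every label must be
    in CLASSES and every vector must have the same length.
    """
    sums, counts, dim = {}, {}, None
    for vec, label in samples:
        if label not in CLASSES:
            raise ValueError(f"unknown label: {label!r}")
        if dim is None:
            dim = len(vec)
        elif len(vec) != dim:
            raise ValueError("feature vectors must all have the same length")
        if label not in sums:
            sums[label] = [0.0] * dim
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, vec):
    """Return the class whose centroid is nearest (Euclidean) to vec."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))
```

For example, `descriptor_dim(PARAM_SETS["KTH"])` gives 960 and `descriptor_dim(PARAM_SETS["Hollywood"])` gives 300. A nearest-centroid rule is used here only because it is short and dependency-free; a discriminative classifier such as an SVM would be a more common choice for descriptors of this size.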
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact The dataset was used by peer research groups in evaluation studies.
URL http://www.cvssp.org/acasva/
 
Collaborations

Description MILES
Organisation University of Surrey
Country United Kingdom 
Sector Academic/University 
PI Contribution An internal inter-departmental collaboration was initiated with the Department of Computing and the School of Psychology, and a small feasibility-study fund was awarded by the MILES (Models and Mathematics in Life and Social Sciences) project (12/2012-12/2013).
Start Year 2011
 
Engagement Activities

Description ACASVA Webpage
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The project webpage (http://cvssp.org/acasva/) led to further enquiries about the research carried out.
Year(s) Of Engagement Activity 2009, 2010, 2011, 2012, 2013, 2014