Adaptive cognition for automated sports video annotation (ACASVA)

Lead Research Organisation: University of Surrey

Department Name: Centre for Vision, Speech and Signal Processing (CVSSP)

Abstract

The development of a machine that can autonomously understand and interpret patterns of real-world events remains a challenging goal in AI. Humans are able to achieve this by developing sophisticated internal representational structures for objects and events and the grammars that connect them. ACASVA aims to investigate the interaction between visual and linguistic grammars in learning by developing grammars in a scenario where the number of different events is constrained, by a set of rules, to be small: a sport. We will analyse video footage of a game (e.g. tennis) and use computer vision techniques to progressively understand it as a sequence of (possibly overlapping) events, and build a grammar of events. We will carry out a similar audio/linguistic analysis of the commentary on the game. Both of these grammars will be used to build a representational structure for understanding the game. Visual representations are additionally constrained by the inference of game rules, so that object-classification mechanisms are preferentially tuned to game-relevant entities like 'player' rather than game-irrelevant entities like 'crowd-member'. We will also investigate how the two modes, sight and sound, can influence each other in the learning process: interpretation of the video is affected by the linguistic grammar and vice versa. Furthermore, this coupling of modes will lead to improved recognition of both audio and video events when the grammars from the video mode are used to influence the audio recognition, and vice versa. The psychological component of ACASVA correspondingly attempts to learn how these capabilities are developed in humans: how visual grammars are organized and employed in the learning problem, how these grammars are modified by prior linguistic knowledge of the domain, how visual grammars map onto linguistic grammars, and how game rule-inferences influence lower-level visual learning (determined via gaze-behaviour). These results will feed back into the machine-learning problem and vice versa, as well as providing a performance benchmark for the system.

Potential beneficiaries of ACASVA (in addition to the knowledge beneficiaries within the fields of science and engineering) include the broadcasting and on-line video search industries.

Funded Value:

£1,415,481

Funded Period:

May 09 - Sep 13

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/F069421/1

Principal Investigator:

Josef Kittler

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Cognitive Science Appl. in ICT (50%)

Human Communication in ICT (25%)

Vision & Senses - ICT appl. (25%)

Organisations

University of Surrey (Collaboration, Lead Research Organisation)

People	ORCID iD
Josef Kittler (Principal Investigator)	http://orcid.org/0000-0002-8110-9205
Magda Osman (Co-Investigator)	http://orcid.org/0000-0003-1480-6657
John Groeger (Co-Investigator)
David Windridge (Researcher Co-Investigator)	http://orcid.org/0000-0001-5507-8516

Publications

Almajai I (2012) Detection and Identification of Rare Audiovisual Cues

Arashloo S (2014) Class-Specific Kernel Fusion of Multiple Descriptors for Face Verification Using Multiscale Binarised Statistical Image Features in IEEE Transactions on Information Forensics and Security

Arashloo S (2015) Face Spoofing Detection Based on Multiple Descriptor Fusion Using Multiscale Dynamic Binarized Statistical Image Features in IEEE Transactions on Information Forensics and Security

Arashloo S (2014) Dynamic Texture Recognition Using Multiscale Binarized Statistical Image Features in IEEE Transactions on Multimedia

Arashloo S (2013) Efficient processing of MRFs for unconstrained-pose face recognition

Beveridge J (2015) Report on the FG 2015 Video Person Recognition Evaluation

Chan CH (2013) Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors. in IEEE transactions on pattern analysis and machine intelligence

Coppi D (2014) On detection of novel categories and subcategories of images using incongruence

De Campos T (2012) Images as sets of locally weighted features in Computer Vision and Image Understanding

De Neys W (2011) Biased but in doubt: conflict and decision confidence. in PloS one

Farajidavar N (2015) Computer Vision -- ACCV 2014 - 12th Asian Conference on Computer Vision, Singapore, Singapore, November 1-5, 2014, Revised Selected Papers, Part III

Feng Z (2015) Random Cascaded-Regression Copse for Robust Facial Landmark Detection in IEEE Signal Processing Letters

Feng Z (2013) Multiple Classifier Systems

Feng Z (2017) Face Detection, Bounding Box Aggregation and Pose Estimation for Robust Facial Landmark Localisation in the Wild

Feng ZH (2015) Cascaded Collaborative Regression for Robust Facial Landmark Detection Trained Using a Mixture of Synthetic and Real Images With Dynamic Weighting. in IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Hu G (2014) Robust face recognition by an albedo based 3D morphable model

Hu G (2013) A facial symmetry prior for improved illumination fitting of 3D morphable model

Hu G (2015) When Face Recognition Meets with Deep Learning: An Evaluation of Convolutional Neural Networks for Face Recognition

Huang Q (2011) Inferring the Structure of a Tennis Game Using Audio Information in IEEE Transactions on Audio, Speech, and Language Processing

Huber P (2015) Fitting 3D Morphable Face Models using local features

Iwaya LH (2023) On the privacy of mental health apps: An empirical investigation and its implications for app development. in Empirical software engineering

Khan A (2014) Multilevel Chinese takeaway process and label-based processes for rule induction in the context of automated sports video annotation. in IEEE transactions on cybernetics

Kittler J (2014) Domain Anomaly Detection in Machine Perception: A System Architecture and Taxonomy. in IEEE transactions on pattern analysis and machine intelligence

Mendez-Vázquez H (2013) Photometric Normalization for Face Recognition Using Local Discrete Cosine Transform in International Journal of Pattern Recognition and Artificial Intelligence

Osman M (2012) The role of reward in dynamic decision making. in Frontiers in neuroscience

Osman M (2011) Cue utilization and strategy application in stable and unstable dynamic environments in Cognitive Systems Research

Osman M (2010) Controlling uncertainty: a review of human behavior in complex dynamic environments. in Psychological bulletin

Osman M (2012) Prediction and control in a dynamic environment. in Frontiers in psychology

Poh N (2010) Addressing Missing Values in Kernel-Based Multimodal Biometric Fusion Using Neutral Point Substitution in IEEE Transactions on Information Forensics and Security

Poschmann P (2014) Fusion of Tracking Techniques to Enhance Adaptive Real-time Tracking of Arbitrary Objects in Procedia Computer Science

Huang Q (2012) Detection of Ball Hits in a Tennis Game Using Audio and Visual Information

Rahimzadeh Arashloo S (2014) Fast pose invariant face recognition using super coupled multiresolution Markov Random Fields on a GPU in Pattern Recognition Letters

Sidiropoulos P (2012) Differential Edit Distance: A Metric for Scene Segmentation Evaluation in IEEE Transactions on Circuits and Systems for Video Technology

Sánchez J (2012) Modeling the spatial layout of images beyond spatial pyramids in Pattern Recognition Letters

Tahir M (2012) Multilabel classification using heterogeneous ensemble of multi-label classifiers in Pattern Recognition Letters

Tahir M (2016) Multi-label classification using stacked spectral kernel discriminant analysis in Neurocomputing

Tahir M (2013) A Robust and Scalable Visual Category and Action Recognition System Using Kernel Discriminant Analysis With Spectral Regression in IEEE Transactions on Multimedia

Taya S (2012) Looking to score: the dissociation of goal influence on eye movement and meta-attentional allocation in a complex dynamic natural scene. in PloS one

Taya S (2013) Trained eyes: experience promotes adaptive gaze control in dynamic and uncertain visual environments. in PloS one

Taya S (2010) Cast shadow can modulate the judged final position of a moving target. in Attention, perception & psychophysics

Windridge D (2015) A Novel Markov Logic Rule Induction Strategy for Characterizing Sports Video Footage in IEEE MultiMedia

Windridge D (2013) A Framework for Hierarchical Perception-Action Learning Utilizing Fuzzy Reasoning. in IEEE transactions on cybernetics

Windridge D (2013) Characterizing Driver Intention via Hierarchical Perception-Action Modeling in IEEE Transactions on Human-Machine Systems

Yan F (2009) Non-sparse Multiple Kernel Learning for Fisher Discriminant Analysis

Yan F (2014) Automatic annotation of tennis games: An integration of audio, vision, and learning in Image and Vision Computing

Yan F (2011) Multiple Classifier Systems

Yan F (2010) Multiple kernel learning and feature space denoising

Zhou X (2013) A two layered data association approach for ball tracking

Further Funding
Research Databases and Models
Collaboration
Engagement Activities


Description	EPSRC Programme Grant
Amount	£6,104,265 (GBP)
Funding ID	EP/N007743/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2016
End	12/2020


Description	MURI
Amount	£8,000,000 (GBP)
Funding ID	EP/R018456/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2018
End	12/2022


Description	Platform Grant
Amount	£1,539,000 (GBP)
Funding ID	EP/P022529/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	07/2017
End	06/2022


Description	Signal processing for the networked battlespace
Amount	£3,800,000 (GBP)
Funding ID	EP/K014307/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	04/2013
End	03/2018


Title	ACASVA Actions Dataset
Description	Player action recognition is one of the challenges in the ACASVA project. The goal is to classify each action sample into one of three classes: Non-Hit, Hit and Serve. Following de Campos et al. [3], we used HOG3D descriptors extracted on player bounding boxes. Two different sets of feature extraction parameters were used: the 960D parameters (4x4x3x20) optimised for the KTH dataset and the 300D parameters (2x2x5x5x3) optimised for the Hollywood dataset. Each file contains HOG3D data extracted
Type Of Material	Database/Collection of data
Year Produced	2012
Provided To Others?	Yes
Impact	The data set was used by peer groups in evaluation studies
URL	http://www.cvssp.org/acasva/
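
As a quick sanity check on the dataset description above, the sketch below multiplies out the two quoted parameter grids to confirm they give 960 and 300 dimensions, and lists the three action classes. It is only an illustration: the description does not say what each factor of a grid represents, so the factors are treated as opaque, and the integer class ids and all Python names here are assumptions rather than the dataset's own encoding.

```python
from math import prod

# Parameter grids quoted in the dataset description. What each factor
# stands for is not stated there, so they are kept as opaque factors.
HOG3D_LAYOUTS = {
    "960D (KTH-optimised)": (4, 4, 3, 20),
    "300D (Hollywood-optimised)": (2, 2, 5, 5, 3),
}

# The three action classes named in the description. The integer ids
# are hypothetical and chosen only for this illustration.
ACTION_CLASSES = {0: "Non-Hit", 1: "Hit", 2: "Serve"}


def descriptor_length(layout):
    """Return the descriptor length implied by a parameter grid."""
    return prod(layout)


# Descriptor length for each quoted parameter set.
DIMS = {name: descriptor_length(shape) for name, shape in HOG3D_LAYOUTS.items()}

if __name__ == "__main__":
    for name, length in DIMS.items():
        print(f"{name}: {length} values per action sample")
    print("Classes:", ", ".join(ACTION_CLASSES.values()))
```

Both products match the dimensionalities stated in the description (4x4x3x20 = 960 and 2x2x5x5x3 = 300).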


Description	MILES
Organisation	University of Surrey
Country	United Kingdom
Sector	Academic/University
PI Contribution	An internal inter-department collaboration was initiated with the Department of Computing and the School of Psychology, and a small feasibility study fund was awarded by the MILES (Models and Mathematics in Life and Social Sciences) project (12/2012-12/2013).
Start Year	2011


Description	ACASVA Webpage
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	The project webpage (http://cvssp.org/acasva/) led to further enquiries about the research done.
Year(s) Of Engagement Activity	2009,2010,2011,2012,2013,2014
