Adaptive cognition for automated sports video annotation (ACASVA)

Lead Research Organisation: University of Surrey

Department Name: Centre for Vision, Speech and Signal Processing (CVSSP)

Abstract

The development of a machine that can autonomously understand and interpret patterns of real-world events remains a challenging goal in AI. Humans are able to achieve this by developing sophisticated internal representational structures for objects and events and the grammars that connect them. ACASVA aims to investigate the interaction between visual and linguistic grammars in learning by developing grammars in a scenario where the number of different events is constrained, by a set of rules, to be small: a sport. We will analyse video footage of a game (e.g. tennis) and use computer vision techniques to progressively understand it as a sequence of (possibly overlapping) events, and build a grammar of events. We will carry out a similar audio/linguistic analysis of the commentary on the game. Both of these grammars will be used to build a representational structure for understanding the game. Visual representations are additionally constrained by the inference of game rules so that object-classification mechanisms are preferentially tuned to game-relevant entities like 'player' rather than game-irrelevant entities like 'crowd-member'. We will also investigate how the two modes, sight and sound, can influence each other in the learning process: interpretation of the video is affected by the linguistic grammar and vice versa. Furthermore, this coupling of modes will lead to improved recognition of both audio and video events when the grammars from the video mode are used to influence the audio recognition, and vice versa. The psychological component of ACASVA correspondingly attempts to learn how these capabilities are developed in humans: how visual grammars are organized and employed in the learning problem, how these grammars are modified by prior linguistic knowledge of the domain, how visual grammars map onto linguistic grammars, and how game rule-inferences influence lower-level visual learning (determined via gaze-behaviour).
These results will feed back into the machine-learning problem and vice versa, as well as providing a performance benchmark for the system. Potential beneficiaries of ACASVA (in addition to the knowledge beneficiaries within the fields of science and engineering) include the broadcasting and on-line video search industries.

Funded Value:

£1,415,481

Funded Period:

May 09 - Sep 13

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/F069421/1

Principal Investigator:

Josef Kittler

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Cognitive Science Appl. in ICT (50%)

Human Communication in ICT (25%)

Vision & Senses - ICT appl. (25%)

Organisations

University of Surrey (Collaboration, Lead Research Organisation)

People	ORCID iD
Josef Kittler (Principal Investigator)	http://orcid.org/0000-0002-8110-9205
Magda Osman (Co-Investigator)	http://orcid.org/0000-0003-1480-6657
John Groeger (Co-Investigator)	http://orcid.org/0000-0002-3582-1058
David Windridge (Researcher Co-Investigator)	http://orcid.org/0000-0001-5507-8516

Publications

Almajai I (2012) Detection and Identification of Rare Audiovisual Cues

Arashloo S (2014) Class-Specific Kernel Fusion of Multiple Descriptors for Face Verification Using Multiscale Binarised Statistical Image Features in IEEE Transactions on Information Forensics and Security

Arashloo S (2014) Dynamic Texture Recognition Using Multiscale Binarized Statistical Image Features in IEEE Transactions on Multimedia

Arashloo S (2015) Face Spoofing Detection Based on Multiple Descriptor Fusion Using Multiscale Dynamic Binarized Statistical Image Features in IEEE Transactions on Information Forensics and Security

Arashloo S (2013) Efficient processing of MRFs for unconstrained-pose face recognition

Beveridge J (2015) Report on the FG 2015 Video Person Recognition Evaluation

Chan CH (2013) Multiscale local phase quantization for robust component-based face recognition using kernel fusion of multiple descriptors. in IEEE transactions on pattern analysis and machine intelligence

Coppi D (2014) On detection of novel categories and subcategories of images using incongruence

De Campos T (2012) Images as sets of locally weighted features in Computer Vision and Image Understanding

De Neys W (2011) Biased but in doubt: conflict and decision confidence. in PloS one

Further Funding
Research Databases and Models
Collaboration
Engagement Activities


Description	EPSRC Programme Grant
Amount	£6,104,265 (GBP)
Funding ID	EP/N007743/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2016
End	12/2020


Description	MURI
Amount	£8,000,000 (GBP)
Funding ID	EP/R018456/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2018
End	12/2022


Description	Platform Grant
Amount	£1,539,000 (GBP)
Funding ID	EP/P022529/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	06/2017
End	06/2022


Description	Signal processing for the networked battlespace
Amount	£3,800,000 (GBP)
Funding ID	EP/K014307/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	03/2013
End	03/2018


Title	ACASVA Actions Dataset
Description	Player action recognition is one of the challenges in the ACASVA project. The goal is to classify each action sample into one of three classes: Non-Hit, Hit and Serve. Following deCampos et al [3], we used HOG3D descriptors extracted on player bounding boxes. Two different sets of feature extraction parameters were used: the 960D parameters (4x4x3x20) optimised for the KTH dataset and the 300D parameters (2x2x5x5x3) optimised for the Hollywood dataset. Each file contains HOG3D data extracted
Type Of Material	Database/Collection of data
Year Produced	2012
Provided To Others?	Yes
Impact	The data set was used by peer groups in evaluation studies
URL	http://www.cvssp.org/acasva/
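
As a quick check on the dimensionalities quoted in the description above, the following minimal Python sketch multiplies out the two parameter tuples and lists the three class labels. The dictionary keys and the function name `descriptor_length` are illustrative assumptions, not names from the dataset; the description does not say what each factor in a tuple represents, so the sketch treats them only as numbers.

```python
from math import prod

# Class labels named in the dataset description.
CLASSES = ("Non-Hit", "Hit", "Serve")

# Parameter tuples exactly as quoted in the description. The meaning of each
# factor is not stated there; the key names are hypothetical labels.
HOG3D_PARAMS = {
    "KTH-optimised": (4, 4, 3, 20),
    "Hollywood-optimised": (2, 2, 5, 5, 3),
}


def descriptor_length(params):
    """Return the descriptor dimensionality as the product of its factors."""
    return prod(params)


if __name__ == "__main__":
    for name, params in HOG3D_PARAMS.items():
        print(name, descriptor_length(params))
```

The products agree with the "960D" and "300D" figures given in the description.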


Description	MILES
Organisation	University of Surrey
Country	United Kingdom
Sector	Academic/University
PI Contribution	An internal inter-departmental collaboration was initiated with the Department of Computing and the School of Psychology, and a small feasibility study fund was awarded by the MILES (Models and Mathematics in Life and Social Sciences) project (12/2012-12/2013).
Start Year	2011


Description	ACASVA Webpage
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	http://cvssp.org/acasva/ Further enquiries about the research done
Year(s) Of Engagement Activity	2009,2010,2011,2012,2013,2014
