Learning to Recognise Dynamic Visual Content from Broadcast Footage

Lead Research Organisation: University of Oxford

Department Name: Engineering Science

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Funded Value:

£500,842

Funded Period:

Oct 2011 - Mar 2016

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/I012001/1

Principal Investigator:

Andrew Zisserman

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Image & Vision Computing (100%)

Organisations

University of Oxford (Lead Research Organisation)

People	ORCID iD
Andrew Zisserman (Principal Investigator)

Publications

Charles J (2013) Automatic and Efficient Human Pose Estimation for Sign Language Videos in International Journal of Computer Vision

Charles J (2013) Domain Adaptation for Upper Body Pose Tracking in Signed TV Broadcasts

Charles J (2014) Upper Body Pose Estimation with Temporal Sequential Forests

Charles J (2016) Personalizing Human Video Pose Estimation

Chung JS (2016) Signs in time: Encoding human motion as a temporal image

Hoai M (2013) Discriminative Sub-categorization

Hoai M (2014) Action Recognition From Weak Alignment of Body Parts in British Machine Vision Conference, 2014

Hoai M (2014) Talking Heads: Detecting Humans and Recognizing Their Interactions

Hoai M (2014) Thread-Safe: Towards Recognizing Human Actions Across Shot Boundaries

Liotti E (2018) Crystal nucleation in metallic alloys using x-ray radiography and machine learning in Science Advances

Key Findings
Research Databases and Models
Software and Technical Products


Description	There are two key developments: 1. A method for predicting point sets in images, for example predicting a person's 2D pose by localizing their hands, elbows and shoulders, or tracking the eyes, nose and mouth in a video of a moving face. The method is based on deep learning of a convolutional neural network model. Software has been made publicly available. 2. A method for learning to recognize human gestures, such as sign language, in videos. The method requires only a single example of the gesture to learn from, and then improves its recognition performance by finding other examples in video. The approach involves tracking human hands in video and then detecting the gesture using machine learning techniques.
Exploitation Route	Can be used in any application that requires tracking human pose.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	http://www.robots.ox.ac.uk/~vgg/research/sign_language_new/
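
The second development (start from one example of a gesture, then find further examples in video using tracked hands) can be illustrated with a small sketch. The page does not say which matching technique the project used, so dynamic time warping (DTW) over 2D hand positions is used here purely as a stand-in. The function names `dtw_distance` and `find_candidates`, the window scheme and the threshold are all hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of (x, y) points."""
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat a point of b
                cost[i][j - 1],      # repeat a point of a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[len(a)][len(b)]


def find_candidates(exemplar, track, threshold):
    """Slide a window over a hand track (assumed to come from a hand tracker)
    and return the start indices whose window lies within `threshold`
    of the single exemplar trajectory."""
    n = len(exemplar)
    return [
        start
        for start in range(len(track) - n + 1)
        if dtw_distance(exemplar, track[start:start + n]) <= threshold
    ]


# Toy data: one exemplar gesture and a longer track containing it once.
exemplar = [(0, 0), (1, 1), (2, 2)]
track = [(5, 5), (5, 5), (0, 0), (1, 1), (2, 2), (9, 9)]
matches = find_candidates(exemplar, track, threshold=0.5)
```

In a setting like the one described, windows returned by such a search could be treated as additional training examples; how the project actually selects and uses them is not stated on this page.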


Title	Human Pose Estimation datasets
Description	A set of large video datasets annotated with human upper-body pose
Type Of Material	Database/Collection of data
Year Produced	2015
Provided To Others?	Yes
Impact	Several papers have used this as a benchmark.
URL	http://www.robots.ox.ac.uk/~vgg/data/pose/


Title	Software for Detecting Upper Body Configurations
Description	Software to accurately and efficiently detect configurations of one or more people in edited TV material. Such configurations often appear in standard arrangements due to cinematic style, and we take advantage of this to provide scene context.
Type Of Technology	Software
Year Produced	2014
Open Source License?	Yes
Impact	Available to be used.
URL	http://www.robots.ox.ac.uk/~vgg/software/ubc/


Title	Software for Personalizing Human Video Pose Estimation
Description	Convolutional networks (ConvNets) currently produce state-of-the-art results for human pose estimation. However, even ConvNets can produce absurdly erroneous pose predictions in videos, particularly for unusual poses, challenging illumination or viewing conditions, self-occlusions, or unusual shapes (e.g. baggy clothing or unusual body proportions). We address these issues with a method for automatically learning reliable, occlusion-aware, person-specific pose estimators in videos. Using the fact that people tend not to change appearance over the course of a long video (same clothes, same body shape), we show that the large quantity of data in the video can be exploited to 'personalize' a ConvNet pose estimator, thereby improving performance for unusual poses.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	Too early
URL	http://www.robots.ox.ac.uk/~vgg/software/personalized_pose/


Title	VGG CNN Heatmap Regressor
Description	This code enables training of heatmap regressor ConvNets for the general problem of regressing (x,y) positions in images.
Type Of Technology	Software
Year Produced	2015
Open Source License?	Yes
Impact	Has already been used in several publications.
URL	http://www.robots.ox.ac.uk/~vgg/software/cnn_heatmap/
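
As a rough illustration of what regressing (x,y) positions through heatmaps involves, the sketch below builds a target heatmap for one point, decodes a position back from a heatmap, and computes a per-pixel loss. It is not the released code. The Gaussian target shape, the squared-error loss, the `sigma` value and the function names are assumptions chosen because they are common in heatmap regression; the page does not specify them.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Build a height x width target heatmap with a Gaussian peak at (cx, cy).

    Assumption: a Gaussian target; the released software may differ.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def decode_heatmap(heatmap):
    """Return the (x, y) pixel with the strongest response in a heatmap."""
    best_value, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy


def l2_loss(predicted, target):
    """Sum of squared per-pixel differences between two heatmaps."""
    return sum(
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    )


# Toy usage: a wrist annotated at pixel (5, 2) in an 8 x 6 image.
target = gaussian_heatmap(8, 6, cx=5, cy=2)
recovered = decode_heatmap(target)
```

A network trained this way would output one such heatmap per point (for example one each for hands, elbows and shoulders), and the predicted position of each point is read off its own heatmap.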
