Learning to Recognise Dynamic Visual Content from Broadcast Footage

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
 
Description There are two key developments:

1. A method for predicting point sets in images. For example, to predict a person's 2D pose by localizing the points of their hands, elbows and shoulders; or to track eyes, nose and mouth on a video of a moving face. The method is based on deep learning of a convolutional neural network model.
The software has been made publicly available.

2. A method for learning to recognize human gestures, such as sign language, in videos. The method only requires a single example of the gesture to learn from, and then improves its recognition performance by finding other examples in video. The approach involves tracking human hands in video, and then detecting the gesture using machine learning techniques.
Exploitation Route The methods can be used in any application that requires tracking human pose.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://www.robots.ox.ac.uk/~vgg/research/sign_language_new/
 
Title Human Pose Estimation datasets 
Description A set of large video datasets annotated with human upper-body pose 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact Several papers have used this as a benchmark. 
URL http://www.robots.ox.ac.uk/~vgg/data/pose/
 
Title Software for Detecting Upper Body Configurations 
Description Software to accurately and efficiently detect configurations of one or more people in edited TV material. Such configurations often appear in standard arrangements due to cinematic style, and we take advantage of this to provide scene context. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Available to be used. 
URL http://www.robots.ox.ac.uk/~vgg/software/ubc/
 
Title Software for Personalizing Human Video Pose Estimation 
Description Convolutional networks (ConvNets) currently produce state-of-the-art results for the task of human pose estimation. However, even ConvNets can still produce absurdly erroneous pose predictions in videos, particularly for unusual poses, challenging illumination or viewing conditions, self-occlusions or unusual shapes (e.g. baggy clothing or unusual body proportions). We address these issues with a method for automatically learning reliable, occlusion-aware, person-specific pose estimators in videos. Using the fact that people tend not to change appearance over the course of a long video (same clothes, same body shape), we show that the large quantity of data in the video can be exploited to 'personalize' a ConvNet pose estimator, thereby improving performance for unusual poses. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Too early 
URL http://www.robots.ox.ac.uk/~vgg/software/personalized_pose/
 
Title VGG CNN Heatmap Regressor 
Description This code enables training of heatmap regressor ConvNets for the general problem of regressing (x,y) positions in images. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact Has already been used in several publications. 
URL http://www.robots.ox.ac.uk/~vgg/software/cnn_heatmap/
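
The heatmap regression idea behind the VGG CNN Heatmap Regressor entry above can be illustrated with a small sketch. It is not taken from the released software: the function names (make_heatmap, decode_heatmap), the Gaussian target shape and the sigma value are assumptions chosen for illustration. The sketch shows how an (x,y) position might be encoded as a training target for a network, and how a position might be read back from a predicted heatmap.

```python
import math


def make_heatmap(x, y, height, width, sigma=2.0):
    """Encode a keypoint (x, y) as a height-by-width Gaussian heatmap.

    Assumption: a Gaussian bump centred on the keypoint is used as the
    regression target; sigma controls how wide the bump is.
    """
    return [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
            for col in range(width)
        ]
        for row in range(height)
    ]


def decode_heatmap(heatmap):
    """Recover (x, y) as the location of the largest heatmap value."""
    best_row, best_col, best_val = 0, 0, float("-inf")
    for row_idx, row in enumerate(heatmap):
        for col_idx, value in enumerate(row):
            if value > best_val:
                best_row, best_col, best_val = row_idx, col_idx, value
    return best_col, best_row


def l2_loss(predicted, target):
    """Mean squared difference between two heatmaps of the same size."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Hypothetical example: a wrist keypoint at x=12, y=7 in a 48x32 image.
target = make_heatmap(12, 7, height=32, width=48)
recovered = decode_heatmap(target)
```

In a setup like this, a network would output one heatmap per point (for example one each for the hands, elbows and shoulders) and be trained to minimise a loss such as l2_loss against the targets. Taking the maximum of each predicted heatmap then gives the estimated point positions.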