Creating a Powerful Navigation Tool for Image and Video Datasets

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

Understanding the scene in an image is a trivial task for humans, and one that children master at a young age, yet object recognition and its extrapolation to scene understanding remain core problems that computer vision researchers are trying to solve. In recent years, the use of convolutional neural networks (a powerful class of machine learning algorithm for vision) has resulted in computers rivalling humans at certain vision tasks.
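
As a minimal, illustrative sketch (not part of the project itself) of the kind of convolutional neural network referred to above, the following PyTorch model classifies an image by passing it through stacked convolution and pooling layers; the layer sizes and the number of classes are assumptions chosen only for illustration.

import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learn low-level edge/colour filters
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # learn higher-level patterns
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):
        # x: a batch of 3 x 224 x 224 RGB images
        h = self.features(x)
        return self.classifier(h.flatten(1))

# Example: score a random 224 x 224 RGB image against 10 illustrative classes.
logits = SmallConvNet()(torch.randn(1, 3, 224, 224))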
The ability of a robot to understand who people are and what they are doing in images and video has many potential applications. For example, suppose you wanted to find the scene in a film where two particular characters are shouting at each other, or where one character is laughing; if a computer could understand human actions and interactions in a scene, then a powerful video navigation tool could be built. Among many other potential applications are smart-glasses for people with autism, which could label human expressions and emotions for the wearer to help them better understand their surroundings. The aim of my research is to use convolutional neural networks and machine learning methods to create a powerful navigation tool for image and video datasets, by improving the ability of computers to understand who people are and what they are doing in a scene.
The key objectives of my research are to train and implement computer vision algorithms for recognising identities via facial recognition, for recognising human pose and actions, and for recognising human emotions and interactions, culminating in a powerful navigation tool for image and video datasets that can understand complex instructions from a human user about the particular scene they want to find. The project aims to answer the question: can computers understand a scene of human action and interaction as well as a human can?
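
As a hedged illustration of the kind of navigation tool described above, the sketch below indexes per-frame face-identity and action predictions so that a query such as "find the scene where two characters are shouting at each other" can be answered. All names, labels and the query format are hypothetical placeholders, not outputs of the project's actual models.

from dataclasses import dataclass

@dataclass
class FrameAnnotation:
    frame: int                # frame index within the video
    identities: frozenset     # people recognised by a face-recognition model
    action: str               # action/emotion label from an action-recognition model

def find_scene(annotations, people, action):
    """Return frame indices where all requested people appear with the given action."""
    people = set(people)
    return [a.frame for a in annotations
            if people <= a.identities and a.action == action]

# Hypothetical predictions for a short clip.
annotations = [
    FrameAnnotation(0, frozenset({"Alice"}), "laughing"),
    FrameAnnotation(1, frozenset({"Alice", "Bob"}), "shouting"),
    FrameAnnotation(2, frozenset({"Alice", "Bob"}), "shouting"),
]

print(find_scene(annotations, {"Alice", "Bob"}, "shouting"))   # -> [1, 2]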
The novel engineering methodology in this project is two-fold. Firstly, some of these objectives, such as teaching computers to recognise interactions between multiple humans, have rarely or never been tackled by computer vision researchers; the work will therefore involve curating novel datasets (which will be made freely available to the international research community) to train algorithms on, and establishing the first benchmark results. Secondly, I will improve upon current standards for more established tasks such as facial recognition, which will involve researching novel neural network architectures and machine learning techniques to perform these tasks better.
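
For the more established tasks such as facial recognition, a common setup (shown here as a generic sketch, not the project's specific architecture) is to train a network that maps face crops to embedding vectors and to decide whether two faces show the same identity by comparing embedding distances against a threshold tuned on a validation set; the threshold value below is an illustrative assumption.

import numpy as np

def same_identity(emb_a, emb_b, threshold=0.6):
    """Decide whether two face embeddings belong to the same person."""
    emb_a = emb_a / np.linalg.norm(emb_a)      # L2-normalise both embeddings
    emb_b = emb_b / np.linalg.norm(emb_b)
    cosine_distance = 1.0 - float(emb_a @ emb_b)
    return cosine_distance < threshold

# Example with random vectors standing in for embeddings of two face crops.
print(same_identity(np.random.randn(128), np.random.randn(128)))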
This project falls within the EPSRC 'engineering' research area. No companies or collaborators are involved.

Publications


Studentship Projects

Project Reference   Relationship   Related To      Start        End          Student Name
EP/N509711/1                                       01/10/2016   30/09/2021
2118163             Studentship    EP/N509711/1    01/10/2018   31/03/2022   Andrew Brown