Automated visual understanding of video content

Lead Research Organisation: University of Oxford
Department Name: Engineering Science


The aim of the research is to learn to recognise categories and attributes using multiple sources of supervision, from both videos and text. Immediate goals include automated character detection from video data in collaboration with the BBC, through the development of robust supervised and unsupervised machine learning techniques. These techniques will be used to perform face tracking, localisation and recognition on challenging video datasets.

Deep learning models will be developed to take advantage of these disparate sources of supervision, with the ultimate long-term aim of developing an understanding of stories and context from both visual and textual content.
This involves creating systems capable of understanding high-level semantics (i.e. the ability to answer questions such as 'how' and 'why' from video content alone), which, besides allowing a deep contextual understanding, will also allow intelligent fast-forwarding in movies and television shows. Applications of such research include assistive solutions for the visually impaired and cognitive robotics, both of which require a holistic understanding of the visual world through the ability to learn context, motivation, intent and emotion.

This project falls within the EPSRC research area Information and Communication Technologies (ICT), and will contribute to research in the themes of Artificial Intelligence and Image and Vision Computing. It is in collaboration with the BBC.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/P510609/1                                   01/10/2016  30/09/2021
1801261            Studentship   EP/P510609/1  01/10/2016  30/09/2020  Arsha Nagrani