Using Video Information Retrieval (VIR) for "decoding" video-based content
Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science
Abstract
Demand for broadcast media has grown substantially over the last decade. Even before the pandemic, people spent a considerable amount of time watching their favourite programmes, and TV broadcasters and Video on Demand (VoD) platforms have expanded their catalogues dramatically in order to meet viewers' expectations.
At Channel4, one of our strategic pillars is to have the best possible understanding of our customers' behaviour. Viewers' expectations have changed dramatically, and we need to put viewers' motivations at the heart of our decision-making (Future4, 2021). To better understand who our customers are, we need to focus on their interactions with our content. As a result, a deep understanding of our content is a prerequisite to decoding viewers' preferences.
Music Information Retrieval (MIR) has been researched extensively in the past. Music has facets that "are not just to be found within the music itself, in the form of melody, harmony, tempo, and timbre, but are interpreted by the cognitive processes of the listener within frameworks of culturally agreed rules, such as genre, style, and mood" (Inskip, 2011). By contrast, there is very limited research on retrieving information from video-based content. This research proposal follows a comparable Content-based Information Retrieval principle; however, the content will be video rather than music.
The whole broadcasting community can benefit from this work. VoD platforms and TV broadcasters should give viewers a key role in their decision-making systems, rather than only sending them recommendations and communications based on demographics. Viewers' interactions with video-based content can become the richest source of information if Machine Learning techniques mine the enormous amounts of data and metadata generated by millions of viewings every day. Lee et al. (2016) highlighted potential improvements to the accuracy of models that predict the success of a movie; introducing "unexplored features" is one way of elevating that accuracy (Lee et al., 2016).
For this project, I will focus on three main roadmaps. Firstly, I propose segmenting video-based content into micro-genres. Although the broad traditional genres remain useful, Channel4 needs to follow an approach similar to Netflix, which has defined over 76,000 micro-genres (Janelle K, 2020). Segmenting content efficiently is an essential task for every broadcaster, whether linear or on-demand, because it supplies Machine Learning models with features that improve their accuracy. I propose using Computer Vision and cutting-edge Machine Learning to define micro-genres that describe Channel4's video-based content.
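As an illustration of how this could start, the sketch below (a minimal example, not Channel4's actual pipeline) embeds frames sampled from each programme with a pretrained image model and clusters the programme-level embeddings into candidate micro-genres; the file names, sampling rate, and cluster count are assumptions chosen for brevity.

```python
# Hedged sketch: candidate micro-genres from clustered visual embeddings.
# Assumes torchvision (with a video backend) and scikit-learn are available;
# the catalogue paths below are hypothetical.
import numpy as np
import torch
from sklearn.cluster import KMeans
from torchvision.io import read_video
from torchvision.models import ResNet50_Weights, resnet50

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.fc = torch.nn.Identity()        # keep 2048-d pooled features instead of class logits
model.eval()
preprocess = weights.transforms()     # resize, crop, and normalise as the model expects

def programme_embedding(path: str, every_nth_frame: int = 250) -> np.ndarray:
    """Average the embeddings of a sparse sample of frames from one programme."""
    frames, _, _ = read_video(path, pts_unit="sec", output_format="TCHW")
    sampled = frames[::every_nth_frame]           # uint8 tensor, shape (T, C, H, W)
    with torch.no_grad():
        feats = model(preprocess(sampled))
    return feats.mean(dim=0).numpy()

catalogue = ["prog_a.mp4", "prog_b.mp4", "prog_c.mp4"]   # hypothetical programme files
embeddings = np.stack([programme_embedding(p) for p in catalogue])

# Each cluster is a candidate micro-genre, to be inspected and named by editorial staff.
micro_genres = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)
print(dict(zip(catalogue, micro_genres)))
```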
Secondly, Information Retrieval can be achieved by applying Machine Learning to the scripts and dialogues of the content, and to the video itself. I propose using Natural Language Processing and Computer Vision to extract the emotions that a video conveys to its viewers. Sentiment analysis of scripts, dialogues, and video can be a very useful source of information, and may reveal additional aspects of the content that give a Machine Learning model extra predictive power.
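For the text side of this idea, a minimal sketch is shown below: an off-the-shelf sentiment model (the default Hugging Face "sentiment-analysis" pipeline, an assumption rather than the project's chosen model) scores a few hypothetical dialogue lines, and the signed scores are averaged into a crude programme-level feature. The visual component is not shown.

```python
# Hedged sketch: programme-level sentiment from dialogue lines.
# Assumes the transformers library; the dialogue and aggregation are illustrative only.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")   # downloads a default pretrained model

dialogue = [                                 # hypothetical lines from one programme's script
    "I can't believe you kept this from me all these years.",
    "We made it. Against every odd, we actually made it.",
    "There's nothing left for us here anymore.",
]

results = sentiment(dialogue)
# Map POSITIVE/NEGATIVE labels to signed scores and average them into one feature.
signed = [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in results]
print(results)
print("programme sentiment feature:", sum(signed) / len(signed))
```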
Thirdly, I need to identify the different modes in which users watch video-based content. By applying cutting-edge Machine Learning algorithms, I propose segmenting viewings according to whether they take place during a commute, after work, or during a lunch break. The challenge is to manage and pre-process all the metadata that users leave behind after their viewings; substantial engineering is required to transform this volume of unstructured data into dimensions that help a Machine Learning model generate valuable insights.
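The sketch below illustrates the general idea on invented metadata: a handful of hypothetical viewing sessions, described by start hour, duration, weekday, and device type, are clustered into candidate viewing modes that analysts would then interpret (for example as commute, lunch-break, or evening viewing). The column names and records are assumptions, not Channel4's logging schema.

```python
# Hedged sketch: clustering viewing sessions into candidate viewing modes.
# Assumes pandas and scikit-learn; all session records below are invented.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

sessions = pd.DataFrame({
    "start_hour":    [8, 13, 20, 8, 21, 12],    # local hour the viewing started
    "duration_min":  [22, 15, 95, 25, 110, 18],
    "weekday":       [1, 2, 4, 3, 5, 1],        # 0 = Monday ... 6 = Sunday
    "mobile_device": [1, 1, 0, 1, 0, 1],        # 1 if watched on a phone or tablet
})

# Encode the start hour cyclically so 23:00 and 00:00 end up close together, then scale.
features = sessions.assign(
    hour_sin=np.sin(2 * np.pi * sessions["start_hour"] / 24),
    hour_cos=np.cos(2 * np.pi * sessions["start_hour"] / 24),
).drop(columns="start_hour")
X = StandardScaler().fit_transform(features)

# Each cluster is a candidate viewing mode to be interpreted and named by analysts.
sessions["viewing_mode"] = KMeans(n_clusters=3, n_init="auto").fit_predict(X)
print(sessions)
```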
People | ORCID iD
---|---
Ioannis Patras (Primary Supervisor) |
Dimitrios Gousis (Student) |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name
---|---|---|---|---|---
EP/V519935/1 | | | 30/09/2020 | 29/04/2028 |
2604208 | Studentship | EP/V519935/1 | 30/09/2021 | 29/09/2025 | Dimitrios Gousis