Multimodal Video Search by Examples (MVSE)

Lead Research Organisation: Queen's University Belfast
Department Name: Sch of Electronics, Elec Eng & Comp Sci

Abstract

Searching large video archives, such as the BBC's TV programmes, effectively and efficiently is a significant challenge. Search is typically done via keyword queries using pre-defined metadata such as titles, tags and viewers' notes. However, it is difficult to use keywords to search for specific moments in a video where a particular speaker talks about a specific topic at a particular location. Most videos have little or no metadata about their content, and automatic metadata extraction is not yet sufficiently reliable. Furthermore, metadata may change over time and cannot cover all content. Therefore, search by keyword is not a desirable approach for a comprehensive and long-lasting video search solution.

Video search by examples is a desirable alternative, as it allows users to search for content by providing one or more examples of the content of interest, without having to express that interest in keywords. However, video search by examples is notoriously challenging, and its performance is still poor. To improve search performance, multiple modalities should be considered (image, sound, voice and text): each modality provides a separate search cue, so combining multiple cues should identify more relevant content. This is multimodal video search by examples (MVSE). It is an emerging area of research, and the current state of the art is far from satisfactory, so there is a long way to go. There is no commercial service for MVSE.
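The idea that several cues together identify more relevant content can be illustrated with a minimal sketch of score-level (late) fusion. This is not the project's method: the modality names (`face`, `voice`, `topic`), the two-dimensional embeddings, the equal weights and the helper functions below are all hypothetical, chosen only to show how per-modality similarities to an example query might be combined into one ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_score(query, segment, weights):
    """Weighted average of per-modality similarities.

    Only modalities present in both the query example and the segment
    contribute, so a segment without audio is not penalised for lacking
    a voice embedding.
    """
    shared = [m for m in weights if m in query and m in segment]
    total = sum(weights[m] for m in shared)
    if total == 0:
        return 0.0
    return sum(weights[m] * cosine(query[m], segment[m]) for m in shared) / total


def search(query, archive, weights, top_k=3):
    """Rank archive segments by fused similarity to the example query."""
    scored = [(seg_id, fused_score(query, emb, weights)) for seg_id, emb in archive.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy archive: hypothetical per-segment embeddings, one vector per modality.
ARCHIVE = {
    "seg_a": {"face": [1.0, 0.0], "voice": [1.0, 0.0], "topic": [1.0, 0.0]},
    "seg_b": {"face": [1.0, 0.0], "voice": [0.0, 1.0], "topic": [1.0, 0.0]},
    "seg_c": {"face": [0.0, 1.0], "topic": [1.0, 0.0]},  # no voice track
}
WEIGHTS = {"face": 1.0, "voice": 1.0, "topic": 1.0}  # assumed equal weighting
QUERY = {"face": [1.0, 0.0], "voice": [1.0, 0.0], "topic": [1.0, 0.0]}

results = search(QUERY, ARCHIVE, WEIGHTS)
for seg_id, score in results:
    print(f"{seg_id}: {score:.3f}")
```

In this toy data, a segment that matches the example on all three cues ranks above one that matches on only two. In a real system the embeddings would come from trained face, voice and topic models, and the weights would need tuning or learning.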

This proposal has been co-created with BBC R&D through the BBC Data Science Partnership, via a number of online meetings and one face-to-face meeting involving all partners. The proposal has been informed by recent unpublished ethnographic research on how current BBC staff (producers, journalists, archivists) search for media content. This research found that they were very interested in knowledge retrieval from archives or other sources, but that they required richer metadata and cataloguing of non-verbal data.

In this proposal we will study efficient, effective, scalable and robust MVSE where video archives are large, historical and dynamic, and where the modalities are person (face or voice), context, and topic. The aim is to develop a framework for MVSE and validate it through the development of a prototype search tool. Such a search tool will be useful for organisations such as the BBC and the British Library, which maintain large collections of video archives and want to provide a search tool for their own staff as well as for the public. It will also be useful for companies such as YouTube that host videos from the public and want to enable video search by examples. We will address key challenges in the development of an efficient, effective, scalable and robust MVSE solution, including video segmentation, content representation, hashing, ranking and fusion.

This proposal is planned for three years, involving three institutions (Cambridge, Surrey, Ulster) and one partner (the BBC) who will contribute significant resources (estimated at £128.4k) to the project (see Letter of Support from the BBC).

Planned Impact

The project's objective is to provide scalable next-generation 'search by example' functionality across national video archives. The project will advance the state of the art in video segmentation and in content representation, matching and ranking. These outputs are intended to have a positive, disruptive impact on multimedia search capability across the media industry, nationally and internationally.

The beneficiaries of this project's outputs will include academics, journalists, broadcasters, TV viewers, multimedia companies and organisations hosting and managing large video or multimedia repositories.

Journalists and broadcasters will benefit directly from time savings and the rapid discovery of relevant content when using this new technology. This will in turn enable better, more relevant and more enriched TV programming to be produced in less time, yielding economic savings. TV viewers will benefit too, enjoying more relevant TV programmes through the effective repurposing of content within big media archives. As a key partner, the immediate beneficiary will be the BBC, which will likely adopt and integrate the new technology within its workflows to improve the discovery of media content when producing TV programmes. However, the technologies developed are transferable to other broadcasters and indeed to major online companies such as YouTube, which rely on semantically enriched search technologies.

Academics will benefit from the dissemination of the project's new research findings and search technologies for rapidly discovering relevant video/multimedia content based on new intelligent algorithms, and may draw inspiration from them.

The pathways to impact document provides an outline of a series of activities including co-creation workshops and licensing to increase the likelihood of research impact and adoption of the novel, disruptive technologies produced in this project.

Related Projects

Project Reference  Relationship  Related To    Start       End         Award Value
EP/V002740/1       -             -             31/03/2021  30/07/2021  £720,502
EP/V002740/2       Transfer      EP/V002740/1  31/07/2021  30/03/2025  £689,252
 
Description: 1. Multimodal video search by examples can be done efficiently and effectively using the latest video segmentation and content embedding techniques.
2. Multimodal video search by examples works on both high-quality and low-quality video archives.
Exploitation Route: The BBC UX team is currently evaluating the research demo with a view to adoption.
Sectors: Creative Economy; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections

 
Description: The BBC is currently evaluating the research demo for possible adoption. Some BBC researchers will attend the next project workshop in London on 12 April and present their findings.
First Year Of Impact: 2024
Sector: Creative Economy; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections
Impact Types: Economic

 
Description: Using artificial intelligence to predict and explain conversion to age-related macular degeneration
Amount: £125,322 (GBP)
Funding ID: ID2022 100028
Organisation: Rosetrees Trust
Sector: Charity/Non Profit
Country: United Kingdom
Start: 08/2023
End: 08/2026
 
Title: Multimodal video search tool
Description: This tool was developed as part of the project. It can be used to search a video archive such as BBC Rewind or YouTube.
Type Of Material: Improvements to research infrastructure
Year Produced: 2022
Provided To Others?: No
Impact: This tool could potentially increase the productivity of media production.
 
Description: ACM ICMR Workshop on Multimodal Video Retrieval
Form Of Engagement Activity: Participation in an activity, workshop or similar
Part Of Official Scheme?: No
Geographic Reach: International
Primary Audience: Professional Practitioners
Results and Impact: The MVSE team organised a workshop associated with ACM ICMR 2024 in Phuket, Thailand, in June 2024. The purpose was to disseminate the research findings and to engage with researchers and practitioners in the area of example-based multimedia retrieval. The workshop papers were published in the ACM proceedings.
Year(s) Of Engagement Activity: 2024
URL: https://mvse.ecit.qub.ac.uk/activities/mvrmlm-2024/