Multisource audio-visual production from user-generated content

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

The pervasiveness of amateur media recorders, either embedded in smartphones or used as stand-alone devices, is revolutionising the way events are captured and reported. The aim of this project is to devise intelligent editing and production algorithms, based on new signal processing techniques, for multi-view user-generated content.

The explosion of shared video content offers the opportunity not only to analyse but also to report stories in a timely manner, ranging from disaster scenes and protests to music concerts and sports events. However, the growing amount of available data and its varying quality make the timely selection and editing of appropriate multimedia items very difficult, strongly limiting the opportunity to harvest this data for security, cultural and entertainment applications. There is an urgent need to investigate and develop new ways to support or replace what used to be the role of a producer/director in this rapidly changing landscape. In particular, there is a need to automate production tasks and to generate new, high-quality content from multiple views.

The key aspect of the project is the integration of audio and visual inputs, each supporting the other in reaching objectives that would be impossible using only one modality. We will focus on a set of relevant event types: sports, music shows and crowd scenes. We will devise novel multisource processing techniques to improve audio-visual production and to enable the synchronisation of the captured streams. This will in turn allow the generation of novel, higher-quality audio-visual renderings of the captured events.

Planned Impact

This project will have a major impact on the use and quality of user-generated videos. It will lead to new, intelligent content production algorithms that enable user-generated content to be seamlessly and rapidly integrated into multimedia items. It will also support the development of video-based citizen journalism and other participatory media, in which members of the public play an active role in news reporting and community-based content creation.

Dissemination will be through journal and conference publications as well as an up-to-date project website, which will also provide a service for uploading and producing content. We will build on existing links and engage with new beneficiaries and user groups by publishing the research at conferences (especially those with a strong industry presence) and in high-impact journals, and by visiting other groups and institutions.

Videos related to published results will also be disseminated through the Investigators' YouTube channels (which have attracted 10,000 views to date) and websites. Our tools and accomplishments will be promoted to the multimedia signal processing community via press releases sent to mailing lists and magazines. Halfway through the project, we will host a one-day workshop on multisource signal processing, bringing together leaders in this emerging field. Near the end of the project, we will hold a workshop on systems for the intelligent production of user-generated content, in which we will discuss and promote the outcomes of our research.

To enhance engagement with the public and improve communication of project results, the Queen Mary EECS Public Engagement team will advise and mentor the research team to deliver high-impact public engagement activities, assist with writing and editing articles for the web and magazines, integrate messages from the project into events for schools and the general public, and contribute towards the distribution costs of magazines that include project results. We will participate in communication activities by exhibiting prototypes developed in the project, contributing our technologies for use in creative projects (c4dmpresents.org) and disseminating results through the school magazines Audio! and CS4Fun (Computer Science for Fun).

The Investigators will lead Impact-related activities, although all researchers will be involved. We will use the services and expertise available from the QMUL Communications Officer for drafting press releases and promotional materials, and from the QM Innovation Marketing Manager for producing materials aimed at industry and user groups with a commercial interest. Descriptions of the project and its outcomes will be included in promotional material from the Investigators' research groups, the Computer Vision Group and the Centre for Digital Music.

Publications

Bano S (2015) ViComp: composition of user-generated videos in Multimedia Tools and Applications

Chen F (2014) Resource Allocation for Personalized Video Summarization in IEEE Transactions on Multimedia

Hon T (2015) Audio Fingerprinting for Multi-Device Self-Localization in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Llagostera Casanovas A (2014) Audio-visual events for multi-camera synchronization in Multimedia Tools and Applications

Wang L (2016) An Iterative Approach to Source Counting and Localization Using Two Distant Microphones in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Wang L (2018) Pseudo-Determined Blind Source Separation for Ad-hoc Microphone Networks in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Wang L (2021) Deep Learning Assisted Time-Frequency Processing for Speech Enhancement on Drones in IEEE Transactions on Emerging Topics in Computational Intelligence

Wang L (2016) Over-Determined Source Separation and Localization Using Distributed Microphones in IEEE/ACM Transactions on Audio, Speech, and Language Processing

 
Description The grant led to new models and tools for the synchronisation of audio-visual files captured by ad-hoc networks of sensors, such as smartphones; for the localisation of the sensors based only on the collected data; and for the automated processing and editing of the resulting multi-viewpoint audio-visual recordings (see the illustrative sketch after this entry).
Exploitation Route A prototype system is available for users to automatically edit user-generated videos.
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections; Other

URL http://www.eecs.qmul.ac.uk/~andrea/mavip
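The project's own synchronisation algorithms are described in the publications listed above. As a rough illustration of the kind of pairwise alignment such tools perform, the sketch below estimates the time offset between two recordings of the same event by cross-correlating their audio tracks with a standard GCC-PHAT weighting. This is a minimal sketch, not the project's method: the function name, the fixed sample rate and the synthetic test signal are illustrative assumptions.

import numpy as np

def estimate_offset(audio_a, audio_b, fs, max_offset_s=10.0):
    # Illustrative sketch (not the project's algorithm): generalised
    # cross-correlation with PHAT weighting between two audio tracks.
    n = len(audio_a) + len(audio_b)            # FFT length for linear correlation
    A = np.fft.rfft(audio_a, n=n)
    B = np.fft.rfft(audio_b, n=n)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12                     # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = min(int(fs * max_offset_s), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # centre lag zero
    lag = np.argmax(np.abs(cc)) - max_shift
    return lag / float(fs)                     # positive: content appears later in audio_a

# Synthetic check: device B starts recording 1.5 s after device A.
fs = 16000
scene = np.random.randn(10 * fs)               # stand-in for the shared soundscape
audio_a = scene
audio_b = scene[int(1.5 * fs):]
print(estimate_offset(audio_a, audio_b, fs))   # approximately 1.5

In a multi-camera setting, offsets estimated pairwise in this way can be placed on a common timeline so that the streams can be trimmed or padded into alignment before any joint editing.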
 
Description A prototype is available for users to upload their user-generated videos and receive an automatically generated final cut (https://gifu.eecs.qmul.ac.uk). This webtool has been used for content creation and teaching purposes. A dedicated multimedia synchroniser, based on the technology of the previous prototype, has also been developed (http://synch.eecs.qmul.ac.uk/); it is used to temporally align videos captured by independent cameras, in support of multi-view video production.
First Year Of Impact 2015
Sector Digital/Communication/Information Technologies (including Software); Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections
Impact Types Cultural

 
Description research collaboration 
Organisation Catholic University of Louvain
Department Institute of Information and Communication Technologies, Electronics and Applied Mathematics
Country Belgium 
Sector Academic/University 
PI Contribution Contributed to the design and testing of novel methods for personalised summarisation of multi-view content
Collaborator Contribution Contributed a dataset, an implementation and methods for the automated summarisation of multi-view content
Impact Journal paper F. Chen, C. De Vleeschouwer, A. Cavallaro, "Resource allocation for personalized video summarization", IEEE Transactions on Multimedia, Vol. 16, Issue 2, February 2014, pp. 455-469
Start Year 2013
 
Description research collaboration 
Organisation Japan Advanced Institute of Science and Technology
Country Japan 
Sector Private 
PI Contribution Contributed to the design and testing of novel methods for personalised summarisation of multi-view content
Collaborator Contribution Contributed a dataset, an implementation and methods for the automated summarisation of multi-view content
Impact Journal paper F. Chen, C. De Vleeschouwer, A. Cavallaro, "Resource allocation for personalized video summarization", IEEE Transactions on Multimedia, Vol. 16, Issue 2, February 2014, pp. 455-469
Start Year 2013
 
Description research collaboration 
Organisation University of Genoa
Department Department of Informatics, Bioengineering, Robotics and Systems Engineering
Country Italy 
Sector Academic/University 
PI Contribution Contributed a new method for the synchronisation of video streams based on matching articulated objects
Collaborator Contribution Contributed a dataset and software implementation for the synchronisation of video streams based on matching articulated objects
Impact Journal paper L. Zini, F. Odone, A. Cavallaro, "Multi-view matching of articulated objects," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 24, No. 11, November 2014, pp. 1920-1934
Start Year 2013