Musical Audio Repurposing using Source Separation

Lead Research Organisation: Queen Mary University of London

Department Name: Sch of Electronic Eng & Computer Science

Abstract

Delivery of audio has become increasingly complex: originally in single channel (mono) or 2-channel stereo format, now surround sound in "5.1" format (5 main speakers plus one low frequency effects channel) is available in many home cinema systems, and many other multichannel audio formats are available (e.g. 6.1, 7.1, 10.2 and 22.2). In addition, new interactive apps allow users to remix musical audio, changing instrument volumes, and music games allow players to control individual instruments. Content creators therefore have to develop new ways to create and distribute their audio content to allow their content to be played back on these multichannel systems, or remixed by users to suit their own tastes.
However, much audio content is still in legacy formats, mainly 2-channel stereo. We therefore need ways to "repurpose" this legacy audio content, converting it into surround sound or into the separate "stems" needed for remixable audio.
The aim of this project is to develop a new approach to high quality audio repurposing, based on high quality musical audio source separation. To achieve this we will combine new high resolution separation techniques with information such as musical scores, instrument recognition, onset detection, and pitch tracking. Instead of aiming at generic source separation, we will develop algorithms designed to match the separation performance to the final target (upmixing or remixing). In parallel, we will investigate perceptual evaluation measures for source separation, remixing and upmixing, and develop new diagnostic evaluation techniques tailored to measure different aspects of the repurposed outcome.
The outcomes of this project will allow music consumers to enjoy their favourite songs in interactive remixing apps and games, even where the original separate "stems" are not available. They will also allow music companies, broadcasters and sound archive holders to provide high quality upmixed versions of their large archive content, for a growing number of listeners with surround sound systems in the home.

Planned Impact

(Non-academic beneficiaries are outlined here and in "Pathways to Impact". For more on academic impact, see "Academic Beneficiaries" and the "Academic Impact" section in the Case for Support.)
Audio researchers in industry will benefit from new methods for upmixing and remixing emerging from the project.
Manufacturers of audio upmixing equipment and plugins, and broadcasters wishing to upmix legacy 2-channel stereo content, will benefit from our new high-quality upmixing methods. Manufacturers of other musical audio effects boxes will benefit from new methods for remixing allowing repurposing of legacy audio content.
Other holders of legacy audio and audiovisual archives, such as the British Library, BFI and regional sound archives, will benefit from the ability to upmix their content for modern audiences becoming increasingly used to surround sound audio.
There is a strong interest amongst both professional and high-end consumer audio users in new methods for upmixing 2-channel stereo content to 5.1 surround sound, leading to a range of upmix (or "unwrap") plugins for systems such as ProTools. These users will benefit from new upmix approaches emerging from this project, either through direct use of research prototypes, or through enhanced software or tools from audio equipment or plugin manufacturers.
Sound artists and composers will benefit from our remixing methods, allowing them to use sounds from mixed audio signals as part of compositions.
Remixing apps are becoming available for mobile devices, allowing users to remix and share audio tracks. Currently these are limited to tracks where the separated sources are available from the original music label. These remix users and companies would benefit from the ability to remix from the stereo content that they already own.
The staff employed on the project, including postdoctoral research assistants undertaking the research, will gain skills applicable to industrial problems such as advanced digital signal processing, research software development, and evaluation methodologies.

Funded Value:

£887,606

Funded Period:

Nov 14 - Dec 14

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/L027119/1

Principal Investigator:

Mark Plumbley

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Music & Acoustic Technology (100%)

Organisations

People	ORCID iD
Mark Plumbley (Principal Investigator)
Martin Dewhirst (Co-Investigator)
Panos Kudumakis (Co-Investigator)	http://orcid.org/0000-0003-0518-4198
Christopher Hummersone (Co-Investigator)
Wenwu Wang (Co-Investigator)
Joshua Reiss (Co-Investigator)
Simon Dixon (Co-Investigator)	http://orcid.org/0000-0002-6098-481X
Philip J B Jackson (Co-Investigator)
Mark Sandler (Co-Investigator)
Nick Bryan-Kinns (Co-Investigator)	http://orcid.org/0000-0002-1382-2914
Philip Coleman (Co-Investigator)
Russell Mason (Co-Investigator)
Chris Cannam (Researcher Co-Investigator)
Sebastian Ewert (Researcher Co-Investigator)

Publications

Benetos E (2017) Polyphonic Sound Event Tracking Using Linear Dynamical Systems in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Benetos E (2016) Detection of overlapping acoustic events using a temporally-constrained probabilistic model

Grais E (2017) Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Grais E (2016) Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks

Grais E (2017) Latent Variable Analysis and Signal Separation

Grais EM (2016) Single-channel audio source separation using deep neural network ensembles

O'Hanlon K (2016) Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription in IEEE/ACM Transactions on Audio, Speech, and Language Processing

O'Hanlon K (2015) Non-negative matrix factorisation incorporating greedy Hellinger sparse coding applied to polyphonic music transcription

Roma G (2016) Singing Voice Separation Using Deep Neural Networks and F0 Estimation

Roma G (2016) Untwist: A new toolbox for audio source separation

Key Findings

Description	Please see the Key Findings of EPSRC Grant: EP/L027119/2
Exploitation Route	Please see the Key Findings of EPSRC Grant: EP/L027119/2
Sectors	Creative Economy
URL	https://cvssp.github.io/maruss-website/

Impact Summary

Description	Please see the Impact Summary of EPSRC Grant: EP/L027119/2

Further Funding

Description	H2020-ICT-2015 Audio Commons
Amount	€ 2,980,000 (EUR)
Funding ID	688382
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	02/2016
End	01/2019