Musical Audio Repurposing using Source Separation

Lead Research Organisation: University of Surrey

Department Name: Vision Speech and Signal Proc CVSSP

Abstract

Delivery of audio has become increasingly complex: originally in single channel (mono) or 2-channel stereo format, now surround sound in "5.1" format (5 main speakers plus one low frequency effects channel) is available in many home cinema systems, and many other multichannel audio formats are available (e.g. 6.1, 7.1, 10.2 and 22.2). In addition, new interactive apps allow users to remix musical audio, changing instrument volumes, and music games allow players to control individual instruments. Content creators therefore have to develop new ways to create and distribute their audio content so that it can be played back on these multichannel systems, or remixed by users to suit their own tastes.
However, much audio content is still in legacy formats, mainly 2-channel stereo. We therefore need ways to "repurpose" this legacy audio content, converting it into surround sound or into the separate "stems" needed for remixable audio.
The aim of this project is to develop a new approach to high quality audio repurposing, based on high quality musical audio source separation. To achieve this we will combine new high resolution separation techniques with information such as musical scores, instrument recognition, onset detection, and pitch tracking. Instead of aiming at generic source separation, we will develop algorithms designed to match the separation performance to the final target (upmixing or remixing). In parallel, we will investigate perceptual evaluation measures for source separation, remixing and upmixing, and develop new diagnostic evaluation techniques tailored to measure different aspects of the repurposed outcome.
The outcomes of this project will allow music consumers to enjoy their favourite songs in interactive remixing apps and games, even where the original separate "stems" are not available. They will also allow music companies, broadcasters and sound archive holders to provide high quality upmixed versions of their large archive content, for a growing number of listeners with surround sound systems in the home.
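
As a rough illustration of the remixing idea described above, the sketch below changes the relative level of two sources in a mono mixture by applying a frequency-domain mask to each source and scaling it by a gain. It is a toy example under strong assumptions, not the project's method: the masks are hand-made from a fixed frequency split (the project estimates separation with techniques such as deep neural networks), the signal is processed with a single whole-signal FFT rather than a short-time transform, and the function name `remix_with_masks` and all parameters are hypothetical.

```python
import numpy as np


def remix_with_masks(mixture, masks, gains):
    """Remix a mono signal by scaling masked spectral components.

    mixture: 1-D array of samples.
    masks:   array of shape (n_sources, n_bins); each row weights the
             rFFT bins assumed to belong to one source.
    gains:   one linear gain per source.
    """
    spectrum = np.fft.rfft(mixture)
    masks = np.asarray(masks, dtype=float)
    gains = np.asarray(gains, dtype=float)
    # A gain-weighted sum of the masks gives one combined gain per bin.
    combined = (gains[:, None] * masks).sum(axis=0)
    return np.fft.irfft(combined * spectrum, n=len(mixture))


# Toy mixture: a low "bass" tone plus a quieter high "lead" tone.
sr = 8000                      # sample rate in Hz (assumed for the demo)
t = np.arange(sr) / sr         # one second of audio
bass = np.sin(2 * np.pi * 110 * t)
lead = 0.5 * np.sin(2 * np.pi * 880 * t)
mixture = bass + lead

# Hand-made binary masks: bins below 400 Hz go to the bass, the rest to the lead.
freqs = np.fft.rfftfreq(len(mixture), d=1 / sr)
low = (freqs < 400).astype(float)
masks = np.stack([low, 1.0 - low])

# Keep the bass as it is and double the level of the lead.
remixed = remix_with_masks(mixture, masks, gains=[1.0, 2.0])
```

In this toy case the two tones occupy disjoint frequency bins, so the masks separate them cleanly; real musical sources overlap in time and frequency, which is why estimating good masks is the hard part of the problem.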

Planned Impact

(Non-academic beneficiaries are outlined here and in "Pathways to Impact". For more on academic impact, see "Academic Beneficiaries" and the "Academic Impact" section in the Case for Support.)
Audio researchers in industry will benefit from new methods for upmixing and remixing emerging from the project.
Manufacturers of audio upmixing equipment and plugins, and broadcasters wishing to upmix legacy 2-channel stereo content, will benefit from our new high-quality upmixing methods. Manufacturers of other musical audio effects boxes will benefit from new methods for remixing allowing repurposing of legacy audio content.
Other holders of legacy audio and audiovisual archives, such as the British Library, BFI and regional sound archives, will benefit from the ability to upmix their content for modern audiences becoming increasingly used to surround sound audio.
There is strong interest amongst both professional and high-end consumer audio users in new methods for upmixing 2-channel stereo content to 5.1 surround sound, leading to a range of upmix (or "unwrap") plugins for systems such as ProTools. These users will benefit from new upmix approaches emerging from this project, either through direct use of research prototypes, or through enhanced software or tools from audio equipment or plugin manufacturers.
Sound artists and composers will benefit from our remixing methods, allowing them to use sounds from mixed audio signals as part of compositions.
Remixing apps are becoming available for mobile devices, allowing users to remix and share audio tracks. Currently these are limited to tracks for which the separated sources are available from the original music label. These remix users and companies would benefit from the ability to remix from the stereo content that they already own.
The staff employed on the project, including postdoctoral research assistants undertaking the research, will gain skills applicable to industrial problems such as advanced digital signal processing, research software development, and evaluation methodologies.

Funded Value:

£856,793

Funded Period:

Apr 15 - Oct 18

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/L027119/2

Principal Investigator:

Mark Plumbley

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Music & Acoustic Technology (100%)

Organisations

University of Surrey (Lead Research Organisation)

People	ORCID iD
Mark Plumbley (Principal Investigator)
Martin Dewhirst (Co-Investigator)
Christopher Hummersone (Co-Investigator)
Panos Kudumakis (Co-Investigator)	http://orcid.org/0000-0003-0518-4198
Wenwu Wang (Co-Investigator)
Joshua Reiss (Co-Investigator)
Simon Dixon (Co-Investigator)	http://orcid.org/0000-0002-6098-481X
Philip J B Jackson (Co-Investigator)
Mark Sandler (Co-Investigator)
Nick Bryan-Kinns (Co-Investigator)	http://orcid.org/0000-0002-1382-2914
Philip Coleman (Co-Investigator)
Russell Mason (Co-Investigator)
Chris Cannam (Researcher Co-Investigator)
Sebastian Ewert (Researcher Co-Investigator)

Publications

Grais E.M. (2018) Combining fully convolutional and recurrent neural networks for single channel audio source separation in 144th Audio Engineering Society Convention 2018

Grais EM (2016) Single-channel audio source separation using deep neural network ensembles

Hamon R (2017) Assessment of musical noise using localization of isolated peaks in time-frequency domain

Kim C. (2018) Perception of phase changes in the context of musical audio source separation in 145th Audio Engineering Society International Convention, AES 2018

O'Hanlon K (2015) Non-negative matrix factorisation incorporating greedy Hellinger sparse coding applied to polyphonic music transcription

Rencker L (2017) A greedy algorithm with learned statistics for sparse signal reconstruction

Rencker L (2017) Multivariate iterative hard thresholding for sparse decomposition with flexible sparsity patterns

Roma G (2016) Singing Voice Separation Using Deep Neural Networks and F0 Estimation

Roma G (2016) Untwist: A new toolbox for audio source separation

Roma G (2016) Music remixing and upmixing using source separation

Key Findings
Impact Summary
Further Funding
Software and Technical Products
Engagement Activities


Description	New methods for audio source separation using deep neural networks. New methods for remixing and upmixing musical audio using audio source separation. New insights into perceptual evaluation of audio source separation methods.
Exploitation Route	Potential applications include audio upmixing for spatial audio, and remixing in audio production, games and remixing apps.
Sectors	Creative Economy, Culture, Heritage, Museums and Collections
URL	https://cvssp.github.io/maruss-website/


Description	Assisted by one of the past postdoc researchers, the company audEERING GmbH has used some of the project results. They have used our results on the validity of BSS Eval and the proof of concept of reference-free prediction to evaluate their source separation and speech enhancement algorithms.
First Year Of Impact	2019
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	AI for Sound
Amount	£2,120,276 (GBP)
Funding ID	EP/T019751/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	05/2020
End	04/2025


Description	Audio-Visual Media Research Platform
Amount	£1,577,223 (GBP)
Funding ID	EP/P022529/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	08/2017
End	07/2022


Description	EPSRC UK Acoustics Network Plus
Amount	£1,418,894 (GBP)
Funding ID	EP/V007866/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	11/2020
End	10/2024


Description	H2020-ICT-2015 Audio Commons
Amount	€ 2,980,000 (EUR)
Funding ID	688382
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	02/2016
End	01/2019


Description	Multimodal Video Search by Examples (MVSE)
Amount	£863,564 (GBP)
Funding ID	EP/V002856/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	09/2020
End	09/2023


Description	UK Acoustics Network
Amount	£561,807 (GBP)
Funding ID	EP/R005001/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	11/2017
End	11/2020


Title	Two-way mixer
Description	TwoWayMixer is a demonstration prototype that showcases the application of deep neural networks to audio remixing through source separation, implemented using web audio technology.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	The demo allows researchers and practitioners to understand the potential of source separation algorithms for audio remixing, and contributes to the emerging field of Web Audio. It was presented at the Second Web Audio Conference (WAC 2016).
URL	https://github.com/g-roma/twowaymixer


Title	untwist
Description	Untwist is a Python library for audio source separation. It provides a self-contained object-oriented framework including common source separation algorithms as well as input/output functions, data management utilities and time-frequency transforms.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	The development of the library has enabled research on audio source separation within the Musical Audio Repurposing using Source Separation project. By releasing it as open source, we help researchers in the field of signal processing and source separation to transition to the Python programming language, and encourage good practices in research software engineering. The library was presented at the 9th European Conference on Python in Science (EuroSciPy 2016).
URL	https://github.com/IoSR-Surrey/untwist


Description	14th International Conference on Latent Variable Analysis and Signal Separation (LVA-ICA 2018), University of Surrey, Guildford, UK
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The international conference on Latent Variable Analysis and Signal Separation, LVA/ICA, is an interdisciplinary forum where researchers and practitioners can experience a broad range of exciting theories and applications involving signal processing, applied statistics, machine learning, linear and multilinear algebra, numerical analysis and optimization, and other areas targeting Latent Variable Analysis problems. The 14th LVA/ICA was held at the University of Surrey, Guildford, UK from the 2nd to the 6th of July 2018.
Year(s) Of Engagement Activity	2018
URL	https://cvssp.org/events/lva-ica-2018/


Description	AES Convention Milan Workshop and Panel
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	The project organised and chaired a workshop on Audio Repurposing using Source Separation at the AES Convention in Milan, May 2018. Representatives from the project, as well as invited panellists from BBC R&D, Fraunhofer IDMT and Fraunhofer IIS, presented the state of the art in source separation. This was followed by a panel discussion about future directions. The aims of the workshop, building on the idea of using state-of-the-art source separation in the real world, were fourfold: 1. Give an overview of the state of the art in source separation and how it sounds; 2. Discuss and demonstrate current applications of source separation in audio production; 3. Discuss evaluation of source separation in applied contexts; 4. Give perspectives on future research directions and applications of source separation. The workshop included many audio examples, and the audience was generally optimistic about the role of source separation in professional audio contexts. The talks sparked interesting questions. We were invited to repeat the workshop at the New York convention in October 2018.
Year(s) Of Engagement Activity	2018
URL	http://philipcolemanaudio.wordpress.com/2018/06/19/aes-milan-workshop-on-audio-repurposing-using-sou...


Description	Audio Day 6 July 2018
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	This Audio Day brought together researchers and collaborators engaged in audio-related research projects linked to the University of Surrey.
Year(s) Of Engagement Activity	2018
URL	https://cvssp.org/events/audio_day_2018/
