Integrating sound and context recognition for acoustic scene analysis

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

The amount of audio data being generated has dramatically increased over the past decade, ranging from user-generated content and recordings in audiovisual archives to sensor data captured in urban, natural or domestic environments. The need to detect and identify sound events in environmental recordings (e.g. door knock, glass break), as well as to recognise the context of an audio recording (e.g. train station, meeting), has led to the emergence of a new field of research: acoustic scene analysis. Emerging applications of acoustic scene analysis include the development of sound recognition technologies for smart homes and smart cities, security/surveillance, audio retrieval and archiving, ambient assisted living, and automatic biodiversity assessment.

However, current sound recognition technologies cannot adapt to different environments or situations (e.g. sound identification in an office environment, assuming specific room properties, working hours, outdoor noise and weather conditions). If information about context is available, it is typically characterised by a single label for an entire audio stream. This ignores complex and ever-changing environments, for example recordings made with hand-held devices, where context can consist of multiple time-varying factors and cannot be captured by a single label.

This project will address the aforementioned shortcomings by investigating and developing technologies for context-aware sound recognition. We assume that the context of an audio stream consists of several time-varying factors that can be viewed as a combination of different environments and situations; the ever-changing context in turn informs the types and properties of sounds to be recognised by the system. Methods for context and sound recognition will be investigated and developed, based on signal processing and machine learning theory. The main contribution of the project will be an algorithmic framework that jointly recognises audio-based context and sound events, applied to complex audio streams with several sound sources and time-varying environments.
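
As an illustration of what such a joint framework could look like, the sketch below outlines a multi-task neural network with a shared convolutional encoder and two output heads: one classifying the scene (context) per recording and one detecting sound events per frame. It is a minimal, hypothetical example; the input shape, layer sizes and class counts are illustrative and do not represent the models developed in the project.

```python
# Minimal sketch (not the project's actual model): a shared encoder with two
# heads, one classifying the acoustic scene (context) per recording and one
# detecting sound events frame by frame. Shapes and class counts are illustrative.
from tensorflow.keras import layers, Model

N_MELS, N_FRAMES = 64, 500        # illustrative mel-spectrogram input size
N_SCENES, N_EVENTS = 10, 32       # illustrative class counts

inputs = layers.Input(shape=(N_FRAMES, N_MELS, 1))

# Shared convolutional encoder
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((1, 4))(x)            # pool frequency only, keep time resolution
x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling2D((1, 4))(x)
shared = layers.Reshape((N_FRAMES, -1))(x)    # (time, features)

# Head 1: sound event detection, one multi-label prediction per frame
events = layers.TimeDistributed(
    layers.Dense(N_EVENTS, activation="sigmoid"), name="events")(shared)

# Head 2: acoustic scene classification, one label per recording
scene = layers.GlobalAveragePooling1D()(shared)
scene = layers.Dense(N_SCENES, activation="softmax", name="scene")(scene)

model = Model(inputs, [events, scene])
model.compile(optimizer="adam",
              loss={"events": "binary_crossentropy",
                    "scene": "categorical_crossentropy"})
model.summary()
```

In a setup of this kind, the two tasks share representations, which is one way the estimated context can inform (and be informed by) the sound events being recognised.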

The proposed software framework will be evaluated using complex audio streams recorded in urban and domestic environments, as well as using simulated audio data in order to carefully control contextual and sound properties and have the benefit of accurate annotations. In order to further promote the study of context-aware sound recognition systems, a public evaluation task will be organised in conjunction with the public challenge on Detection and Classification of Acoustic Scenes and Events (DCASE).

Research carried out in this project targets a wide range of potential beneficiaries of sound and audio-based context recognition technologies in the commercial and public sectors, as well as users and practitioners of such technologies. Beyond acoustic scene analysis, we believe this new approach will advance the broader fields of audio and acoustics, leading to the creation of context-aware systems for related fields, including music and speech technology and hearing aids.

Planned Impact

By investigating and developing technologies for sound and audio-based context recognition, and by creating new datasets of sound scenes in various environments, this project will benefit several groups beyond the academic community, both in the UK and internationally. These include:

Commercial Private Sector:

* Internet of Things companies creating technologies and devices for smart homes and ambient assisted living, through access to sound and environment recognition technologies adaptable to various contexts.
* Audiovisual archiving companies, through technologies for automatic annotation of continuous audio streams with respect to both context and sounds present.
* Companies in the security/surveillance sector, through access to novel algorithms for sound detection and identification in complex and noisy environments.
* Acoustics and sound engineering companies, through new acoustic measurement methods for automatic environment recognition and noise measurement.
* Composers and sound artists, who will benefit from a new set of digital tools for exploring soundscapes and sounds, and using them in their creative output.

Public and third sectors:

* Libraries, museums and public archives hosting, preserving and curating sound collections, through new methods for automatic organisation, exploration and annotation of audio streams and recordings.
* Urban planning authorities and smart city developers, through new methods and tools for automated environmental and noise monitoring.
* Independent organisations promoting STEM (science, technology, engineering and maths) to young people and underrepresented groups, by using sound and audio to generate interest in STEM careers.

Wider public:

* Users of sound recognition technologies, through smart home applications or mobile devices which enable sound capture and processing.
* Users of public and private sound collections and archives, through new technologies for exploring and annotating sound data.
* Residents of urban areas, as beneficiaries of audio-based smart city technologies resulting in lower noise pollution.
* Audiences of artistic output that involves the use of audio technologies related to sound recognition and exploration.
* Teachers and students in technical and creative fields related to sound, involving the use of audio technologies or sound examples.
 
Description - Created new computational methods based on machine learning theory for joint sound event detection and sound scene classification.
- Created a new, public-facing open set taxonomy for sound scenes and sound events.
- Initiated new collaborations with several academic institutions (NUS, Ecole Centrale de Nantes, Tampere University, Marche Polytechnic University).
Exploitation Route The open-set taxonomy can be used by other researchers and practitioners working in computational sound scene analysis. Methods for joint sound scene and sound event analysis can likewise be adopted by researchers in computational sound scene analysis. Multi-task methods for sound recognition and sound activity detection can be used by researchers working on sound recognition and smart homes. Methods for sound scene recognition and city classification can be used by researchers working on audio-based context recognition, security/surveillance, and audio-based mobile applications.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://soundscape.eecs.qmul.ac.uk/
 
Description Methods proposed as part of this project towards creating an open-set taxonomy for sound events and sound scenes have been used to implement a public-facing web tool for creating a consistent taxonomy across previously mismatched audio datasets. This work was funded by a Flexible Innovation Starter Award from QMUL's EPSRC Impact Acceleration Account, awarded to the project's PDRA (Dr Helen Bear), and supported by Audio Analytic Ltd.
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description QMUL-EPSRC Flexible Innovation Starter Award
Amount £4,717 (GBP)
Organisation Queen Mary University of London 
Sector Academic/University
Country United Kingdom
Start 12/2018 
End 02/2019
 
Description Unsupervised detection of sound events for complex audio
Amount £3,800 (GBP)
Funding ID IEC\NSFC\201382 
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2021 
End 03/2023
 
Title The extensible taxonomy 
Description Online tool for maintaining an open set taxonomy of sound scenes and sound events. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Work supported by the QMUL-EPSRC Impact Acceleration Account and by Audio Analytic Ltd. 
URL http://soundscape.eecs.qmul.ac.uk/the-extensible-taxonomy/live/
 
Title Audio-Based identification of Beehive states: The dataset 
Description Annotated dataset for audio-based identification of beehive states. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact No impact recorded as of yet. 
URL https://zenodo.org/record/2563940#.XHUjFYXgryt
 
Title Joint sound scene and event dataset 
Description A dataset of synthetic sound scenes created with the Scaper sound scene simulator, using real-world recordings as input. Synthesized scenes are split into train and test sets such that no original recording is present in both splits. The dataset was created for the task of jointly performing acoustic scene classification and sound event detection. Time-stamped annotations and JAMS files are included. The dataset contains 10 acoustic scene classes and 32 sound event classes. A sketch of this kind of synthesis workflow is given after this entry. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact None recorded as of yet. 
URL https://zenodo.org/record/2565309#.Xj6XjOGnzm0
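
As background on how such synthetic scenes are typically produced with Scaper, the sketch below shows a minimal synthesis script. The directory paths, labels, event counts and distribution parameters are hypothetical placeholders and are not the exact recipe used to build this dataset.

```python
# Minimal sketch of Scaper-based scene synthesis (hypothetical paths, labels and
# parameters; not the exact recipe used for this dataset).
import scaper

FG_PATH = "foreground_events"   # folders of isolated event recordings (hypothetical)
BG_PATH = "background_scenes"   # folders of background scene recordings (hypothetical)

sc = scaper.Scaper(duration=10.0, fg_path=FG_PATH, bg_path=BG_PATH)
sc.ref_db = -50  # reference loudness for the background

# One background chosen at random from the available scene recordings
sc.add_background(label=("choose", []),
                  source_file=("choose", []),
                  source_time=("const", 0))

# A few foreground events with randomised timing, duration and SNR
for _ in range(3):
    sc.add_event(label=("choose", []),
                 source_file=("choose", []),
                 source_time=("const", 0),
                 event_time=("uniform", 0, 8),
                 event_duration=("truncnorm", 2.0, 1.0, 0.5, 4.0),
                 snr=("uniform", 0, 15),
                 pitch_shift=None,
                 time_stretch=None)

# Writes the audio mixture plus a JAMS file holding the time-stamped annotations
sc.generate("scene_0001.wav", "scene_0001.jams",
            allow_repeated_label=True,
            allow_repeated_source=True,
            reverb=None)
```

Running a script like this in a loop, with different random draws per scene, is how a train/test split of synthetic scenes with exact event-level annotations can be assembled.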
 
Title To bee or not to bee: An annotated dataset for beehive sound recognition 
Description The present dataset was developed in the context of our work on the automatic recognition of beehive sounds. The problem is posed as the classification of sound segments into two classes: Bee and noBee. The novelty of the explored approach and the need for annotated data motivated the construction of this dataset. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact No recorded impact as of yet. 
URL https://zenodo.org/record/1321278#.XHUio4Xgryt
 
Description Collaboration with Audio Research Group, Tampere University of Technology, Finland 
Organisation Tampere University of Technology
Country Finland 
Sector Academic/University 
PI Contribution Collaboration in computational sound scene analysis research with the Audio Research Group, Tampere University of Technology, Finland (host: Prof Tuomas Virtanen).
Collaborator Contribution Planned 1-month research visit by PDRA Helen Bear to TUT, Finland, towards a research collaboration on joint sound event detection and sound scene classification.
Impact One paper published at the IEEE WASPAA 2019 workshop, entitled "City classification from multiple real-world sound scenes".
Start Year 2019
 
Description Collaboration with Department of Information Engineering, Universita Politecnica delle Marche, Italy 
Organisation Marche Polytechnic University
Country Italy 
Sector Academic/University 
PI Contribution Research collaboration on audio-based identification of beehive sounds and on automatic recognition of beehive states. The QMUL team contributed to the research and development of methods for these tasks, as well as to dataset annotation.
Collaborator Contribution Partners contributed to the research and development of methods for these tasks, as well as to dataset annotation. Partners also hosted the project RA for a 1-week research visit.
Impact One paper published at the IEEE ICASSP 2019 conference: "Audio-based identification of beehive states".
Start Year 2018
 
Description Collaboration with LS2N, Ecole Centrale de Nantes, France 
Organisation Ecole Centrale de Nantes
Country France 
Sector Academic/University 
PI Contribution Collaboration on audio data collection, for the purpose of organising a public evaluation task on urban sound characterisation as part of the DCASE Challenge series.
Collaborator Contribution Hosted a visiting PhD student from LS2N, Ecole Centrale de Nantes, France, at QMUL, and collaborated on audio data collection and challenge task specification.
Impact No outputs as of yet.
Start Year 2018
 
Description Collaboration with LS2N, Ecole Centrale de Nantes, France 
Organisation National Center for Scientific Research (Centre National de la Recherche Scientifique CNRS)
Country France 
Sector Academic/University 
PI Contribution Collaboration on audio data collection, for the purpose of organising a public evaluation task on urban sound characterisation as part of the DCASE Challenge series.
Collaborator Contribution Hosted a visiting PhD student from LS2N, Ecole Centrale de Nantes, France, at QMUL, and collaborated on audio data collection and challenge task specification.
Impact No outputs as of yet.
Start Year 2018
 
Description Collaboration with Sound & Music Computing Lab, National University of Singapore 
Organisation National University of Singapore
Country Singapore 
Sector Academic/University 
PI Contribution Research collaboration in computational sound scene analysis with the Sound & Music Computing Lab, Computer Science Department, National University of Singapore (lead collaborator: Prof Ye Wang). This collaboration focuses on research for sound event detection and sound scene classification using machine learning models.
Collaborator Contribution Research collaboration includes 1 academic, 1 PhD student and 1 research intern at NUS. Collaboration was supported by hosting a research visit made by the PI at NUS in January 2019, and by a research visit made by Prof Ye Wang at QMUL in August-September 2019.
Impact One paper published at the IEEE ICASSP 2019 conference, "SubSpectralNet - Using sub-spectrogram based convolutional neural networks for acoustic scene classification", and one paper accepted at the IEEE ICASSP 2020 conference, "A-CRNN: a domain adaptation model for sound event detection".
Start Year 2018
 
Title Audio-based identification of beehive states 
Description The sound produced by bees in a beehive can be a source of many insights regarding the hive's overall state and health conditions, and can even indicate natural phenomena related to the hive's life cycle. The goal of this software is to automatically identify different states of a hive based on audio recordings made from inside beehives. The proposed pipeline consists of a sequence of two classifiers: the first works as a preselector of relevant samples, acting as a cleaning stage that removes any non-bee sound; the selected samples are then fed to the second classifier, which makes the decision regarding the state of the hive. A sketch of this two-stage design is given after this entry. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact No impact recorded as of yet. 
URL https://github.com/madzimia/Audio_based_identification_beehive_states
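
To make the two-stage design concrete, here is a minimal sketch of one way such a pipeline could be wired up in Python with scikit-learn and librosa. The features, classifiers, state labels and function names are hypothetical illustrations, not the repository's actual implementation.

```python
# Minimal sketch of the two-stage idea described above (hypothetical features,
# classifiers and labels; not the repository's actual code): stage 1 keeps only
# bee sounds, stage 2 classifies the hive state from the retained segments.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def segment_features(path, sr=22050):
    """Hypothetical segment-level features: mean MFCCs over the clip."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Stage 1: bee / noBee preselector (1 = bee, 0 = noBee)
bee_clf = SVC()

# Stage 2: hive-state classifier (state labels are placeholders)
state_clf = RandomForestClassifier()

def train(stage1_feats, bee_labels, stage2_feats, state_labels):
    """Train both stages; stage 2 is trained on bee-only segments."""
    bee_clf.fit(stage1_feats, bee_labels)
    state_clf.fit(stage2_feats, state_labels)

def predict_hive_state(paths):
    """Run the pipeline: discard non-bee segments, classify the rest."""
    feats = np.stack([segment_features(p) for p in paths])
    keep = bee_clf.predict(feats) == 1
    if not keep.any():
        return None  # no usable bee sound in this batch
    return state_clf.predict(feats[keep])
```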
 
Title City classification from multiple real-world sound scenes 
Description Code for the associated paper, published at the IEEE WASPAA 2019 workshop, entitled "City classification from multiple real-world sound scenes". Paper link: https://doi.org/10.1109/WASPAA.2019.8937271 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact None as of yet. 
URL https://github.com/drylbear/soundscapeCityClassification
 
Title SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification 
Description This repository contains the Keras/TensorFlow implementation and miscellaneous figures for SubSpectralNets, introduced in the following paper: "SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification" (accepted at IEEE ICASSP 2019). We introduce a novel approach to using spectrograms in convolutional neural networks in the context of acoustic scene classification. First, we show through statistical analysis that some specific bands of mel-spectrograms carry more discriminative information than other bands, and that these bands are specific to every soundscape. Based on this observation, we propose SubSpectralNets, in which we first design a new convolutional layer that splits the time-frequency features into sub-spectrograms, then merges the band-level features at a later stage for the global classification. The effectiveness of SubSpectralNet is demonstrated by a relative improvement of +14% accuracy over the DCASE 2018 baseline model. A simplified sketch of the sub-spectrogram idea is given after this entry. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact No impact recorded as of yet. 
URL https://github.com/ssrp/SubSpectralNet
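
To illustrate the sub-spectrogram idea described above, the sketch below builds a small Keras model that splits a mel-spectrogram into frequency bands, applies a convolutional block to each band, and merges the band-level features for global classification. It is a simplified, hypothetical reconstruction of the idea, not the code contained in the repository; shapes and hyperparameters are illustrative.

```python
# Simplified, hypothetical sketch of the sub-spectrogram idea (not the
# repository's implementation): slice the mel axis into bands, convolve each
# band separately, then merge the band-level features for classification.
from tensorflow.keras import layers, Model

N_MELS, N_FRAMES = 40, 500   # illustrative input size
BAND_SIZE = 20               # mel bins per sub-spectrogram
N_SCENES = 10                # illustrative number of acoustic scene classes

inputs = layers.Input(shape=(N_FRAMES, N_MELS, 1))

band_features = []
for start in range(0, N_MELS, BAND_SIZE):
    # Slice one frequency band out of the full mel-spectrogram
    band = layers.Lambda(lambda t, s=start: t[:, :, s:s + BAND_SIZE, :])(inputs)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(band)
    x = layers.MaxPooling2D((5, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    band_features.append(x)

# Merge band-level features and classify the scene globally
merged = layers.Concatenate()(band_features)
merged = layers.Dense(128, activation="relu")(merged)
outputs = layers.Dense(N_SCENES, activation="softmax")(merged)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```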
 
Title Towards joint sound scene and polyphonic sound event recognition 
Description Code related to the INTERSPEECH 2019 paper on joint sound scene and polyphonic sound event recognition. Associated paper: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2169.pdf 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact None as of yet. 
URL https://github.com/drylbear/jointASCandSED