Integrating sound and context recognition for acoustic scene analysis
Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science
Abstract
The amount of audio data being generated has increased dramatically over the past decade, ranging from user-generated content and recordings in audiovisual archives to sensor data captured in urban, natural or domestic environments. The need to detect and identify sound events in environmental recordings (e.g. a door knock or breaking glass), as well as to recognise the context of an audio recording (e.g. a train station or a meeting), has led to the emergence of a new field of research: acoustic scene analysis. Emerging applications of acoustic scene analysis include sound recognition technologies for smart homes and smart cities, security/surveillance, audio retrieval and archiving, ambient assisted living, and automatic biodiversity assessment.
However, current sound recognition technologies cannot adapt to different environments or situations (e.g. sound identification in an office environment, assuming specific room properties, working hours, outdoor noise and weather conditions). If information about context is available, it is typically characterised by a single label for an entire audio stream, not taking into account complex and ever-changing environments, for example when recording using hand-held devices, where context can consist of multiple time-varying factors and can be characterised by more than a single label.
This project will address the aforementioned shortcomings by investigating and developing technologies for context-aware sound recognition. We assume that the context of an audio stream consists of several time-varying factors that can be viewed as a combination of different environments and situations; the ever-changing context in turn informs the types and properties of sounds to be recognised by the system. Methods for context and sound recognition will be investigated and developed, based on signal processing and machine learning theory. The main contribution of the project will be an algorithmic framework that jointly recognises audio-based context and sound events, applied to complex audio streams with several sound sources and time-varying environments.
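For illustration only, the sketch below shows one way such a joint model could be structured: a shared encoder over a log-mel spectrogram feeding a frame-level sound event head and a clip-level scene head. It is not the project's actual system; all layer choices, input shapes, class counts and loss weights are assumptions.

```python
# Minimal sketch of a multi-task model for joint sound event detection and
# acoustic scene classification. Everything here (shapes, class counts,
# layer sizes) is an illustrative assumption, not the project's method.
import tensorflow as tf
from tensorflow.keras import layers, Model

N_MELS, N_FRAMES = 64, 500        # assumed input resolution
N_SCENES, N_EVENTS = 10, 32       # assumed numbers of scene/event classes

inp = layers.Input(shape=(N_FRAMES, N_MELS, 1), name="log_mel")

# Shared convolutional encoder: context and events share low-level features.
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
x = layers.MaxPooling2D((1, 4))(x)           # pool frequency only, keep time
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((1, 4))(x)
x = layers.Reshape((N_FRAMES, -1))(x)        # (time, features)
x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)

# Head 1: frame-wise polyphonic sound event detection (multi-label).
event_out = layers.TimeDistributed(
    layers.Dense(N_EVENTS, activation="sigmoid"), name="events")(x)

# Head 2: clip-level acoustic scene (context) classification (single label).
scene_out = layers.Dense(N_SCENES, activation="softmax", name="scene")(
    layers.GlobalAveragePooling1D()(x))

model = Model(inp, [event_out, scene_out])
model.compile(optimizer="adam",
              loss={"events": "binary_crossentropy",
                    "scene": "categorical_crossentropy"},
              loss_weights={"events": 1.0, "scene": 0.5})
model.summary()
```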
The proposed software framework will be evaluated using complex audio streams recorded in urban and domestic environments, as well as using simulated audio data in order to carefully control contextual and sound properties and have the benefit of accurate annotations. In order to further promote the study of context-aware sound recognition systems, a public evaluation task will be organised in conjunction with the public challenge on Detection and Classification of Acoustic Scenes and Events (DCASE).
Research carried out in this project targets a wide range of potential beneficiaries in the commercial and public sector for sound and audio-based context recognition technologies, as well as users and practitioners of such technologies. Beyond acoustic scene analysis, we believe this new approach will advance the broader fields of audio and acoustics, leading to the creation of context-aware systems for related fields, including music and speech technology and hearing aids.
Planned Impact
By investigating and developing technologies for sound and audio-based context recognition, and by creating new datasets of sound scenes in various environments, this project will benefit several groups beyond the academic community, both in the UK and internationally. These include:
Commercial Private Sector:
* Internet of Things companies creating technologies and devices for smart homes and ambient assisted living, through access to sound and environment recognition technologies adaptable to various contexts.
* Audiovisual archiving companies, through technologies for automatic annotation of continuous audio streams with respect to both context and sounds present.
* Companies in the security/surveillance sector, through access to novel algorithms for sound detection and identification in complex and noisy environments.
* Acoustics and sound engineering companies, through new acoustic measurement methods for automatic environment recognition and noise measurement.
* Composers and sound artists, who will benefit from a new set of digital tools for exploring soundscapes and sounds, and using them in their creative output.
Public and third sectors:
* Libraries, museums and public archives hosting, preserving and curating sound collections, through new methods for automatic organisation, exploration and annotation of audio streams and recordings.
* Urban planning authorities and smart city developers, through new methods and tools for automated environmental and noise monitoring.
* Independent organisations promoting STEM (science, technology, engineering and maths) to young people and underrepresented groups, by using sound and audio to generate interest in STEM careers.
Wider public:
* Users of sound recognition technologies, through smart home applications or mobile devices which enable sound capture and processing.
* Users of public and private sound collections and archives, through new technologies for exploring and annotating sound data.
* Residents of urban areas, as beneficiaries of audio-based smart city technologies resulting in lower noise pollution.
* Audiences of artistic output that involves the use of audio technologies related to sound recognition and exploration.
* Teachers and students in technical and creative fields related to sound, involving the use of audio technologies or sound examples.
Organisations
- Queen Mary University of London (Lead Research Organisation)
- National Center for Scientific Research (Centre National de la Recherche Scientifique CNRS) (Collaboration)
- Ecole Centrale de Nantes (Collaboration)
- Tampere University of Technology (Collaboration)
- National University of Singapore (Collaboration)
- Marche Polytechnic University (Collaboration)
Publications
- Nolasco I (2019) Audio-based Identification of Beehive States
- Nolasco I (2018) Audio-based identification of beehive states
- Pankajakshan A (2019) Onsets, Activity, and Events: A Multi-task Approach for Polyphonic Sound Event Modelling
- Pankajakshan A (2019) Polyphonic Sound Event and Sound Activity Detection: A Multi-Task Approach
- Pankajakshan A (2024) Sound event detection by exploring audio sequence modelling
Description | - Created new computational methods based on machine learning theory for joint sound event detection and sound scene classification. - Created a new, public-facing open set taxonomy for sound scenes and sound events. - Initiated new collaborations with several academic institutions (NUS, Ecole Centrale de Nantes, Tampere University, Marche Polytechnic University). |
Exploitation Route | The open-set taxonomy can be used by other researchers and practitioners working in computational sound scene analysis. Methods for joint sound scene and sound event analysis can likewise be used by researchers in computational sound scene analysis. Multi-task methods for sound recognition and sound activity detection can be used by researchers working on sound recognition and smart homes. Methods for sound scene recognition and city classification can be used by researchers working on audio-based context recognition, security/surveillance, and audio-based mobile applications. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | http://soundscape.eecs.qmul.ac.uk/ |
Description | Methods proposed as part of this project for creating an open-set taxonomy of sound events and sound scenes have been used to implement a public-facing web tool that builds a consistent taxonomy across previously mismatched audio datasets. This work has been supported through a Flexible Innovation Starter Award from QMUL's Impact Acceleration Account, awarded to the project's PDRA (Dr Helen Bear), and supported by Audio Analytic Ltd. |
First Year Of Impact | 2018 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Economic |
Description | QMUL-EPSRC Flexible Innovation Starter Award |
Amount | £4,717 (GBP) |
Organisation | Queen Mary University of London |
Sector | Academic/University |
Country | United Kingdom |
Start | 12/2018 |
End | 02/2019 |
Description | Unsupervised detection of sound events for complex audio |
Amount | £3,800 (GBP) |
Funding ID | IEC\NSFC\201382 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2021 |
End | 03/2023 |
Title | The extensible taxonomy |
Description | Online tool for maintaining an open set taxonomy of sound scenes and sound events. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | Work supported by the QMUL-EPSRC Impact Acceleration Account and by Audio Analytic Ltd. |
URL | http://soundscape.eecs.qmul.ac.uk/the-extensible-taxonomy/live/ |
Title | Audio-Based identification of Beehive states: The dataset |
Description | Annotated dataset for audio-based identification of beehive states. |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | No impact recorded as of yet. |
URL | https://zenodo.org/record/2563940#.XHUjFYXgryt |
Title | Joint sound scene and event dataset |
Description | A dataset of synthetic sound scenes created with the Scaper sound scene simulator, using real-world recordings as input. Synthesized scenes are split into train and test sets such that no original recording appears in both splits. The dataset was created for the task of jointly performing acoustic scene classification and sound event detection. Time-stamped annotations and JAMS files are included. The dataset contains 10 acoustic scene classes and 32 sound event classes. |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | None recorded as of yet. |
URL | https://zenodo.org/record/2565309#.Xj6XjOGnzm0 |
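As an illustration of how scenes of this kind are typically synthesized, below is a minimal sketch using the Scaper simulator named in the description. The soundbank layout, labels and parameter values are assumptions for the example, not the exact recipe used to build this dataset.

```python
# Minimal Scaper sketch: mix one background scene with a few foreground
# events and write the audio plus a JAMS annotation with time-stamped labels.
# Folder names, labels and parameter values are illustrative assumptions.
import scaper

DURATION = 10.0                       # scene length in seconds (assumed)
FG_PATH = "soundbank/foreground"      # per-label folders of event recordings
BG_PATH = "soundbank/background"      # per-label folders of scene recordings

sc = scaper.Scaper(DURATION, FG_PATH, BG_PATH, random_state=0)
sc.ref_db = -20

# Background drawn from one acoustic scene class (e.g. "park").
sc.add_background(label=("const", "park"),
                  source_file=("choose", []),
                  source_time=("const", 0))

# A couple of foreground sound events with randomised timing and SNR.
for _ in range(2):
    sc.add_event(label=("choose", []),
                 source_file=("choose", []),
                 source_time=("const", 0),
                 event_time=("uniform", 0, 8),
                 event_duration=("uniform", 0.5, 2.0),
                 snr=("uniform", 0, 10),
                 pitch_shift=None,
                 time_stretch=None)

# Writes the mixed audio and the JAMS file with per-event annotations.
sc.generate("scene_0001.wav", "scene_0001.jams",
            reverb=0.1, disable_sox_warnings=True)
```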
Title | To bee or not to bee: An annotated dataset for beehive sound recognition |
Description | The present dataset was developed in the context of our work on the automatic recognition of beehive sounds. The problem is posed as the classification of sound segments into two classes: Bee and noBee. The novelty of the explored approach and the need for annotated data dictated the construction of such a dataset. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | No recorded impact as of yet. |
URL | https://zenodo.org/record/1321278#.XHUio4Xgryt |
Description | Collaboration with Audio Research Group, Tampere University of Technology, Finland |
Organisation | Tampere University of Technology |
Country | Finland |
Sector | Academic/University |
PI Contribution | Collaboration in computational sound scene analysis research with the Audio Research Group, Tampere University of Technology, Finland (host: Prof Tuomas Virtanen). |
Collaborator Contribution | Planned 1-month research visit by PDRA Helen Bear at TUT, Finland, towards a research collaboration in joint sound event detection and sound scene classification. |
Impact | One published paper at IEEE WASPAA 2019 workshop, entitled: "City classification from multiple real-world sound scenes". |
Start Year | 2019 |
Description | Collaboration with Department of Information Engineering, Universita Politecnica delle Marche, Italy |
Organisation | Marche Polytechnic University |
Country | Italy |
Sector | Academic/University |
PI Contribution | Research collaboration on audio-based identification of beehive sounds and on automatic recognition of beehive states. The QMUL team contributed towards research and development of methods for the above tasks, as well as towards dataset annotation.
Collaborator Contribution | Partners contributed towards research and development of methods for the above tasks, as well as towards dataset annotation. Partners also hosted the project RA for a 1-week research visit.
Impact | Paper published at IEEE ICASSP 2019 conference: Audio-based identification of beehive states |
Start Year | 2018 |
Description | Collaboration with LS2N, Ecole Centrale de Nantes, France |
Organisation | Ecole Centrale de Nantes |
Country | France |
Sector | Academic/University |
PI Contribution | Collaboration on audio data collection, for the purpose of organising a public evaluation task on urban sound characterisation as part of the DCASE Challenge series.
Collaborator Contribution | Hosted a visiting PhD student from LS2N, Ecole Centrale de Nantes, France, at QMUL and collaborated on audio data collection and challenge task specification.
Impact | No outputs as of yet. |
Start Year | 2018 |
Description | Collaboration with LS2N, Ecole Centrale de Nantes, France |
Organisation | National Center for Scientific Research (Centre National de la Recherche Scientifique CNRS) |
Country | France |
Sector | Academic/University |
PI Contribution | Collaboration on audio data collection, for the purpose of organising a public evaluation task on urban sound characterisation as part of the DCASE Challenge series.
Collaborator Contribution | Hosted a visiting PhD student from LS2N, Ecole Centrale de Nantes, France, at QMUL and collaborated on audio data collection and challenge task specification.
Impact | No outputs as of yet. |
Start Year | 2018 |
Description | Collaboration with Sound & Music Computing Lab, National University of Singapore |
Organisation | National University of Singapore |
Country | Singapore |
Sector | Academic/University |
PI Contribution | Research collaboration in computational sound scene analysis with the Sound & Music Computing Lab, Computer Science Department, National University of Singapore (lead collaborator: Prof Ye Wang). This collaboration focuses on research for sound event detection and sound scene classification using machine learning models. |
Collaborator Contribution | The research collaboration includes 1 academic, 1 PhD student and 1 research intern at NUS. The collaboration was supported by a research visit by the PI to NUS in January 2019 and a research visit by Prof Ye Wang to QMUL in August-September 2019.
Impact | One published paper at the IEEE ICASSP 2019 conference: SubSpectralNet - Using sub-spectrogram based convolutional neural networks for acoustic scene classification, and one accepted paper at the IEEE ICASSP 2020 conference: A-CRNN: a domain adaptation model for sound event detection. |
Start Year | 2018 |
Title | Audio-based identification of beehive states |
Description | The sound produced by bees in a beehive can be a source of many insights regarding the overall state and health of the hive, and can even indicate natural phenomena related to its life cycle. The goal of this software is to create a system that can automatically identify different states of a hive based on audio recordings made from inside beehives. The proposed pipeline consists of a sequence of two classifiers: the first works as a preselector of relevant samples, acting as a cleaning stage that removes any non-bee sound; the selected samples are then fed to the second classifier, which makes the decision regarding the state of the hive. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | No impact recorded as of yet. |
URL | https://github.com/madzimia/Audio_based_identification_beehive_states |
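To make the two-stage idea described above concrete, here is a minimal sketch of such a pipeline: a first classifier filters out non-bee segments, and a second classifier predicts the hive state from the segments that pass. The feature extraction and model choices are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal two-stage sketch: (1) bee vs. non-bee preselector, (2) hive-state
# classifier on the retained segments. Features and models are assumptions.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, sr=22050, n_mfcc=20):
    """Mean MFCC vector for one audio segment (assumed feature set)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Stage 1: bee vs. non-bee preselector (binary).
bee_filter = SVC(kernel="rbf")
# Stage 2: hive-state classifier (e.g. queen present vs. queenless).
state_clf = SVC(kernel="rbf")

def train(segments, bee_labels, state_labels):
    # state_labels is assumed aligned with segments; entries for
    # non-bee segments are ignored when fitting the state classifier.
    X = np.stack([mfcc_features(p) for p in segments])
    bee_filter.fit(X, bee_labels)
    is_bee = np.asarray(bee_labels) == 1
    state_clf.fit(X[is_bee], np.asarray(state_labels)[is_bee])

def predict_state(segment_paths):
    X = np.stack([mfcc_features(p) for p in segment_paths])
    keep = bee_filter.predict(X) == 1        # discard non-bee segments
    if not keep.any():
        return None                          # nothing usable in this window
    return state_clf.predict(X[keep])
```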
Title | City classification from multiple real-world sound scenes |
Description | Code for the associated paper published at IEEE WASPAA 2019, entitled "City classification from multiple real-world sound scenes". Paper link: https://doi.org/10.1109/WASPAA.2019.8937271 |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | None as of yet. |
URL | https://github.com/drylbear/soundscapeCityClassification |
Title | SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification |
Description | This repository contains the Keras/TensorFlow implementation and miscellaneous figures for SubSpectralNets, introduced in the following paper: "SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification" (accepted at ICASSP 2019). We introduce a novel approach to using spectrograms in convolutional neural networks in the context of acoustic scene classification. First, we show from statistical analysis that some specific bands of mel-spectrograms carry more discriminative information than other bands, and that this is specific to each soundscape. Based on these observations, we propose SubSpectralNets, in which we first design a new convolutional layer that splits the time-frequency features into sub-spectrograms, then merges the band-level features at a later stage for the global classification. The effectiveness of SubSpectralNet is demonstrated by a relative improvement of +14% accuracy over the DCASE 2018 baseline model. |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | No impact recorded as of yet. |
URL | https://github.com/ssrp/SubSpectralNet |
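The sketch below illustrates the sub-spectrogram idea described above: the mel-spectrogram is split along the frequency axis into bands, each band is processed by its own small CNN branch, and the band-level features are merged for the final scene prediction. Band count, layer widths and class count are illustrative assumptions; see the linked repository for the actual SubSpectralNet model.

```python
# Minimal Keras sketch of a sub-spectrogram network: per-band CNN branches
# whose features are merged for global scene classification. All sizes are
# illustrative assumptions, not the published architecture.
import tensorflow as tf
from tensorflow.keras import layers, Model

N_MELS, N_FRAMES, N_SCENES = 40, 500, 10
SUB_BANDS = 2                       # e.g. split 40 mel bins into 2 bands of 20

inp = layers.Input(shape=(N_MELS, N_FRAMES, 1))
band_size = N_MELS // SUB_BANDS

band_features = []
for b in range(SUB_BANDS):
    # Slice one frequency band out of the full spectrogram.
    band = layers.Lambda(
        lambda t, b=b: t[:, b * band_size:(b + 1) * band_size, :, :])(inp)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(band)
    x = layers.MaxPooling2D((2, 4))(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    band_features.append(x)

# Merge band-level features for the global scene classification.
merged = layers.Concatenate()(band_features)
merged = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(N_SCENES, activation="softmax")(merged)

model = Model(inp, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```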
Title | Towards joint sound scene and polyphonic sound event recognition |
Description | Code related to the INTERSPEECH 2019 paper on joint sound scene and polyphonic sound event recognition. Associated paper: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2169.pdf |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | None as of yet. |
URL | https://github.com/drylbear/jointASCandSED |