AI for everyday sounds

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

Everyday sound covers all the sounds that surround people in daily life, except speech and music. For decades, everyday sound perception has been hard to address because of the large variety of sound sources, their widely differing acoustic characteristics, and the complexity of the contexts in which sounds occur. Rapid advances in artificial intelligence (AI) technology now make it possible to endow machines with human-like auditory capabilities.

As a core component of everyday sound perception, sound event detection (SED) recognises everyday sounds by classifying the types of sound events and detecting their time boundaries. The goal of this PhD project is to develop novel SED technologies and to extend them to several everyday-sound tasks, including audio tagging, sound separation, and audio captioning.
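To make the SED output concrete, the following is a minimal sketch of how frame-level class probabilities could be turned into labelled events with onset and offset times. It is an illustration only, not the project's method: the function name `frames_to_events`, the fixed `threshold`, the `hop_seconds` parameter, and the example class labels are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

# (label, onset_seconds, offset_seconds); a hypothetical event format.
Event = Tuple[str, float, float]


def frames_to_events(
    frame_probs: Dict[str, List[float]],
    hop_seconds: float = 0.02,
    threshold: float = 0.5,
) -> List[Event]:
    """Convert per-class frame probabilities into time-bounded events.

    A frame is treated as active when its probability is at or above
    `threshold`; each run of consecutive active frames becomes one event.
    """
    events: List[Event] = []
    for label, probs in frame_probs.items():
        onset = None  # index of the first frame of the current active run
        for i, p in enumerate(probs):
            active = p >= threshold
            if active and onset is None:
                onset = i
            elif not active and onset is not None:
                events.append((label, onset * hop_seconds, i * hop_seconds))
                onset = None
        if onset is not None:
            # The run is still active at the end of the clip.
            events.append((label, onset * hop_seconds, len(probs) * hop_seconds))
    # Sort by onset time, then by label, for a stable listing.
    return sorted(events, key=lambda e: (e[1], e[0]))


# Example with made-up probabilities and a 0.5 s hop between frames.
example = {
    "dog_bark": [0.1, 0.8, 0.9, 0.2, 0.1],
    "door_slam": [0.0, 0.0, 0.0, 0.7, 0.6],
}
detected = frames_to_events(example, hop_seconds=0.5)
```

In this sketch the classification step is assumed to have already produced the probabilities; practical systems often also smooth the frame decisions (for example with a median filter) before extracting boundaries.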

This project will first focus on sound event detection in terms of data and models. Specifically, transfer learning and few-shot learning will be investigated to address class imbalance and the scarcity of samples in each sound class. Multi-resolution networks will be designed to make full use of the information in input spectrograms. The proposed models, along with recent developments in machine learning, will then be applied to other everyday-sound tasks, such as sound separation and localisation, to verify how well they generalise. These models can also be used as feature extractors in downstream tasks to boost overall performance. Throughout the project, the proposed methods will be compared with state-of-the-art algorithms, and qualitative and quantitative results will be analysed to identify their advantages and shortcomings.

This PhD project is aligned with the following EPSRC research areas: Music and Acoustic Technology, Artificial Intelligence Technologies, and Digital Signal Processing. All of these areas are key components of the overarching EPSRC themes in ICT, Digital Economy, and Artificial Intelligence and Robotics. With its focus on audio technologies, this project addresses the EPSRC's Delivery Plan aims of enhancing future digital technologies for transforming society and delivering prosperity, and of enabling adaptable solutions for the provision of reliable infrastructure for audio technologies.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513106/1                                   01/10/2018  30/09/2023
2598125            Studentship   EP/R513106/1  01/10/2021  31/03/2025  Jinhua Liang
EP/T518086/1                                   01/10/2020  30/09/2025
2598125            Studentship   EP/T518086/1  01/10/2021  31/03/2025  Jinhua Liang