AI for everyday sounds

Lead Research Organisation: Queen Mary University of London

Department Name: Sch of Electronic Eng & Computer Science

Abstract

Everyday sound covers all the sounds that surround people in daily life, except speech and music. For decades, everyday sound perception has been difficult to address because of the large variety of sound sources, their widely differing acoustic characteristics, and the complex context in which sounds occur. Recent rapid progress in artificial intelligence (AI) now makes it possible to endow machines with a human-like auditory capacity.

A core task in everyday sound perception is sound event detection (SED), which recognises everyday sounds by classifying the types of sound events and detecting their corresponding time boundaries. The goal of this PhD project is to develop novel technologies for SED and to extend them to several everyday-sound tasks, including audio tagging, sound separation, and audio captioning.
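To make the SED output format concrete, the sketch below turns per-frame class probabilities into labelled events with start and end times. It is only an illustration under stated assumptions, not the method proposed in this project: the function name `frames_to_events`, the fixed 0.5 threshold, the 0.5 s frame hop, and the example class names and probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple

# An event is (label, onset in seconds, offset in seconds).
Event = Tuple[str, float, float]


def frames_to_events(
    frame_probs: Dict[str, List[float]],
    hop_seconds: float = 0.5,   # assumed time step between frames
    threshold: float = 0.5,     # assumed decision threshold
) -> List[Event]:
    """Threshold per-frame probabilities and merge runs of active frames."""
    events: List[Event] = []
    for label, probs in frame_probs.items():
        start = None  # index of the first frame of the currently open event
        for i, p in enumerate(probs):
            active = p >= threshold
            if active and start is None:
                start = i  # event begins at this frame
            elif not active and start is not None:
                # event ends at the start of the first inactive frame
                events.append((label, start * hop_seconds, i * hop_seconds))
                start = None
        if start is not None:
            # event is still active when the clip ends
            events.append((label, start * hop_seconds, len(probs) * hop_seconds))
    # order by onset time, then by label
    return sorted(events, key=lambda e: (e[1], e[0]))


# Hypothetical classifier output for a 3-second clip (6 frames of 0.5 s).
example = {
    "dog_bark": [0.1, 0.9, 0.8, 0.2, 0.1, 0.7],
    "door_slam": [0.0, 0.1, 0.6, 0.1, 0.0, 0.0],
}
events = frames_to_events(example, hop_seconds=0.5, threshold=0.5)
for label, onset, offset in events:
    print(f"{label}: {onset:.1f}s - {offset:.1f}s")
```

Each class is handled independently, so overlapping events of different types (here the bark and the door slam) are reported separately, which is what distinguishes SED from clip-level audio tagging.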

This project will first focus on sound event detection in terms of data and models. Specifically, transfer learning and few-shot learning will be investigated to address class imbalance and the difficulty of obtaining sufficient samples for each sound class. Multi-resolution networks will be designed to make full use of the information in input spectrograms. In addition, the proposed models, along with recent developments in machine learning, will be applied to other everyday-sound tasks, such as sound separation and localisation, to verify their generalisation performance. These models can also be used as feature extractors in downstream tasks to boost overall performance. Throughout the project, the proposed methods will be compared with state-of-the-art algorithms, and qualitative and quantitative results will be analysed to identify their potential advantages and shortcomings.

This PhD project is aligned with the following EPSRC research areas: Music and Acoustic Technology, Artificial Intelligence Technologies, and Digital Signal Processing. All of these areas are key components of the overarching EPSRC themes in ICT, Digital Economy, and Artificial Intelligence and Robotics. With its focus on audio technologies, the project addresses the EPSRC's Delivery Plan priorities of enhancing future digital technologies to transform society and deliver prosperity, and of enabling adaptable solutions for the provision of reliable infrastructure for audio technologies.

Student:

Jinhua Liang

Period of Study:

Oct 21 - Mar 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2598125

Research Topic:

Unclassified

Organisations

Queen Mary University of London (Lead Research Organisation)

People	ORCID iD
Emmanouil Benetos (Primary Supervisor)	http://orcid.org/0000-0002-6820-6764
Jinhua Liang (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/R513106/1			01/10/2018	30/09/2023
2598125	Studentship	EP/R513106/1	01/10/2021	31/03/2025	Jinhua Liang
EP/T518086/1			01/10/2020	30/09/2025
2598125	Studentship	EP/T518086/1	01/10/2021	31/03/2025	Jinhua Liang
