Speaker Diarization

Lead Research Organisation: Imperial College London
Department Name: Electrical and Electronic Engineering

Abstract

I am interested in both speech recognition and speaker identification, in both cases in real time. Neural networks have substantially improved the quality of speech recognition in recent years, whereas performance in speaker identification has lagged behind. I have identified some gaps in the existing speaker diarization literature which, if addressed, could improve speaker identification performance significantly. The specific research questions I intend to investigate include:

- Performance of speaker diarization systems using a single omnidirectional microphone.
- Whether speaker diarization performance can be improved significantly with more than one input signal (e.g. microphone arrays and beamforming).
- Performance of real-time speaker diarization systems.
- Speaker identification systems that record speaker identities onto a database and recognise previous speakers.
- Methods of combining separate speech recognition and speaker diarization systems to give optimal performance.
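One way the speaker database in the fourth question could work is sketched below. It is a minimal illustration, not the project's design: it assumes each speech segment has already been converted into a fixed-length speaker embedding by some extractor (not shown), stores one running-mean embedding per enrolled speaker, and recognises a returning speaker by cosine similarity. The class SpeakerDatabase, its methods and the threshold value are hypothetical names and settings chosen for illustration.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerDatabase:
    """Hypothetical store of enrolled speakers, one mean embedding per identity."""

    def __init__(self, threshold: float = 0.7) -> None:
        # Minimum similarity required to accept a match (illustrative value only).
        self.threshold = threshold
        self._sums: Dict[str, Vector] = {}
        self._counts: Dict[str, int] = {}

    def enrol(self, name: str, embedding: Vector) -> None:
        """Add an embedding to a speaker's running mean, creating the entry if needed."""
        if name not in self._sums:
            self._sums[name] = [0.0] * len(embedding)
            self._counts[name] = 0
        self._sums[name] = [s + e for s, e in zip(self._sums[name], embedding)]
        self._counts[name] += 1

    def centroid(self, name: str) -> Vector:
        """Mean of all embeddings enrolled for this speaker."""
        n = self._counts[name]
        return [s / n for s in self._sums[name]]

    def identify(self, embedding: Vector) -> Optional[str]:
        """Return the closest enrolled speaker, or None if nobody passes the threshold."""
        best_name, best_score = None, -1.0
        for name in self._sums:
            score = cosine_similarity(embedding, self.centroid(name))
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.threshold else None
```

For example, after db.enrol("alice", [1.0, 0.0, 0.1]) and db.enrol("bob", [0.0, 1.0, 0.0]) on a database created with threshold=0.8, the call db.identify([0.9, 0.1, 0.1]) returns "alice", while an embedding unlike either speaker returns None and could be enrolled as a new identity. In practice the threshold would have to be tuned on real embeddings.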

The aim of speaker diarization is to determine who is speaking at a given moment, identifying and distinguishing different speakers based on the characteristics of their voices. Good speaker diarization systems have several benefits. For example, they could be used to improve existing speech recognition systems, make transcripts more meaningful and searchable, or help hearing-impaired people identify different speakers on conference calls.

My approach will initially involve analysing an existing speaker diarization system called DiarTk, which was published by the Idiap Research Institute along with supporting theoretical papers. I intend to identify the strengths and weaknesses of that system and compare its results with commercially available systems such as those offered by IBM Watson and Microsoft Cognitive Services.

Speaker diarization systems apply several steps to incoming audio signals, including voice activity detection, feature extraction, segmentation, labelling and clustering. I intend to investigate different theories and models for each of these steps in order to identify complete speaker diarization systems that work best in practice. I also intend to investigate performance in different speaking environments, such as speech on conference calls and real-time transcription; these have been largely overlooked by existing research but are essential for practical applications.
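The pipeline of steps described above can be sketched in a highly simplified form. This is an illustrative toy, not DiarTk's method or any commercial system's: voice activity detection is a plain frame-energy threshold, per-frame features (for example MFCCs) are assumed to be computed elsewhere and passed in, each speech segment is summarised by the mean of its frame features, and segments are clustered with simple online "leader" clustering rather than the agglomerative methods more typical of offline systems. Each segment's cluster index serves as its speaker label. All function names and thresholds are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def frame_energy(samples: List[float]) -> float:
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def voice_activity(frames: List[List[float]], threshold: float) -> List[bool]:
    """Crude energy-based voice activity detection: True where a frame looks like speech."""
    return [frame_energy(f) > threshold for f in frames]


def segment(active: List[bool]) -> List[Tuple[int, int]]:
    """Group consecutive speech frames into (start, end) segments, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, is_speech in enumerate(active):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def leader_cluster(embeddings: List[Vector], max_distance: float) -> List[int]:
    """Online clustering: join the nearest existing cluster leader if it lies within
    max_distance, otherwise open a new cluster led by this segment."""
    leaders: List[Vector] = []
    labels: List[int] = []
    for e in embeddings:
        if leaders:
            dists = [math.dist(e, c) for c in leaders]
            best = min(range(len(leaders)), key=dists.__getitem__)
            if dists[best] <= max_distance:
                labels.append(best)
                continue
        leaders.append(list(e))
        labels.append(len(leaders) - 1)
    return labels


def diarize(
    audio_frames: List[List[float]],
    frame_features: List[Vector],
    energy_threshold: float,
    max_distance: float,
) -> List[Tuple[int, int, int]]:
    """Return (start_frame, end_frame, speaker_label) for each detected speech segment."""
    active = voice_activity(audio_frames, energy_threshold)
    segments = segment(active)
    embeddings = [mean_vector(frame_features[s:e]) for s, e in segments]
    labels = leader_cluster(embeddings, max_distance)
    return [(s, e, label) for (s, e), label in zip(segments, labels)]
```

Because leader clustering makes a single pass and never revisits earlier decisions, it suits the real-time setting mentioned above, at the cost of being sensitive to the order in which segments arrive and to the choice of max_distance.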

The research falls within the EPSRC theme of Digital Signal Processing and focuses in particular on the processing of acoustic signals in real-room acoustic scenarios. We expect the work eventually to lead to contributions of high value to meeting diarization and of relevance to hearing-impaired people.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509486/1                                   01/10/2016  31/03/2022
2127883            Studentship   EP/N509486/1  01/10/2018  30/06/2022  Simon McKnight
EP/R513052/1                                   01/10/2018  30/09/2023
2127883            Studentship   EP/R513052/1  01/10/2018  30/06/2022  Simon McKnight