Challenges with Emotion in Speech Synthesis

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

In emotional speech synthesis, the aim is to add human emotion to synthesised speech so that it mimics the emotion audible in natural speech. This is desirable in applications that call for more natural, human-like synthesised speech. One example is audiobooks, which would be faster to create with a Text-To-Speech (TTS) model, but where a lack of emotion would make for a suboptimal listening experience. Another is personal assistive devices, which use a TTS model to help people speak. Learning and mimicking human emotion in synthesised speech is difficult for a number of reasons, such as:
- High-quality data is hard to come by
- High-quality labels are hard to come by
- Emotion is very imbalanced in existing datasets
- Evaluation is not standardised

Finally, it must always be kept in mind that emotion is both a very universal and a very personal experience.

These difficulties motivate this research, which focuses on improving the available datasets and evaluation methods for emotional speech synthesis.

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S022481/1      |              |              | 31/03/2019 | 29/09/2027 |
2268062           | Studentship  | EP/S022481/1 | 31/08/2019 | 30/11/2023 | Emelie Van De Vreken