MAKING IT EASIER TO BUILD A TTS FRONTEND

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

A Sequence-to-Sequence (Seq2Seq) frontend distilled from a pipeline-based frontend is inevitably upper-bounded by the performance of that pipeline. Over the past year, my research has focused on offsetting this limitation by incorporating other training sources to augment the Seq2Seq frontend. A Forced-Alignment (FA) augmentation method was proposed that uses audio data as the training source instead of annotated data, which is expensive and laborious to collect. Experimental results showed that the method is effective at learning the pronunciation of out-of-dictionary words and homographs from audio. Future research will focus on two aspects: (1) extending the FA augmentation method to tackle the lack of diversity and to improve the acoustic model within the method, specifically by investigating an NN-based rescoring strategy and shallow fusion; and (2) handling Text Normalization (TN), which is not covered by the current frontend modelling. TN poses two challenging problems: the lack of paired training data and unrecoverable semantic errors. A more efficient neural architecture will be investigated to reduce unrecoverable semantic errors, and TN/ITN duplex modelling coupled with semi-supervised or unsupervised learning techniques will be investigated to tackle the lack of paired training data.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S022481/1                                   31/03/2019  29/09/2027
2425922            Studentship   EP/S022481/1  31/08/2020  30/08/2024  Siqi Sun