Automatic target cost and database design for unit-selection speech synthesis
Lead Research Organisation: University of Edinburgh
Department Name: Centre for Speech Technology Research
Abstract
We propose to replace three components of a typical concatenative speech synthesiser: the text selection algorithm (what to record for the database), the target cost function (which units to select from the database) and the backoff strategy (what to do when the database does not contain the desired unit).

These components are currently designed independently using human intuition. This is very hard, can only be done by experts, and means that each component is unlikely to be optimised with respect to the others. We propose to base these three components on a single underlying model. The model will learn, from data, which speech units are perceptually interchangeable. This information will then be used by the target cost function / backoff strategy, and when selecting the text to be recorded. The proposed techniques will be implemented in the Festival 2 speech synthesis system and evaluated using formal listening tests.

We break down the research programme into three phases. In Phase 1, we will gain a deeper understanding of current techniques. In Phase 2, we will examine techniques for learning just the target cost/backoff strategy, given an existing voice, then for learning the text-selection algorithm for a given target cost/backoff strategy. Finally, in Phase 3, we will devise a method for jointly learning both together.
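To make the roles of the target cost function and the backoff strategy concrete, the following is a minimal Python sketch. It is not the project's method or Festival's implementation: the feature names, the weights, and the backoff table are all hypothetical and hand-set here, whereas the proposal is to learn which units are perceptually interchangeable from data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Unit:
    """A speech unit described by a few illustrative linguistic features."""
    phone: str
    stressed: bool
    phrase_final: bool


# Hypothetical, hand-set mismatch penalties (assumption for illustration).
WEIGHTS = {"stressed": 1.0, "phrase_final": 0.5}

# Hypothetical substitution table: phones assumed to be interchangeable.
BACKOFF = {"ax": ["ah", "ih"]}


def target_cost(target: Unit, candidate: Unit) -> float:
    """Sum the weights of all features on which candidate differs from target."""
    return sum(
        weight
        for feature, weight in WEIGHTS.items()
        if getattr(target, feature) != getattr(candidate, feature)
    )


def select_unit(target: Unit, database: list[Unit]) -> Optional[Unit]:
    """Pick the lowest-cost unit, backing off to substitute phones if needed."""
    for phone in [target.phone] + BACKOFF.get(target.phone, []):
        candidates = [u for u in database if u.phone == phone]
        if candidates:
            return min(candidates, key=lambda u: target_cost(target, u))
    return None  # no exact match and no usable substitute in the database


database = [Unit("ah", True, False), Unit("ah", False, False)]
chosen = select_unit(Unit("ax", False, False), database)
```

In this sketch the database holds no "ax" units, so the selection backs off to "ah" and then picks the candidate whose features match the target. A real unit-selection system would also combine this with a join cost between adjacent units, which is omitted here.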
Publications
Badino, L
(2008)
Including Pitch Accent Optionality in Unit Selection Text-to-Speech Synthesis
in Proc. Interspeech
Karaiskos, V
(2008)
The Blizzard Challenge 2008
in The Blizzard Challenge 2008
King, S
(2009)
The Blizzard Challenge 2009
in The Blizzard Challenge 2009
Aylett, M
(2009)
Speech Synthesis Without a Phone Inventory
in Interspeech
Richmond, K
(2007)
Festival Multisyn Voices for the 2007 Blizzard Challenge
in Proc. Blizzard Challenge Workshop (in Proc. SSW6)
Strom, V
(2010)
A classifier-based target cost for unit selection speech synthesis trained on perceptual data
in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Strom, V and King, S
(2008)
Investigating Festival's target cost function using perceptual experiments
in Proceedings Interspeech
Description | Invited public lecture: A survey of speech technology |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Discussions with the audience. Follow-up emails from members of the audience. |
Year(s) Of Engagement Activity | 2009 |
Description | The future of Languages - more than just words |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | A public lecture at the Public Library in Amsterdam, followed by a debate with an audience. Interactions with the audience. |
Year(s) Of Engagement Activity | 2012 |
URL | http://www.clubofamsterdam.com/event.asp?contentid=854 |