Automatic target cost and database design for unit-selection speech synthesis

Lead Research Organisation: University of Edinburgh
Department Name: Centre for Speech Technology Research

Abstract

We propose to replace three components of a typical concatenative speech synthesiser: the text selection algorithm (what to record for the database), the target cost function (which units to select from the database) and the backoff strategy (what to do when the database does not contain the desired unit). These components are currently designed independently using human intuition. This is very hard, can only be done by experts, and means that each component is unlikely to be optimised with respect to the others. We propose to base these three components on a single underlying model. The model will learn, from data, which speech units are perceptually interchangeable. This information will then be used by the target cost function / backoff strategy, and when selecting the text to be recorded. The proposed techniques will be implemented in the Festival 2 speech synthesis system and evaluated using formal listening tests.

We break down the research programme into three phases. In Phase 1, we will gain a deeper understanding of current techniques. In Phase 2, we will examine techniques for learning just the target cost/backoff strategy, given an existing voice, then for learning the text-selection algorithm for a given target cost/backoff strategy. Finally, in Phase 3, we will devise a method for jointly learning both together.

Publications

Richmond, K (2007) Festival Multisyn Voices for the 2007 Blizzard Challenge in Proc. Blizzard Challenge Workshop (in Proc. SSW6)

Strom, V and King, S (2008) Investigating Festival's target cost function using perceptual experiments in Proceedings Interspeech

Karaiskos, V (2008) The Blizzard Challenge 2008 in The Blizzard Challenge 2008

King, S (2009) The Blizzard Challenge 2009 in The Blizzard Challenge 2009

Aylett, M (2009) Speech Synthesis Without a Phone Inventory in Interspeech

 
Description Invited public lecture: A survey of speech technology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Discussions with the audience. Follow-up emails from members of the audience.
Year(s) Of Engagement Activity 2009
 
Description The future of Languages - more than just words 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A public lecture at the Public Library in Amsterdam, followed by a debate with the audience. Interactions with the audience.
Year(s) Of Engagement Activity 2012
URL http://www.clubofamsterdam.com/event.asp?contentid=854