2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
DOI: 10.1109/icassp.2006.1660155

Sub-Phonetic Modeling For Capturing Pronunciation Variations For Conversational Speech Synthesis

Abstract: In this paper we address the issue of pronunciation modeling for conversational speech synthesis. We experiment with two different HMM topologies (a fully connected state model and a forward connected state model) for sub-phonetic modeling to capture the deletion and insertion of sub-phonetic states during the speech production process. We show that both topologies yield a higher log likelihood than the traditional 5-state sequential model. We also study the first and second mentions of content words and …
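The three topologies named in the abstract differ only in which state-to-state transitions are allowed. The sketch below is an illustration, not the authors' implementation: the function names `topology_mask` and `count_arcs` are hypothetical, and the readings of "forward connected" (self-loop plus any later state, so states can be skipped) and "fully connected" (any state to any state) are assumptions, since the abstract does not define them.

```python
def topology_mask(n_states, kind):
    """Return an n_states x n_states 0/1 matrix of allowed transitions.

    The topology definitions are assumptions made for illustration:
      sequential: self-loop or move to the next state only
      forward:    self-loop or jump to any later state (states may be skipped)
      full:       any state may follow any state
    """
    mask = [[0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in range(n_states):
            if kind == "sequential":
                allowed = j in (i, i + 1)
            elif kind == "forward":
                allowed = j >= i
            elif kind == "full":
                allowed = True
            else:
                raise ValueError(f"unknown topology: {kind}")
            mask[i][j] = int(allowed)
    return mask


def count_arcs(mask):
    """Count the allowed transitions in a mask."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    for kind in ("sequential", "forward", "full"):
        print(kind, count_arcs(topology_mask(5, kind)))
```

Under these assumptions, a 5-state model has 9 allowed transitions in the sequential case, 15 in the forward connected case, and 25 in the fully connected case. The extra arcs are what would let a model skip (delete) or revisit (insert) sub-phonetic states.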

Cited by 51 publications (28 citation statements)
References 3 publications (3 reference statements)
“…These are compared against the baseline phone based unit selection voice. Segmentation of the database in terms of graphemes and trigraphemes is automatically done using the EHMM labeller [8] in FESTVOX. For the grapheme based system, dictionary representation of each word is in terms of its graphemes, e.g.…”
Section: Trigrapheme Based Speech Synthesis System (mentioning)
confidence: 99%
“…We force align the speech with its transcription using an HMM tool [7]. This tool allows us to bootstrap alignments just from speech and its transcripts for any language.…”
Section: Acoustically Derived Phrase Breaks (mentioning)
confidence: 99%
“…While early work has mostly concentrated on using phonological rules extracted from data to create alternative pronunciations [17,8], most recent techniques are machine learning approaches. Notably, decision trees [7,18], random forests [6], neural networks [5,10], hidden Markov models [16], and CRFs [10] have been investigated. In [18], decision trees and statistical contextual rules are even combined.…”
Section: Introduction (mentioning)
confidence: 99%