2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2012
DOI: 10.1109/icassp.2012.6288926

Towards automatic phonetic segmentation for TTS

Abstract: Phonetic segmentation is an important step in the development of a concatenative TTS voice. This paper introduces a segmentation process consisting of two phases. First, forced alignment is performed using an HMM-GMM model. The resulting segmentation is then locally refined using an SVM based boundary model. Both the models are derived from multi-speaker data using a speaker adaptive training procedure. Evaluation results are obtained on the TIMIT corpus and on a proprietary single-speaker TTS corpus.
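The second phase described in the abstract, local refinement of a forced-alignment boundary, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `refine_boundary`, the search window and step, and the `score` callable, which stands in for a trained boundary model. The abstract does not specify the paper's actual features or search strategy.

```python
from typing import Callable


def refine_boundary(
    coarse_ms: int,
    score: Callable[[int], float],
    window_ms: int = 20,  # assumed search radius around the coarse boundary
    step_ms: int = 5,     # assumed candidate spacing
) -> int:
    """Return the candidate time (ms) near coarse_ms with the highest score.

    `score` is a hypothetical stand-in for a boundary model: it maps a
    candidate boundary time to a number, higher meaning more boundary-like.
    """
    candidates = range(coarse_ms - window_ms, coarse_ms + window_ms + 1, step_ms)
    return max(candidates, key=score)


# Toy scorer (not a real model): prefers times close to 132 ms.
def toy_score(t_ms: int) -> float:
    return -abs(t_ms - 132)


refined = refine_boundary(120, toy_score)
print(refined)  # 130: the candidate on the 5 ms grid closest to 132
```

In practice the scorer would be evaluated on acoustic features extracted around each candidate time; the toy scorer only shows the control flow.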

Cited by 9 publications
(3 citation statements)
References 8 publications
“…We proposed a simple iterative procedure that helps to improve the segmentation accuracy so that it is usable in a concatenative TTS system, for which precise segmentation is very important [62]. The procedure described has recently been utilized in our TTS system [61,63] when adding new voices in new languages.…”
Section: Research Objective (mentioning)
confidence: 99%
“…As the automatic phonetic segmentation accuracy has attracted researchers for many years, a number of HMM-based force alignment framework refinements were proposed (see, e.g., [1][2][3][4][5]). On the other hand, the origin of gross segmentation errors and a way to fix them has not been researched so much.…”
Section: Introduction (mentioning)
confidence: 99%
“…Automatic segmentation of the TIMIT corpus has placed 79.5% and 92.8% of phone boundaries within 10 ms and 20 ms, respectively (Hosom 2009). A later study introduces an automatic segmentation method that achieved 84.6% and 95.4%, respectively, on the same corpus (Rendel & Sorin et al 2012).…”
unclassified
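
The figures quoted in the last statement are tolerance-based boundary accuracies: the share of automatically placed boundaries that fall within 10 ms or 20 ms of the reference boundary. A minimal sketch of that metric follows, assuming the reference and predicted boundaries are already paired one-to-one. The function name and the sample values are hypothetical and are not data from any of the cited studies.

```python
def boundary_accuracy(reference_ms, predicted_ms, tolerance_ms):
    """Fraction of predicted boundaries within tolerance_ms of their reference.

    Assumes the two sequences are aligned, i.e. predicted_ms[i] is the
    automatic placement of the boundary whose reference time is reference_ms[i].
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds (illustrative only).
reference_ms = [100, 250, 400, 620]
predicted_ms = [105, 262, 430, 621]  # deviations: 5, 12, 30, 1 ms

print(boundary_accuracy(reference_ms, predicted_ms, 10))  # 0.5
print(boundary_accuracy(reference_ms, predicted_ms, 20))  # 0.75
```

Because every boundary within 10 ms is also within 20 ms, the 20 ms figure is never lower than the 10 ms figure, which matches the pattern in the quoted results.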