2010 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2010.5495126

VTLN adaptation for statistical speech synthesis

Abstract: The advent of statistical speech synthesis has enabled the unification of the basic techniques used in speech synthesis and recognition. Adaptation techniques that have been successfully used in recognition systems can now be applied to synthesis systems to improve the quality of the synthesized speech. The application of vocal tract length normalization (VTLN) for synthesis is explored in this paper. VTLN-based adaptation requires estimation of a single warping factor, which can be accurately estimated from v…
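The abstract notes that VTLN adaptation needs only a single warping factor. One common way to parameterize that factor in mel-cepstral synthesis is a first-order all-pass (bilinear) frequency warping with parameter α. The abstract is cut off before it names the warping function, so the form below is an assumption for illustration, and the name `bilinear_warp` is hypothetical. It maps a normalized angular frequency ω in [0, π] to a warped frequency, keeping 0 and π fixed.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Warped frequency of a first-order all-pass filter with factor alpha.

    Assumed form: omega + 2 * arctan(alpha * sin(omega) / (1 - alpha * cos(omega))).
    omega is normalized angular frequency in [0, pi]; requires |alpha| < 1,
    which keeps the denominator positive and the mapping monotonic.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

# alpha > 0 pushes mid frequencies upward, alpha < 0 pulls them down,
# and alpha = 0 is the identity; only this one scalar is estimated per speaker.
grid = np.linspace(0.0, np.pi, 5)
for a in (-0.1, 0.0, 0.1):
    print(a, np.round(bilinear_warp(grid, a), 3))
```

Because the whole warping curve is controlled by one scalar, even a small amount of adaptation data can constrain it, which is the property the abstract relies on.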

Cited by 17 publications (20 citation statements)
References 12 publications
“…The characteristics estimated by VTLN when propagated to the nodes of the tree structure are expected to improve the speaker specific transform estimation for CSMAPLR. More specifically, VTLN has been shown to be closer to the average voice, and hence better in naturalness [3], and CSMAPLR is known to bring in better speaker similarity when very little adaptation data is available. A priori, the combination of these two is expected to give improved performance with respect to naturalness and speaker similarity.…”
Section: Using VTLN as CSMAPLR Prior (mentioning)
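The idea of using a VTLN estimate as a prior can be illustrated with a simplified, unconstrained analogue of MAP linear regression. This is not the CSMAPLR update itself, which uses constrained transforms and propagates priors down a regression tree. Assume each transform row w is estimated from per-row sufficient statistics G (symmetric positive definite) and k, under a Gaussian prior with mean h (here, the matching row of a hypothetical VTLN-derived transform) and precision τ. Maximizing w·k − ½ wᵀGw − ½ τ‖w − h‖² gives w = (G + τI)⁻¹(k + τh). The function names are hypothetical.

```python
import numpy as np

def map_row_update(G, k, h, tau):
    """MAP estimate of one transform row.

    Maximizes w @ k - 0.5 * w @ G @ w - 0.5 * tau * ||w - h||^2, whose
    stationary point satisfies (G + tau * I) w = k + tau * h.
    G: (d, d) symmetric statistics, k: (d,) statistics,
    h: (d,) prior-mean row (e.g. from a VTLN transform), tau >= 0.
    """
    d = G.shape[0]
    return np.linalg.solve(G + tau * np.eye(d), k + tau * h)

def map_transform(G_rows, K, H, tau):
    """Apply the row update to every row i using G_rows[i], K[i] and prior row H[i]."""
    return np.stack([map_row_update(G_rows[i], K[i], H[i], tau) for i in range(H.shape[0])])

# tau -> 0 recovers the maximum-likelihood row G^{-1} k;
# tau -> infinity pins the row to the prior (VTLN) row h.
rng = np.random.default_rng(0)
d = 3
M = rng.standard_normal((d, d))
G = M @ M.T + d * np.eye(d)
k, h = rng.standard_normal(d), rng.standard_normal(d)
for tau in (0.0, 10.0, 1e6):
    print(tau, np.round(map_row_update(G, k, h, tau), 4))
```

The single weight τ makes the trade-off in the quote concrete: a small τ lets the adaptation statistics dominate, while a large τ keeps the transform close to the VTLN prior.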
“…The main advantage of using the EM algorithm over, say, a grid search is that the resulting warping factor estimation has finer granularity of α values, and efficient implementation in time and space. The EM algorithm can be embedded into HMM training utilizing the same sufficient statistics as CMLLR [3,5,14], which transforms the spectral features as follows…”
Section: VTLN in Statistical Parametric Speech Synthesis (mentioning)
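All-pass warping of a cepstrum is linear in the cepstral coefficients, so a given α can be written as a matrix A(α) acting on each feature vector. That linearity is what lets VTLN share machinery with CMLLR. The sketch below is a hedged illustration, not the transform the quoted paper goes on to define. It re-implements, in NumPy, the frequency-transformation recursion modeled on SPTK's `freqt`, builds A(α) by feeding in unit vectors, and then picks α by the grid search that the quote contrasts with EM. Each candidate is scored by a diagonal-Gaussian log-likelihood plus the log-Jacobian T·log|det A(α)|. The Gaussian model, the grid, and the names `warp_matrix` and `grid_search_alpha` are placeholders.

```python
import numpy as np

def freqt(c, order_out, alpha):
    """All-pass frequency transformation of cepstrum c (orders 0..len(c)-1)
    into a cepstrum of orders 0..order_out (recursion modeled on SPTK freqt)."""
    g = np.zeros(order_out + 1)
    b = 1.0 - alpha * alpha
    for ci in c[::-1]:                      # feed coefficients from highest order down
        d = g.copy()                        # values from the previous step
        g[0] = ci + alpha * d[0]
        if order_out >= 1:
            g[1] = b * d[0] + alpha * d[1]
        for j in range(2, order_out + 1):   # g[j - 1] is already updated here
            g[j] = d[j - 1] + alpha * (d[j] - g[j - 1])
    return g

def warp_matrix(order, alpha):
    """Square matrix A with A @ c == freqt(c, order, alpha) for len(c) == order + 1."""
    eye = np.eye(order + 1)
    return np.column_stack([freqt(eye[:, n], order, alpha) for n in range(order + 1)])

def grid_search_alpha(X, mean, var, alphas):
    """Return the alpha maximizing sum_t log N(A x_t; mean, diag(var)) + T log|det A|.
    X: (T, D) cepstral frames with D = order + 1."""
    T, D = X.shape
    best_alpha, best_score = None, -np.inf
    for alpha in alphas:
        A = warp_matrix(D - 1, alpha)
        Y = X @ A.T                         # y_t = A x_t for every frame
        ll = -0.5 * np.sum((Y - mean) ** 2 / var + np.log(2 * np.pi * var))
        _, logdet = np.linalg.slogdet(A)
        score = ll + T * logdet
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```

A grid search can only return values on the grid, and each candidate requires a full pass over the data. Those are the granularity and cost issues the quote cites as motivation for an EM-based estimate.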
“…VTLN has been applied to HMM-based speech synthesis [8] and has been shown to improve the synthetic speech quality when combined with adaptation based approaches [9,10]. Using VTLN as a linear transformation eliminates the need to store warped features.…”
Section: VTLN Adaptation for Speech Synthesis (mentioning)
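A standard change-of-variables identity shows why a linear VTLN transform removes the need to store warped features. Take a Gaussian with mean μ and covariance Σ, and an invertible warping matrix A_α (the symbols are assumptions for illustration, not the paper's notation). Scoring warped features against the original model is then equivalent to scoring the original features against a transformed model:

```latex
\log \mathcal{N}\!\left(A_\alpha x;\, \mu,\, \Sigma\right) + \log\left|\det A_\alpha\right|
  = \log \mathcal{N}\!\left(x;\, A_\alpha^{-1}\mu,\, A_\alpha^{-1}\Sigma A_\alpha^{-\top}\right)
```

Both sides can be evaluated from the unwarped features x and a stored A_α, so no warped copy of the data has to be kept.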