2010
DOI: 10.1587/transinf.e93.d.595
A Covariance-Tying Technique for HMM-Based Speech Synthesis

Abstract: A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on t…
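The footprint saving described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes per-state sufficient statistics (occupancy, first- and second-order sums, all invented here) from forward-backward training, keeps one mean per state, and pools a single diagonal covariance shared by every state, which is the ML estimate under the tying constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, dim = 1000, 40          # illustrative sizes, not the paper's

# Hypothetical per-state sufficient statistics: occupancy gamma,
# first-order sum, and second-order (diagonal) sum.
gamma = rng.uniform(10, 100, size=n_states)
state_var = rng.uniform(0.5, 2.0, size=(n_states, dim))   # true per-state variances
means_true = rng.normal(size=(n_states, dim))
x_sum = means_true * gamma[:, None]
x2_sum = (state_var + means_true ** 2) * gamma[:, None]

# Untied (conventional) estimates: one mean and one diagonal variance per state.
means = x_sum / gamma[:, None]
vars_untied = x2_sum / gamma[:, None] - means ** 2

# Tied estimate: a single diagonal covariance pooled over all states.
var_tied = (x2_sum.sum(axis=0)
            - (gamma[:, None] * means ** 2).sum(axis=0)) / gamma.sum()

params_untied = n_states * dim * 2      # means + per-state variances
params_tied = n_states * dim + dim      # means + one shared variance
print(params_untied, params_tied)       # variance storage collapses to one vector
```

The pooled estimate is a gamma-weighted average of the per-state variances, so the variance half of the model shrinks from `n_states * dim` values to `dim` values.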

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

1
4
0

Year Published

2011
2011
2018
2018

Publication Types

Select...
4
2
1

Relationship

2
5

Authors

Journals

Cited by 11 publications (5 citation statements)
References 21 publications
“…However, it may not always be appropriate for the mean and variance parameters to share the same tying structure. As an example, we confirmed the effectiveness of a technique that clusters mean vectors by context while tying all variance matrices [4]. With this technique, the synthesized speech can be expected to improve by constructing different tying structures for the mean and variance parameters.…”
Section: Introduction (supporting)
confidence: 61%
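The separate tying structure this citation describes — per-state means with a single shared variance — also simplifies likelihood evaluation. The sketch below (invented sizes and values, not from the paper) shows that with a tied diagonal variance, every state's log-Gaussian shares one normalizer, so comparing states reduces to a variance-weighted squared distance.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_states = 40, 8
means = rng.normal(size=(n_states, dim))    # clustered per state
var_tied = rng.uniform(0.5, 2.0, size=dim)  # one shared diagonal variance

x = rng.normal(size=dim)                    # one observation frame

# The normalizer is identical for all states, so it can be precomputed once.
log_norm = -0.5 * (dim * np.log(2 * np.pi) + np.log(var_tied).sum())
loglik = log_norm - 0.5 * ((x - means) ** 2 / var_tied).sum(axis=1)

best = int(np.argmax(loglik))   # same state as the minimum weighted distance
```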
“…In [4], the variance parameters of all HMM states are tied to a single one. In this paper, we regard the proposed technique with a sufficiently large weight on the MDL criterion as equivalent to the conventional one.…”
mentioning
confidence: 99%
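The role of the MDL weight mentioned in this citation can be made concrete. The function below is a sketch of a weighted MDL split test for decision-tree clustering (the weight, values, and simplification are assumptions for illustration): a split is accepted when the likelihood gain outweighs the scaled description-length penalty, and a sufficiently large weight rejects every split, collapsing the tree to a single fully tied node.

```python
import math

def mdl_split_gain(delta_loglik, delta_params, total_frames, weight=1.0):
    """Change in the weighted MDL criterion when a node is split.
    Negative gain -> the split is accepted. A large 'weight' rejects
    all splits, tying everything to one distribution."""
    penalty = weight * 0.5 * delta_params * math.log(total_frames)
    return -delta_loglik + penalty

# A split improving log-likelihood by 600 at a cost of 80 extra parameters:
accept_w1 = mdl_split_gain(600.0, 80, 10**6, weight=1.0) < 0     # accepted
accept_w_big = mdl_split_gain(600.0, 80, 10**6, weight=100.0) < 0  # rejected
```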
“…In this way, the final HMEM preserves the first empirical moments of multiple decision trees and the second moments of just one decision tree. This assumption follows from the observation that first-order moments appear to be more important than second-order moments [32,47]. The discussion in the current section shows that the ML estimates of parameters defined for…”
Section: Decision Tree-Based Context Clustering (mentioning)
confidence: 98%
“…Due to the prime importance of mean parameters in HMM-based speech synthesis [47], we investigate the difference between the mean values predicted by the two systems. Figure 3A shows a three-dimensional contextual-factor space (c1-c2-c3) clustered by an additive structure.…”
Section: ME-Based Modeling vs. Additive Modeling (mentioning)
confidence: 99%
“…It was demonstrated that HMM-based speech synthesis systems whose footprints were about 100 KB could synthesize intelligible speech by using vector quantization, fixed-point numbers instead of floating-point numbers, and pruned decision trees. In addition, several techniques suited to embedded devices have been proposed, e.g., memory-efficient, low-delay speech parameter generation algorithms [87], [88] and tied model parameters [89].…”
Section: E. Small Footprint (mentioning)
confidence: 99%
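The two footprint tricks this citation names, vector quantization and fixed-point storage, can be combined in a few lines. The sketch below is illustrative only (toy k-means, invented model sizes, int8 chosen arbitrarily): mean vectors are replaced by codebook indices, and the codebook itself is stored as scaled 8-bit integers.

```python
import numpy as np

rng = np.random.default_rng(2)
means = rng.normal(size=(2000, 40)).astype(np.float32)   # hypothetical model means

def kmeans(data, k=64, iters=10, seed=0):
    """Toy k-means VQ; real systems would use a properly trained codebook."""
    r = np.random.default_rng(seed)
    centers = data[r.choice(len(data), k, replace=False)]
    for _ in range(iters):
        idx = np.argmin(((data[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(idx == j):
                centers[j] = data[idx == j].mean(axis=0)
    return centers, idx

codebook, codes = kmeans(means, k=64)

# Fixed-point storage: scale codebook entries into int8 plus one float scale.
scale = np.abs(codebook).max() / 127.0
codebook_q = np.round(codebook / scale).astype(np.int8)

orig_bytes = means.nbytes                                     # 2000*40*4 floats
small_bytes = codebook_q.nbytes + codes.astype(np.uint8).nbytes + 4
```

At these toy sizes the storage drops from 320 KB of float32 means to a few KB of indices plus an int8 codebook, at the cost of quantization error in the reconstructed means.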