2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2011
DOI: 10.1109/icassp.2011.5947410

Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

Cited by 19 publications (12 citation statements)
References 8 publications
“…Whereas the original SBM method focuses on the log-amplitude spectrum and phase spectrum, we apply SBM to the log-amplitude spectrum and aperiodic component. Here, the aperiodic component is defined as the ratio between harmonic and non-harmonic components in each frequency band of a speech signal [30]. The value 0 means complete periodic and 1 is purely non-periodic.…”
Section: Sub-band Basis Spectrum Model (mentioning)
confidence: 99%
“…Thus, the full-band in this paper means 50-11025 Hz. Speech spectra were derived by 1024-point pitch synchronous Fourier transform using our in-house tools [30]. When the SBM parameters were extracted, the warping parameter α was set to 0.35 according to [25].…”
Section: Speaker-independent and Speaker-dependent Bandwidth Extensio… (mentioning)
confidence: 99%
“…More recently, some work was done using continuous F0 and it was shown that continuous F0 improves the perceived naturalness of synthesis (Yu and Young, 2011; Latorre et al, 2011). This was further improved by hierarchical modelling using a continuous wavelet decomposition to separate the different levels of variation in F0 (Suni et al, 2013).…”
Section: Introduction (mentioning)
confidence: 99%
“…To alleviate the perceived hoarseness, popular solutions include the use of continuous F0 contours either for modeling [14], [20], [21] or for synthesis [19]. In this paper, we directly tackle the problem of F0 estimation in glottalized regions of a speech signal.…”
Section: Introduction (mentioning)
confidence: 99%
“…Standard HMM-based TTS [2] uses multi-space distribution (MSD) to model and generate discontinuous F0 trajectories [18]. Faulty voicing decisions resulting from the F0 extraction phase will cause the deteriorately trained MSD-HMMs to synthesize voiced frames as unvoiced, resulting in hoarse speech, or to synthesize unvoiced frames as voiced, resulting in buzzy speech [19]. When listening to the output of our baseline HMM-based TTS system, we perceived that although the synthetic speech is highly intelligible, its overall quality is greatly degraded by the hoarseness frequently occurring at syllables bearing a glottalized tone.…”
Section: Introduction (mentioning)
confidence: 99%