2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014
DOI: 10.1109/icassp.2014.6854063

Parametric representation for singing voice synthesis: A comparative evaluation

Abstract: Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: traditional pulse vocoder, Deterministic plus Stochastic Model, Harmonic plus Noise Model and Glo…

Cited by 8 publications (6 citation statements)
References 30 publications
“…Furthermore, the amount of degradation may differ markedly between speakers (voices), and one vocoder or another may perform better or worse for any particular speakers; cf. (Babacan, Drugman, Raitio, Erro & Dutoit 2014) for singing. These quality variations suggest a notion of "vocodability", the consequence of which is a (perhaps undesirable) bias when selecting a speaker for a TTS corpus, towards a voice that suffers minimal degradation at the hands of the vocoder.…”
Section: Vocoding (mentioning)
confidence: 99%
“…The model training follows the speaker dependent singing synthesis system using STRAIGHT [17] for feature extraction and synthesis. [18] analysed the analysis/re-synthesis quality of different parametric representations for singing speech, where STRAIGHT achieved a good performance. It would also be interesting to evaluate statistical parametric opera singing synthesis with different vocoders, which is however out of the scope of this paper.…”
Section: Acoustic Modelling (mentioning)
confidence: 99%
“…F0 was estimated using the Summation of the Residual Harmonics (SRH) algorithm [16]. TE is here required as standard cepstral analysis was reported in [12] to be inappropriate for singing voice analysis. TE was estimated using the COVAREP toolkit [17].…”
Section: A Experimental Protocol (mentioning)
confidence: 99%
“…In [11], the use of a dynamic MVF was even found to be slightly preferred over the multiband approach. While current methods of MVF estimation seem to be efficient in speech, some issues were reported in [12] for the synthesis of singing voices. For high-pitched voices, MVF was observed to be underestimated which led to an excessive amount of noise after synthesis.…”
Section: Introduction (mentioning)
confidence: 99%