Interspeech 2017
DOI: 10.21437/interspeech.2017-1453

A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice Transformation

Abstract: Improving expressiveness in singing voice synthesis systems requires performing realistic timbre transformations, e.g. for varying voice intensity. In order to sing louder, singers tend to open their mouth more widely, which changes the vocal tract's shape and resonances. This study shows, by means of signal analysis and simulations, that the main effect of mouth opening is an increase of the 1st formant's frequency (F1) and a decrease of its bandwidth (BW1). From these observations, we then propose a rule fo…
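
The abstract describes the effect in terms of F1 and BW1, and the title refers to pole modification. The sketch below is not the authors' rule, which is truncated above; it only illustrates the standard relation between a z-domain pole and the frequency and bandwidth of the resonance it produces, and shows how raising F1 while narrowing BW1 moves that pole. The function names, the sampling rate, the starting formant values, and the two scaling factors are all hypothetical.

```python
import cmath
import math


def pole_from_formant(freq_hz, bw_hz, fs_hz):
    """Upper-half-plane z-domain pole for a resonance (standard relation)."""
    radius = math.exp(-math.pi * bw_hz / fs_hz)   # narrower bandwidth -> radius closer to 1
    angle = 2.0 * math.pi * freq_hz / fs_hz       # higher frequency -> larger angle
    return cmath.rect(radius, angle)


def formant_from_pole(pole, fs_hz):
    """Inverse mapping: recover (frequency, bandwidth) in Hz from a pole."""
    freq_hz = cmath.phase(pole) * fs_hz / (2.0 * math.pi)
    bw_hz = -math.log(abs(pole)) * fs_hz / math.pi
    return freq_hz, bw_hz


def modify_first_formant(pole, fs_hz, freq_scale, bw_scale):
    """Scale a pole's frequency and bandwidth by the given (assumed) factors."""
    freq_hz, bw_hz = formant_from_pole(pole, fs_hz)
    return pole_from_formant(freq_hz * freq_scale, bw_hz * bw_scale, fs_hz)


# Hypothetical values, chosen only for illustration.
FS_HZ = 16000.0
p1 = pole_from_formant(500.0, 80.0, FS_HZ)
p1_open = modify_first_formant(p1, FS_HZ, freq_scale=1.1, bw_scale=0.8)
f1_open, bw1_open = formant_from_pole(p1_open, FS_HZ)
```

In a real all-pole filter the complex-conjugate pole would have to be moved in the same way so that the filter coefficients stay real.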

Cited by 2 publications (3 citation statements)
References 17 publications (26 reference statements)
“…In contrast, pop singers can sing notes that are audible with the help of a microphone. Another factor to consider is the use of various forms of vocal production (Ardaillon, 2017).…”
Section: Ambitus Suara (Vocal Range) (unclassified)
“…We will use three different training datasets: a pure singing voice dataset si, a pure speech dataset sp, and a combined dataset vo consisting of both singing and speech. We use the same singing voice dataset as in [6] and [10], which consists of the datasets from [21,22,23,24,25,26,27,13,28,29]. The speech dataset consists of VCTK [30], Att-HACK [31], and further internal datasets containing expressive speech, amounting to about 70 h of speech in total.…”
Section: Training Data (mentioning)
confidence: 99%
“…During the age of classical parametric vocoders, the lower level voice attributes, like the pitch or voice level, could be transformed by adjusting the vocoder parameters according to the desired change of the voice attributes using predetermined heuristics [8,9,7]. To create realistic transformations, however, even changing the pitch is not as straightforward as simply changing the f0 parameter [11,12,13]: Because parameters of real voice signals are not independent, a change in one parameter needs to be accompanied by a change in the other parameters.…”
Section: Introduction (mentioning)
confidence: 99%