Interspeech 2016
DOI: 10.21437/interspeech.2016-1317

Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model

Cited by 8 publications (6 citation statements)
References 9 publications
“…Table 7.1.2 summarizes the languages, numbers of submitted songs, voice genders and participating labs. For a detailed description of each system, the reader is referred to [49]: the WBHSM concatenative synthesizer (UPF, Barcelona) [16], ISIS, the Ircam Singing Synthesizer (Paris) [52], the Seraphim system (A*STAR, Singapore) [53], the Bertsokantari system (UPV, Bilbao) [54], the ACAPELA singing synthesis system (Mons) [55], and Calliphony, an earlier implementation of C-Voks. For the sake of simplicity, the system is coined C-Voks.…”
Section: Participant To the Challenge And Test Methodology (mentioning)
confidence: 99%
“…Learning it implicitly makes sense for end-to-end text-to-speech application as it does not carry much information, but coherence with other parameters is important. In singing, the f0-curve is the parameter responsible for carrying the melody but it carries also musical style and emotion [12]. It is therefore important to model it explicitly, which can be achieved with, e.g., B-splines [13], to still be able to tweak it by hand to fit the needs of the particular application.…”
Section: Proposed Network Architecture (mentioning)
confidence: 99%
“…The results lead us to believe that both transition-sustain models and the multi-layer F0 model are able to generate F0 expressions resembling the original performance to an extent that makes comparison between the two methods difficult. The advantage of the proposed method lies in being fully data-driven, while the multi-layer F0 model requires hand-tuning and its automation is still under investigation [17]. Figure 5 shows an example of an F0 trajectory generated by the proposed method overlaid on the input score and the original F0.…”
Section: Normalized Difference Grades (mentioning)
confidence: 99%