Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model

Ardaillon, Luc; Chabot-Canet, Céline; Röebel, Axel

doi:10.21437/interspeech.2016-1317

Cited by 8 publications

(6 citation statements)

References 9 publications

“…Table 7.1.2 summarizes the languages, numbers of submitted songs, voice genders and participating labs. For a detailed description of each system, the reader is referred to [49]: the WBHSM concatenative synthesizer (UPF, Barcelona) [16], ISIS, the Ircam Singing Synthesizer (Paris) [52], the Seraphim system (A*STAR, Singapore) [53], the Bertsokantari system (UPV, Bilbao) [54], the ACAPELA singing synthesis system (Mons) [55], and Calliphony, an earlier implementation of C-Voks. For the sake of simplicity, the system is coined C-Voks.…”

Section: Participant To the Challenge And Test Methodology (mentioning)

confidence: 99%

Voks: Digital instruments for chironomic control of voice samples

Locqueville, d’Alessandro, Delalez et al. 2020

Speech Communication

This paper presents Voks, a new family of digital instruments that allow for real-time control and modification of pre-recorded voice signal samples. An instrument based on Voks is made of Voks itself, the synthesis software and a given set of chironomic (hand-driven) interfaces. Rhythm can be accurately controlled thanks to a new methodology based on syllabic control points. Timing can also be controlled with other methods, including scrubbing and playback speed variation. Pitch, vocal effort, voice tension, apparent vocal tract size, voicing ratio and aperiodicity ratio of the voice samples can be modified thanks to a real-time high-quality vocoder. Different forms of chironomic control of the vocal parameters are proposed. Pitch is controlled by continuous hand motions using a stylus on a surface (C-Voks) or a theremin (T-Voks). Other interfaces can be used as well. Syllabic rhythm is controlled using a biphasic button. Scrubbing, playback speed and timbre-related parameters can be controlled using the theremin, control surfaces or continuous controllers like faders. In addition to realistic imitation of speaking or singing voices, other playing modes yield new interesting sounds. Voks participated in a comparative perceptual evaluation of singing synthesis systems. It has been demonstrated in live musical settings, using different control interfaces. In addition to musical or poetic performances, applications of performative vocal synthesis to language learning and speech reeducation are foreseen.

“…Learning it implicitly makes sense for end-to-end text-to-speech application as it does not carry much information, but coherence with other parameters is important. In singing, the f0-curve is the parameter responsible for carrying the melody but it carries also musical style and emotion [12]. It is therefore important to model it explicitly, which can be achieved with, e.g., B-splines [13], to still be able to tweak it by hand to fit the needs of the particular application.…”

Section: Proposed Network Architecture (mentioning)

confidence: 99%

Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis

Bous, Röebel 2019

2019 27th European Signal Processing Conference (EUSIPCO)

Self Cite

We investigate various hyperparameters of neural networks used to generate spectral envelopes for singing synthesis. Two perceptual tests are performed: the first compares two models directly and the second ranks models with a mean opinion score. With these tests we show that, when learning to predict spectral envelopes, 2D convolutions are superior to previously proposed 1D convolutions, and that predicting multiple frames in an iterated fashion during training is superior to injecting noise into the input data. An experimental investigation of whether to learn to predict a probability distribution or single samples was performed but turned out to be inconclusive. A network architecture is proposed that incorporates the improvements we found to be useful, and we show in our experiments that this network produces better results than other state-of-the-art methods.
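The citation statement for this paper argues for modelling the f0-curve explicitly, for example with B-splines, so that it can still be tweaked by hand. The following is a minimal sketch of that idea only, not the implementation of any cited system: it evaluates a uniform cubic B-spline over a handful of control points (pitches in MIDI semitones) and shows that editing one control point changes the curve only locally. The function name `cubic_bspline`, the control values and the sampling density are all assumptions made for illustration.

```python
def cubic_bspline(control, samples_per_segment=10):
    """Evaluate a uniform cubic B-spline over scalar control points.

    Each segment is shaped by four consecutive control points, so moving
    one control point only changes the (at most four) segments that use it.
    """
    if len(control) < 4:
        raise ValueError("a cubic B-spline needs at least 4 control points")
    out = []
    for k in range(len(control) - 3):
        p0, p1, p2, p3 = control[k:k + 4]
        for j in range(samples_per_segment):
            t = j / samples_per_segment
            # Uniform cubic B-spline basis functions; they sum to 1 for any t.
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
            b3 = t ** 3 / 6
            out.append(b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3)
    return out


# Hypothetical control points: a glide from C4 (60) to D4 (62).
control = [60.0, 60.0, 60.0, 62.0, 62.0, 62.0, 62.0]
curve = cubic_bspline(control)

# Hand edit: raise only the last control point by half a semitone.
edited = list(control)
edited[6] += 0.5
edited_curve = cubic_bspline(edited)
```

With seven control points there are four segments, and the last control point only enters the fourth one, so the first 30 samples of `edited_curve` are identical to those of `curve`. This locality is what makes such a parametric curve convenient to adjust manually.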

“…The results lead us to believe that both transition-sustain models and the multi-layer F0 model are able to generate F0 expressions resembling the original performance to an extent that makes comparison between the two methods difficult. The advantage of the proposed method lies in being fully data-driven, while the multi-layer F0 model requires hand-tuning and its automation is still under investigation [17]. Figure 5 shows an example of an F0 trajectory generated by the proposed method overlaid on the input score and the original F0.…”

Section: Normalized Difference Grades (mentioning)

confidence: 99%

Modeling Singing F0 With Neural Network Driven Transition-Sustain Models

Hua 2018

Preprint

This study focuses on generating fundamental frequency (F0) curves of singing voice from musical scores stored in a MIDI-like notation. Current statistical parametric approaches to singing F0 modeling have difficulty reproducing vibratos and the temporal details at note boundaries due to the oversmoothing tendency of statistical models. This paper presents a neural-network-based solution that models a pair of neighboring notes at a time (the transition model) and uses a separate network for generating vibratos (the sustain model). Predictions from the two models are combined by summation after proper enveloping to enforce continuity. In the training phase, mild misalignment between the scores and the target F0 is addressed by back-propagating the gradients to the networks' inputs. Subjective listening tests on the NITech singing database show that transition-sustain models are able to generate F0 trajectories close to the original performance.
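The combination step described in this abstract (summing the two predictions after enveloping so that continuity is preserved) can be illustrated with a small sketch. It is not the paper's method: the two neural networks are replaced by hand-written stand-ins, a sigmoid glide for the transition model and a plain sinusoid for the sustain model, and every function name, the frame rate and the vibrato settings are assumptions chosen for illustration. Only the enveloped summation mirrors what the abstract states.

```python
import math


def transition_curve(pitch_a, pitch_b, n_frames, steepness=10.0):
    """Stand-in for the transition model: a sigmoid glide between two
    note pitches (MIDI semitones). Requires n_frames >= 2."""
    curve = []
    for i in range(n_frames):
        x = i / (n_frames - 1) - 0.5  # runs from -0.5 to 0.5
        w = 1.0 / (1.0 + math.exp(-steepness * x))
        curve.append(pitch_a + (pitch_b - pitch_a) * w)
    return curve


def vibrato(n_frames, frame_rate=200.0, rate_hz=5.5, depth=0.5):
    """Stand-in for the sustain model: sinusoidal vibrato in semitones."""
    return [depth * math.sin(2 * math.pi * rate_hz * i / frame_rate)
            for i in range(n_frames)]


def envelope(n_frames, fade):
    """Window with raised-cosine fades that is exactly 0 at both ends,
    so the added component cannot create a jump at segment boundaries."""
    env = []
    for i in range(n_frames):
        d = min(i, n_frames - 1 - i)  # distance to the nearest boundary
        env.append(1.0 if d >= fade else 0.5 - 0.5 * math.cos(math.pi * d / fade))
    return env


def combine(base, sustain, fade=20):
    """Sum the base curve and the enveloped sustain component."""
    env = envelope(len(base), fade)
    return [b + e * s for b, e, s in zip(base, env, sustain)]


# Hypothetical example: glide from C4 (60) to D4 (62) over 100 frames.
base = transition_curve(60.0, 62.0, 100)
f0 = combine(base, vibrato(100))
```

Because the envelope is zero at the first and last frame, `f0` coincides with `base` at the boundaries, while the vibrato reaches its full depth in the interior of the segment.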
