A Neural Parametric Singing Synthesizer

Blaauw, Merlijn; Bonada, Jordi

doi:10.21437/interspeech.2017-1420

Cited by 38 publications

(44 citation statements)

References 10 publications


“…This model's ability to accurately generate raw speech waveform sample-by-sample, clearly shows that oversmoothing is not an issue. Recently, we presented a model for singing synthesis based on the WaveNet model [6], with an important difference being that we model vocoder features rather than raw waveform. While a vocoder unavoidably introduces some degradation in sound quality, we consider the degradation introduced by current models to still be the dominant factor.…”

Section: Introduction (mentioning)

confidence: 99%

A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs

2017

Self Cite


Abstract: We recently presented a new model for singing synthesis based on a modified version of the WaveNet architecture. Instead of modeling raw waveform, we model features produced by a parametric vocoder that separates the influence of pitch and timbre. This allows conveniently modifying pitch to match any target melody, facilitates training on more modest dataset sizes, and significantly reduces training and generation times. Nonetheless, compared to modeling waveform directly, ways of effectively handling higher-dimensional outputs, multiple feature streams and regularization become more important with our approach. In this work, we extend our proposed system to include additional components for predicting F0 and phonetic timings from a musical score with lyrics. These expression-related features are learned together with timbrical features from a single set of natural songs. We compare our method to existing statistical parametric, concatenative, and neural network-based approaches using quantitative metrics as well as listening tests.


“…In recent years, several kinds of DNN-based singing voice synthesis systems [4,17,18,19,20] have been proposed. In the training part of the basic system [4], parameters for spectrum (e.g., mel-cepstral coefficients), excitation, and vibrato are extracted from a singing voice database as acoustic features.…”

Section: DNN-based Singing Voice Synthesis (mentioning)

confidence: 99%

Fast and High-Quality Singing Voice Synthesis System Based on Convolutional Neural Networks

Nakamura, Takaki, Hashimoto et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. As singing voices represent a rich form of expression, a powerful technique to model them accurately is required. In the proposed technique, long-term dependencies of singing voices are modeled by CNNs. An acoustic feature sequence is generated for each segment that consists of long-term frames, and a natural trajectory is obtained without the parameter generation algorithm. Furthermore, a computational complexity reduction technique, which drives the DNNs in different time units depending on type of musical score features, is proposed. Experimental results show that the proposed method can synthesize natural sounding singing voices much faster than the conventional method.


“…This allows us to model temporal dependencies between features within that block. This temporal dependence is modelled via autoregression in the Neural Parametric Singing Synthesizer (NPSS) [2] model, which we use as a baseline in our study.…”

Section: Related Work (mentioning)

confidence: 99%

“…This is ideal for the singing voice as the pitch range of the voice while singing is much higher than that while speaking normally. Modelling the timbre independently of the pitch has been shown to be an effective methodology [2]. We note that the use of a vocoder for direct synthesis can lead to a degradation of sound quality, but this degradation can be mitigated by the use of a WaveNet vocoder trained to synthesize the waveform from the parametric vocoder features.…”

Section: Introduction (mentioning)

confidence: 99%

WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

Chandna, Blaauw, Bonada et al. 2019

2019 27th European Signal Processing Conference (EUSIPCO)

Self Cite


We present a deep neural network based singing voice synthesizer, inspired by the Deep Convolutions Generative Adversarial Networks (DCGAN) architecture and optimized using the Wasserstein-GAN algorithm. We use vocoder parameters for acoustic modelling, to separate the influence of pitch and timbre. This facilitates the modelling of the large variability of pitch in the singing voice. Our network takes a block of consecutive frame-wise linguistic and fundamental frequency features, along with global singer identity as input and outputs vocoder features, corresponding to the block of features. This block-wise approach, along with the training methodology allows us to model temporal dependencies within the features of the input block. For inference, sequential blocks are concatenated using an overlap-add procedure. We show that the performance of our model is competitive with regards to the state-of-the-art and the original sample using objective metrics and a subjective listening test. We also present examples of the synthesis on a supplementary website and the source code via GitHub.
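The overlap-add step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `overlap_add`, the triangular window, and the `hop` parameter are assumptions chosen for illustration. Blocks of frame-wise vocoder features are cross-faded where they overlap and then normalized by the summed window weights.

```python
import numpy as np

def overlap_add(blocks, hop):
    """Join equally sized feature blocks with a cross-fade (illustrative only).

    blocks: array of shape (n_blocks, block_len, n_feats)
    hop:    frames between the starts of consecutive blocks (hop <= block_len)
    """
    blocks = np.asarray(blocks, dtype=float)
    n_blocks, block_len, n_feats = blocks.shape
    total = hop * (n_blocks - 1) + block_len
    out = np.zeros((total, n_feats))
    weight = np.zeros((total, 1))
    # Triangular window (an assumption); dropping the zero endpoints keeps
    # every weight strictly positive so the normalization below is safe.
    window = np.bartlett(block_len + 2)[1:-1, None]
    for i, block in enumerate(blocks):
        start = i * hop
        out[start:start + block_len] += window * block
        weight[start:start + block_len] += window
    # Divide by the accumulated window so overlapping regions are averaged.
    return out / weight

# Usage: three blocks of 8 frames with 4 features each, 50% overlap.
features = overlap_add(np.ones((3, 8, 4)), hop=4)
```

Because the output is divided by the accumulated window, blocks that agree in their overlapping frames are reproduced unchanged, and disagreeing blocks are blended smoothly.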

