2020
DOI: 10.48550/arxiv.2004.11012
Preprint

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

Cited by 13 publications (16 citation statements)
References 10 publications
“…M k can be calculated in closed form time [7]. 5 Audio samples are available via https://diffsinger.github.io.…”
Section: Diffusion Model (mentioning)
confidence: 99%
“…Finally, since the pipeline of SVS resembles that of text-to-speech (TTS) task, we make adjustments to DiffSinger for generalization. The contributions of this work can be summarized as follows 5 In this section, we introduce the theory of diffusion probabilistic model [7,31]. The full proof can be found in previous works [7,13,32].…”
Section: Introduction (mentioning)
confidence: 99%
“…ML-GAN in HiFiSinger helps supervise waveform reconstruction and achieves good results in single speaker singing voice synthesis, but its F0 embedding reduces model generalizability in multi-speaker singing data. ByteSing (Gu et al 2020) is a Chinese SVS system based on duration allocated Tacotron-like acoustic model and WaveRNN vocoder. The authors report that those systems can generate natural singing voices.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most previous works focus on optimizing the acoustic model, but usually use speech vocoders for SVS (Gu et al 2020; Chen et al 2020). Some speech vocoders have been widely applied to SVS, such as WaveRNN in ByteSing (Gu et al 2020) and Parallel WaveGAN in HiFiSinger (Chen et al 2020). However, as an important component in SVS, the vocoder directly impacts the upper bound of generated audio quality.…”
Section: Introduction (mentioning)
confidence: 99%
“…Singing voice synthesis (SVS) aims to synthesize high-quality and expressive singing voices based on musical score information. Singing voice synthesis (SVS) systems [2,14,22] take music score and lyric information as input to generate singing voices, and these systems have been widely deployed in music softwares, music boxes, and so on. SVS systems could generate singing voices with comparable quality to reference songs, which attract widespread research interest.…”
Section: Introduction (mentioning)
confidence: 99%