2021
DOI: 10.48550/arxiv.2105.13871
Preprint

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

Abstract: Singing voice conversion (SVC) is a promising technique that can enrich human-computer interaction by endowing a computer with the ability to produce high-fidelity, expressive singing voice. In this paper, we propose DiffSVC, an SVC system based on a denoising diffusion probabilistic model. DiffSVC uses phonetic posteriorgrams (PPGs) as content features. A denoising module is trained in DiffSVC, which takes the destroyed mel spectrogram produced by the diffusion/forward process and its corresponding step…
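The "destroyed mel spectrogram produced by the diffusion/forward process" can be illustrated with a minimal sketch. It assumes the standard DDPM formulation, in which a clean sample x_0 is noised in closed form as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, and a denoiser is trained to predict ε from (x_t, t). The linear β schedule, the step count `T`, the 80×200 spectrogram shape, and all function names are hypothetical choices for illustration, not DiffSVC's reported settings; the zero "prediction" is only a placeholder for a trained network.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Linearly spaced noise variances beta_1..beta_T (an assumed, common DDPM choice)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
                    rng: np.random.Generator):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I) in one shot.

    Returns the noisy ("destroyed") input and the Gaussian noise that was added,
    which is the regression target of an epsilon-predicting denoiser.
    """
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return xt, eps


def simplified_loss(eps_pred: np.ndarray, eps: np.ndarray) -> float:
    """Mean-squared error between predicted and true noise (simplified DDPM objective)."""
    return float(np.mean((eps_pred - eps) ** 2))


T = 100  # hypothetical number of diffusion steps
betas = linear_beta_schedule(T)
alpha_bars = np.cumprod(1.0 - betas)  # a_bar_t = prod_{s<=t} (1 - beta_s)

rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 200))  # hypothetical stand-in: 80 mel bins x 200 frames
t = int(rng.integers(0, T))           # random step index; the denoiser also receives it
x_t, eps = forward_diffuse(mel, t, alpha_bars, rng)

# A trained denoiser would map (x_t, t, plus conditioning such as PPG content
# features, assumed here) to an estimate of eps; zeros are only a placeholder.
loss = simplified_loss(np.zeros_like(eps), eps)
```

Sampling x_t directly from x_0 at a random step, rather than applying t noising steps one after another, is what makes this training loop cheap; with the placeholder predictor the loss is close to 1, the variance of the standard Gaussian noise.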

Cited by 3 publications (2 citation statements). References 19 publications.
“…The model is trained by optimizing the evidence lower bound (ELBO) during the diffusion process. Recently, the diffusion probabilistic models have been shown to provide outstanding performance in generative modeling for natural images [30], [31], and raw audio waveforms [32], [33]. As reported in [32], the DiffWave model, formed by the diffusion probabilistic model, can yield state-of-the-art performance on either conditional or unconditional waveform generation tasks with a small number of parameters.…”
Section: Introduction (mentioning), confidence: 99%
“…Diffusion (score-based) models, which originated from non-equilibrium statistical physics, have recently shown impressive successes in sample generation across a wide range of data types, including images [10,21,38,50], 3D point clouds [13,35] and audio generation [31,34], among others. In addition to concrete applications of various diffusion generative models, it is also desirable to analyze them in an appropriate and flexible framework, by which novel improvements can be further developed.…”
Section: Introduction (mentioning), confidence: 99%