ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746153

ItôWave: Itô Stochastic Differential Equation is all You Need for Wave Generation

Cited by 6 publications
(4 citation statements)
References 6 publications
“…Pioneering work: WaveGrad [7] (Code / Project), DiffWave [45] (Code). Efficient vocoder: BDDM [48] (Code), InferGrad [9], WaveFit [43] (Project). Statistical improvement: DDGM [70], PriorGrad [50] (Project), ItôWave [125] (Project), SpecGrad [44]. End-to-end, pioneering work: WaveGrad 2 [8] (Code / Project), CRASH [90] (Project). Efficient model: FastDiff [26] (Code / Project). Further improvements: DAG [79], Itôn [99] (Project).
…statistical parametric speech synthesis (SPSS) was a popular method [115,116,132,133,137] consisting of three stages. As shown in Figure 1 (a), the text input is first converted to linguistic features, then acoustic features, and to the waveform in the last stage.…”
Section: Overview Of the Text-to-speech Development (mentioning)
confidence: 99%
“…Other improvements. ItôWave [125] is the first to propose a vocoder based on linear Itô SDE. Based on Melspectrogram, ItôWave [125] achieves higher MOS with 95% confidence than WaveGrad [7] and DiffWave [45].…”
Section: Improvement From Statistical Perspective (mentioning)
confidence: 99%