2023
DOI: 10.1016/j.dsp.2022.103781

ITÔN: End-to-end audio generation with Itô stochastic differential equations

Cited by 1 publication (4 citation statements)
References 3 publications
“…[Table text] Pioneering work: WaveGrad [7] (Code / Project), DiffWave [45] (Code). Efficient vocoder: BDDM [48] (Code), InferGrad [9], WaveFit [43] (Project). Statistical improvement: DDGM [70], PriorGrad [50] (Project), ItôWave [125] (Project), SpecGrad [44]. End-to-end. Pioneering work: WaveGrad 2 [8] (Code / Project), CRASH [90] (Project). Efficient model: FastDiff [26] (Code / Project). Further improvements: DAG [79], Itôn [99] (Project). [Body text] statistical parametric speech synthesis (SPSS) was a popular method [115,116,132,133,137] consisting of three stages. As shown in Figure 1 (a), the text input is first converted to linguistic features, then acoustic features, and to the waveform in the last stage.…”
Section: Overview Of the Text-to-speech Development
Citation type: mentioning
confidence: 99%
“…Model based on Itô SDE. Inspired by ItôWave [125], Itôn [99] proposes an end-to-end model for speech synthesis based on Itô SDE. Apart from the encoder-decoder architecture, Itôn [99] introduces a dual-denoiser structure for the generation of mel-spectrogram and waveform, respectively.…”
Section: End-to-end Framework
Citation type: mentioning
confidence: 99%