2018
DOI: 10.48550/arxiv.1811.02155
Preprint

FloWaveNet : A Generative Flow for Raw Audio

citations
Cited by 30 publications
(48 citation statements)
references
References 0 publications
“…Although non-linear quantization processes such as µ-law received much attention the last years, the majority of the existing papers use a normalized high resolution signal as input [14]. Finally, other applications include linear quantization of the input waveform [15] [16] and different designs for most and less significant bits [17].…”
Section: A Waveform - Raw Audio (mentioning)
confidence: 99%
“…At last, other variations of conditioning have been introduced as well. Kim et al [14] adjusted conditioning through the loss function. They estimated an auxiliary probability density using mel-spectrograms for local conditioning.…”
Section: Other (mentioning)
confidence: 99%
“…The development of neural end-to-end text-to-speech (TTS) models [1,2,3,4,5,6] has greatly promoted speech synthesis. Generally, with a well-trained neural acoustic model [2,5,6,7] and a neural vocoder [8,9,10,11], or alternatively using fully end-to-end models [12,13,14] which directly construct wave signals from text input, it is able to synthesize high-quality neutral speech. Recently, much attention has been attracted to synthesizing expressive speech, such as stylized speech [15,16], emotional speech [17,18,19,20,21,22], and also singing voice [23,24].…”
Section: Introduction (mentioning)
confidence: 99%