2020
DOI: 10.48550/arxiv.2010.14356
Preprint

Upsampling artifacts in neural audio synthesis

Abstract: A number of recent advances in audio synthesis rely on neural upsamplers, which can introduce undesired artifacts. In computer vision, upsampling artifacts have been studied and are known as checkerboard artifacts (due to their characteristic visual pattern). However, their effect has been overlooked so far in audio processing. Here, we address this gap by studying this problem from the audio signal processing perspective. We first show that the main sources of upsampling artifacts are: (i) the tonal and filte…

Cited by 2 publications (5 citation statements)
References 11 publications
“…A transposed convolution operation forms the same connectivity as a direct convolution but in the backward direction, which requires upsampling the input into an output of larger dimensions. Transposed convolutions are commonly used in CNN training and in emerging CNN workloads [40,41,46,47,49,64,65,67–72]. Figure 1…”
Section: Transposed Convolution | Type: mentioning
confidence: 99%
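
The statement above says that a transposed convolution upsamples its input into a larger output. A minimal 1D sketch of that idea follows, in plain Python. It is an illustration only, not code from the citing paper: the function names, the example signal, and the kernel weights are all assumptions. It shows two equivalent views: scattering a scaled copy of the kernel for each input sample, and inserting zeros between samples before a full convolution.

```python
def transposed_conv1d(x, kernel, stride):
    """Scatter view: each input sample adds a scaled kernel copy to the output."""
    out_len = (len(x) - 1) * stride + len(kernel)
    y = [0.0] * out_len
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            y[i * stride + k] += xi * wk
    return y


def zero_stuff_then_conv(x, kernel, stride):
    """Equivalent view: insert (stride - 1) zeros between samples, then convolve."""
    stuffed = [0.0] * ((len(x) - 1) * stride + 1)
    for i, xi in enumerate(x):
        stuffed[i * stride] = xi
    y = [0.0] * (len(stuffed) + len(kernel) - 1)
    for n, s in enumerate(stuffed):
        for k, wk in enumerate(kernel):
            y[n + k] += s * wk
    return y


# Hypothetical example values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
kernel = [1.0, 0.5, 0.25]
y = transposed_conv1d(x, kernel, stride=2)
y_alt = zero_stuff_then_conv(x, kernel, stride=2)
```

With a stride of 2 and a kernel of length 3, neighbouring kernel copies overlap on every second output sample only. That uneven overlap is commonly given as the origin of the periodic pattern described as checkerboard artifacts.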
“…Dilated convolutions are commonly used in CNN training and in emerging CNN workloads [40,41,46,47,49,64,65,73–77]. Figure 1 3 shows a dilated convolution example that calculates the filter gradients (δW_xy) with dilation rate = 2 (i.e., stride 2) in the backward propagation pass of CNN training.…”
Section: Dilated Convolution | Type: mentioning
confidence: 99%
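
The link between a forward stride of 2 and a dilation rate of 2 in the filter-gradient computation can be checked with a small 1D sketch. This is an assumed, simplified setting (single channel, no padding, cross-correlation convention), not the citing paper's implementation. All names and values are hypothetical.

```python
def strided_conv1d(x, w, stride):
    """Forward pass: y[n] = sum_k w[k] * x[n * stride + k] (no padding)."""
    n_out = (len(x) - len(w)) // stride + 1
    return [sum(w[k] * x[n * stride + k] for k in range(len(w)))
            for n in range(n_out)]


def filter_grad_dilated(x, dy, kernel_size, dilation):
    """Backward pass for the weights: dW[k] = sum_n dy[n] * x[k + n * dilation].

    The output gradient dy slides over x like a filter whose taps are spaced
    `dilation` samples apart, where dilation equals the forward stride.
    """
    return [sum(dy[n] * x[k + n * dilation] for n in range(len(dy)))
            for k in range(kernel_size)]


# Hypothetical example values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
dy = [1.0, 0.0, -1.0]          # gradient arriving at the 3 output samples
dW = filter_grad_dilated(x, dy, kernel_size=3, dilation=2)

# Cross-check: the loss sum(dy * y) is linear in w, so dW[k] equals the loss
# obtained with a one-hot filter at position k.
dW_check = []
for k in range(3):
    one_hot = [1.0 if j == k else 0.0 for j in range(3)]
    y_k = strided_conv1d(x, one_hot, stride=2)
    dW_check.append(sum(g * v for g, v in zip(dy, y_k)))
```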
“…Mel spectrogram upsampling is done by alternating nearest-neighbor upsampling and 1D convolution layer with the kernel size of 3. We use nearest-neighbor upsampling over transposed convolution as [24,25] report various advantages (e.g. less distortion in the frequency domain, less checkerboard artifacts, and better preservation of information from low resolution).…”
Section: Proposed Architecture | Type: mentioning
confidence: 99%
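
One stage of the upsampler described in this statement, nearest-neighbor upsampling followed by a 1D convolution with kernel size 3, might look like the sketch below. It is a single-channel illustration in plain Python. The upsampling factor, the zero padding, and the kernel weights are assumptions, since the statement does not give them.

```python
def nearest_neighbor_upsample(x, factor):
    """Repeat every sample `factor` times along the time axis."""
    return [v for v in x for _ in range(factor)]


def conv1d_same(x, kernel):
    """Zero-padded 1D convolution that keeps the input length (odd kernel size)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[n + k] * w for k, w in enumerate(kernel))
            for n in range(len(x))]


def upsample_block(x, kernel, factor=2):
    """One stage: nearest-neighbor upsampling, then a kernel-size-3 convolution."""
    return conv1d_same(nearest_neighbor_upsample(x, factor), kernel)


# Hypothetical input: a constant sequence of four frames.
frames = [1.0, 1.0, 1.0, 1.0]
smoothing = [0.25, 0.5, 0.25]   # placeholder weights; real ones would be learned
out = upsample_block(frames, smoothing)
```

For a constant input, every interior output sample receives the same kernel sum, so the output stays flat away from the zero-padded edges. Several such blocks can be chained to reach larger upsampling factors, which matches the alternating structure the statement describes.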