Interspeech 2020
DOI: 10.21437/interspeech.2020-2605

Phase-Aware Music Super-Resolution Using Generative Adversarial Networks


Cited by 17 publications (10 citation statements: 0 supporting, 10 mentioning, 0 contrasting)
References: 0 publications
“…Although only a few studies have applied GAN models to bandwidth extension of music signals [10], [39], many recent works have applied them to speech [13], [14], [40]. Eskimez et al. [40] proposed one of the earliest works using an adversarial approach for speech super-resolution.…”
Section: B. GANs for Audio Bandwidth Extension (mentioning)
confidence: 99%
“…However, the Eskimez model had the limitation that it did not predict the phase information but simply replicated it [40]. Other phase-aware works made an effort to incorporate the phase information into the training framework [10]. Kim et al. [39] instead opted to work directly on raw audio, thus avoiding the aforementioned phase issues.…”
Section: B. GANs for Audio Bandwidth Extension (mentioning)
confidence: 99%
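
To make the distinction concrete, here is a minimal, hypothetical sketch (Python with NumPy/SciPy; not code from any of the cited papers) of the baseline behaviour described above: the missing high band's magnitude is "predicted" by a trivial stand-in for a trained GAN generator, while its phase is merely replicated from the low band rather than predicted. The function name, the cutoff choice, and the 0.1 scaling are illustrative assumptions only.

import numpy as np
from scipy.signal import stft, istft

def naive_bandwidth_extension(lr_audio, fs=16000, nperseg=512):
    """Crude wide-band estimate from band-limited audio (illustrative only)."""
    _, _, Z = stft(lr_audio, fs=fs, nperseg=nperseg)   # complex spectrogram
    mag, phase = np.abs(Z), np.angle(Z)

    cutoff = Z.shape[0] // 2        # assume bins above this are missing
    n_high = Z.shape[0] - cutoff

    # Stand-in "magnitude predictor" for the high band; a real system
    # would use a trained (e.g. GAN-based) network here.
    mag[cutoff:] = 0.1 * mag[1:n_high + 1]

    # Phase replication: copy low-band phase into the high band instead
    # of predicting it -- exactly the limitation the citing paper notes.
    phase[cutoff:] = phase[1:n_high + 1]

    _, wb_audio = istft(mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return wb_audio

A phase-aware approach would instead predict the high-band phase (or a representation of it) jointly with the magnitude, or operate on the raw waveform so that no explicit phase reconstruction is needed.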
“…Learning-based methods perform better in this context because they can capture sophisticated domain-specific information. Convolutional networks [12,13] and generative adversarial networks (GANs) [14,15,16] have been shown to greatly improve the quality of the synthesized high-resolution audio. However, GANs are known to be hard to train and to produce outputs with significant artifacts.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, the LR audio encoder only provides a time-domain perspective, and it is challenging for the LR audio encoder to exploit several characteristics of the signal, including cyclic behavior and long-range dependence, due to its limited receptive field and model capacity. Therefore, we further introduce a frequency-domain encoder, called the STFT encoder, which first applies a short-time Fourier transform (STFT) to the waveform and then encodes the phase and magnitude spectrograms, both of which are widely used in modern speech applications and methods [16,24]. The contributions of this work are summarized as follows:…”
Section: Introduction (mentioning)
confidence: 99%
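
As a rough illustration of the STFT-encoder idea quoted above, the sketch below (PyTorch, chosen here as an assumption; this is not the citing paper's actual architecture) computes an STFT of the waveform and passes the stacked magnitude and phase spectrograms through a small convolutional encoder. The class name, layer sizes, and FFT parameters are made up for the example.

import torch
import torch.nn as nn

class STFTEncoder(nn.Module):
    """Hypothetical frequency-domain encoder over magnitude + phase."""
    def __init__(self, n_fft=512, hop_length=128, channels=32):
        super().__init__()
        self.n_fft, self.hop_length = n_fft, hop_length
        # Two input channels: magnitude and phase spectrograms.
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, waveform):                        # (batch, samples)
        spec = torch.stft(waveform, n_fft=self.n_fft,
                          hop_length=self.hop_length,
                          window=torch.hann_window(self.n_fft),
                          return_complex=True)          # (batch, freq, frames)
        mag, phase = spec.abs(), spec.angle()
        x = torch.stack([mag, phase], dim=1)            # (batch, 2, freq, frames)
        return self.net(x)

# Usage sketch: features = STFTEncoder()(torch.randn(4, 16000))

Feeding magnitude and phase as separate channels gives the model an explicit frequency-domain view that complements a raw-waveform (time-domain) encoder, which is the motivation stated in the quoted passage.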