2020
DOI: 10.48550/arxiv.2008.02027
Preprint

Learning to Denoise Historical Music

Abstract: We propose an audio-to-audio neural network model that learns to denoise old music recordings. Our model internally converts its input into a time-frequency representation by means of a short-time Fourier transform (STFT), and processes the resulting complex spectrogram using a convolutional neural network. The network is trained with both reconstruction and adversarial objectives on a synthetic noisy music dataset, which is created by mixing clean music with real noise samples extracted from quiet segments of…
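The pipeline the abstract describes (audio in, complex STFT, network, audio out) can be sketched with SciPy as below; the sample rate, frame length, and the identity placeholder standing in for the trained CNN are illustrative assumptions, not details from the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def denoise(audio, fs=44100, nperseg=1024):
    """Sketch of the STFT front-end described in the abstract.

    fs and nperseg are assumed values; the trained CNN is replaced
    here by an identity map on the complex spectrogram.
    """
    # Forward STFT: complex spectrogram of shape (freq_bins, frames)
    _, _, Z = stft(audio, fs=fs, nperseg=nperseg)

    # A trained network would process Z here (e.g. predict a cleaner
    # spectrogram); we pass it through unchanged as a placeholder.
    Z_denoised = Z

    # Inverse STFT back to the time domain
    _, restored = istft(Z_denoised, fs=fs, nperseg=nperseg)
    return restored

audio = np.random.randn(44100)        # 1 s stand-in signal
out = denoise(audio)
# With the identity "network", the STFT round trip reconstructs the input.
print(np.allclose(out[: len(audio)], audio, atol=1e-8))
```

With SciPy's default Hann window and 50% overlap the constant-overlap-add condition holds, so `istft(stft(x))` recovers the signal up to numerical precision, which is why the identity placeholder is a faithful baseline for the front-end.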

Cited by 2 publications (2 citation statements)
References 25 publications
“…speech [163]. The music inpainting model of [164] uses a U-Net, another deep neural network, as the generator of a Generative Adversarial Network (GAN), and splits the complex STFT spectrogram into two channels (real and imaginary) as input to the model, which successfully inpaints musical recordings with pauses as long as 100 ms. The inpainting model of [165] uses a deep complex U-Net with complex-valued STFT spectrograms as input to estimate a complex-valued Ratio Mask (cRM), restoring gaps caused by hiss, clicks, thumps, and other common additive disturbances on old analog gramophone discs.…”
Section: B Musicmentioning
confidence: 99%
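The two-channel input format and the cRM application mentioned in this statement can be illustrated with a minimal NumPy sketch; the array sizes and the stand-in mask values are assumptions, since the statement does not specify them.

```python
import numpy as np

def to_two_channel(Z):
    """Stack a complex spectrogram as two real channels (real, imaginary),
    the input format described for the U-Net models above."""
    return np.stack([Z.real, Z.imag], axis=0)  # shape (2, freq, time)

def apply_crm(Z, mask_re, mask_im):
    """Apply a complex-valued Ratio Mask (cRM): element-wise complex
    multiplication of the mask with the noisy spectrogram. mask_re and
    mask_im would normally be network outputs; here they are arbitrary
    arrays standing in for them."""
    M = mask_re + 1j * mask_im
    return M * Z

# Toy complex spectrogram (assumed sizes, not from the cited papers)
Z = np.random.randn(257, 100) + 1j * np.random.randn(257, 100)
channels = to_two_channel(Z)
print(channels.shape)  # (2, 257, 100)

# An all-ones real mask leaves the spectrogram unchanged
enhanced = apply_crm(Z, np.ones_like(Z.real), np.zeros_like(Z.real))
print(np.allclose(enhanced, Z))  # True
```

The complex multiplication is what distinguishes a cRM from a magnitude mask: it can rotate the phase of each bin, not just scale its magnitude.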
“…We collect a set of 1000 room reverberation IRs by running a room simulator based on the image-source method on rooms of diverse geometry. Finally, we extract 10000 noise segments from the Speech Commands dataset [19], by applying the same extraction method as in [10], i.e., by searching for short low-energy segments 100 ms in length and replicating them (randomizing the phase) with overlap-and-add to obtain samples of 1 second.…”
Section: Speech Enhancement Networkmentioning
confidence: 99%
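The extraction procedure this statement describes (find a short low-energy segment, then replicate it with randomized phase and overlap-and-add) can be sketched as follows; the 16 kHz sample rate, Hann window, and 50% hop are assumptions beyond what the statement specifies.

```python
import numpy as np

def lowest_energy_segment(x, seg_len):
    """Return the contiguous segment of length seg_len with minimum
    energy, via a sliding sum of squares."""
    energy = np.convolve(x ** 2, np.ones(seg_len), mode="valid")
    start = int(np.argmin(energy))
    return x[start : start + seg_len]

def randomize_phase(seg):
    """Keep the magnitude spectrum, draw a fresh random phase."""
    mag = np.abs(np.fft.rfft(seg))
    phase = np.random.uniform(-np.pi, np.pi, size=mag.shape)
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(seg))

def extend_noise(seg, target_len):
    """Replicate a short noise segment with 50% overlap-and-add,
    randomizing the phase of each copy, until target_len is reached."""
    n = len(seg)
    window = np.hanning(n)
    out = np.zeros(target_len + n)
    pos = 0
    while pos < target_len:
        out[pos : pos + n] += window * randomize_phase(seg)
        pos += n // 2
    return out[:target_len]

fs = 16000                    # Speech Commands is sampled at 16 kHz
x = np.random.randn(fs)       # stand-in for a recording
seg = lowest_energy_segment(x, int(0.1 * fs))  # 100 ms segment
noise = extend_noise(seg, fs)                  # 1 s noise sample
print(len(seg), len(noise))   # 1600 16000
```

Randomizing the phase of each copy decorrelates the repetitions, so the 1-second result sounds like stationary noise rather than a looped 100 ms clip.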