Bandwidth Extension of Musical Audio Signals With No Side Information Using Dilated Convolutional Neural Networks

Lagrange, Mathieu; Gontier, Félix

doi:10.1109/icassp40776.2020.9054194

Cited by 11 publications

(8 citation statements)

References 13 publications


“…processing [8], [9], [10], whereas most of these studies focus on processing speech [11], [12], [13], [14], [15]. Although music and speech share the same domain of acoustic signals, the two are fundamentally different.…”

Section: Bandwidthextended (mentioning)

confidence: 99%

BEHM-GAN: Bandwidth Extension of Historical Music Using Generative Adversarial Networks

Moliner

Välimäki

2023

IEEE/ACM Trans. Audio Speech Lang. Process.


Audio bandwidth extension aims to expand the spectrum of bandlimited audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes a method for the bandwidth extension of historical music using generative adversarial networks (BEHM-GAN) as a practical solution to this problem. The proposed method works with the complex spectrogram representation of audio and, thanks to a dedicated regularization strategy, can effectively extend the bandwidth of out-of-distribution real historical recordings. The BEHM-GAN is designed to be applied as a second step after denoising the recording to suppress any additive disturbances, such as clicks and background noise. We train and evaluate the method using solo piano classical music. The proposed method outperforms the compared baselines in both objective and subjective experiments. The results of a formal blind listening test show that BEHM-GAN significantly increases the perceptual sound quality in early-20th-century gramophone recordings. For several items, there is a substantial improvement in the mean opinion score after enhancing historical recordings with the proposed bandwidth extension algorithm. This study represents a relevant step toward data-driven music restoration in real-world scenarios.



“…BWE is alternatively referred to as audio re-sampling or sample-rate conversion in the field of Digital Signal Processing (DSP), or as audio super-resolution in the Machine Learning (ML) literature. Methods for BWE have been extensively studied in areas like audio streaming and restoration, mainly for legacy speech telephony communication systems [13,16,17,27] or, less commonly, for degraded musical material [19,20].…”

Section: Bandwidth Extension (mentioning)

confidence: 99%

“…For example, Convolutional Neural Networks (CNNs) [12], WaveNet-like architectures [8,13], and UNets [14,15]. However, most of the works in this line of research tackle the enhancement of speech signals [7–10,12–18], and only a few publications exist for musical audio restoration [11,19–21]. This focus on speech is understandable, given the wide range of speech enhancement techniques in telephony, automatic speech recognition, and hearing aids.…”

Section: Introduction (mentioning)

confidence: 99%

Stochastic Restoration of Heavily Compressed Musical Audio Using Generative Adversarial Networks

Lattner

Nistal

2021

Electronics


Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.


“…Bandwidth Extension (BE) is the task of reconstructing a high-bandwidth signal from its low-bandwidth version, and is usually demonstrated on speech [29,6,26,19,57] and music [30,54]. To perform BE using CAW, we first train it on a high-bandwidth short audio example of a specific speaker.…”

Section: Bandwidth Extension (mentioning)

confidence: 99%

Catch-A-Waveform: Learning to Generate Audio from a Single Short Example

Greshler

Shaham

Michaeli

2021

Preprint


Models for audio generation are typically trained on hours of recordings. Here, we illustrate that capturing the essence of an audio source is typically possible from as little as a few tens of seconds from a single training signal. Specifically, we present a GAN-based generative model that can be trained on one short audio signal from any domain (e.g. speech, music, etc.) and does not require pre-training or any other form of external supervision. Once trained, our model can generate random samples of arbitrary duration that maintain semantic similarity to the training waveform, yet exhibit new compositions of its audio primitives. This enables a long line of interesting applications, including generating new jazz improvisations or new a-cappella rap variants based on a single short example, producing coherent modifications to famous songs (e.g. adding a new verse to a Beatles song based solely on the original recording), filling-in of missing parts (inpainting), extending the bandwidth of a speech signal (super-resolution), and enhancing old recordings without access to any clean training example. We show that in all cases, no more than 20 seconds of training audio commonly suffice for our model to achieve state-of-the-art results. This is despite its complete lack of prior knowledge about the nature of audio signals in general.
