Music Demixing Challenge 2021

Mitsufuji, Yuki; Fabbro, Giorgio; Uhlich, Stefan; Stöter, Fabian-Robert; Défossez, Alexandre; Kim, Minseok; Choi, Woosung; Yu, Chin-Yun; Cheuk, Kin-Wai

doi:10.3389/frsip.2021.808395

Cited by 37 publications

(18 citation statements)

References 20 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Finally, a recent trend has been to use both temporal and spectral domains, either through model blending, like KUIELAB-MDX-Net [17], or using a bi-U-Net structure with a shared backbone as Hybrid Demucs [2]. Hybrid Demucs was the first ranked architecture at the latest MDX MSS Competition [18], although it is now surpassed by Band-Split RNN. Using large datasets has been shown to be beneficial to the task of MSS.…”

Section: Related Workmentioning

confidence: 99%

Hybrid Transformers for Music Source Separation

Simon¹,

Massa²,

Défossez³

2022

Preprint

View full text Add to dashboard Cite

A natural question arising in Music Source Separation (MSS) is whether long range contextual information is useful, or whether local acoustic features are sufficient. In other fields, attention based Transformers [1] have shown their ability to integrate information over long sequences. In this work, we introduce Hybrid Transformer Demucs (HT Demucs), an hybrid temporal/spectral bi-U-Net based on Hybrid Demucs [2], where the innermost layers are replaced by a cross-domain Transformer Encoder, using self-attention within one domain, and cross-attention across domains. While it performs poorly when trained only on MUSDB [3], we show that it outperforms Hybrid Demucs (trained on the same data) by 0.45 dB of SDR when using 800 extra training songs. Using sparse attention kernels to extend its receptive field, and per source fine-tuning, we achieve state-of-the-art results on MUSDB with extra training data, with 9.20 dB of SDR.

show abstract

Section: Related Workmentioning

confidence: 99%

Hybrid Transformers for Music Source Separation

Simon¹,

Massa²,

Défossez³

2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Despite the model not reaching the performance of today's source separation architectures [6], we included it in the evaluation due to its weak architectural priors.…”

Section: Wave-u-netmentioning

confidence: 99%

“…unprocessed) source tracks are required. Given the recent advances in automatic mixing [4,5] and music source separation [6], a system could be developed that facilitates the adjustment of a stereo mixture to the user's taste and preferences similar to [7]. However, today's source separation systems are commonly trained on data that is based on music stems (e.g., MUSDB18 [8]).…”

Section: Introductionmentioning

confidence: 99%

Removing Distortion Effects in Music Using Deep Neural Networks

Imort¹,

Fabbro²,

Ramírez³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

Audio effects are an essential element in the context of music production, and therefore, modeling analog audio effects has been extensively researched for decades using systemidentification methods, circuit simulation, and recently, deep learning. However, only few works tackled the reconstruction of signals that were processed using an audio effect unit. Given the recent advances in music source separation and automatic mixing, the removal of audio effects could facilitate an automatic remixing system. This paper focuses on removing distortion and clipping applied to guitar tracks for music production while presenting a comparative investigation of different deep neural network (DNN) architectures on this task. We achieve exceptionally good results in distortion removal using DNNs for effects that superimpose the clean signal to the distorted signal, while the task is more challenging if the clean signal is not superimposed. Nevertheless, in the latter case, the neural models under evaluation surpass one state-of-the-art declipping system in terms of source-to-distortion ratio, leading to better quality and faster inference.

show abstract

“…The W and S indicates the waveform domain and spectrogram domain respectively. In the bottom half of Table 2, we also listed the results of the top-ranked hybrid domain systems in the Music Demixing (MDX) challenge at ISMIR 2021 [28], namely, KUIELab-MDX-Net [29] and Hybrid Demucs [30]. W + S indicates the method is working on hybrid domain.…”

Section: Comparison With the Existing Systemsmentioning

confidence: 99%

Multi-Scale Temporal-Frequency Attention for Music Source Separation

Chen

Zheng

Zhang

et al. 2022

2022 IEEE International Conference on Multimedia and Expo (ICME)

View full text Add to dashboard Cite

In recent years, deep neural networks (DNNs) based approaches have achieved the start-of-the-art performance for music source separation (MSS). Although previous methods have addressed the large receptive field modeling using various methods, the temporal and frequency correlations of the music spectrogram with repeated patterns have not been explicitly explored for the MSS task. In this paper, a temporalfrequency attention module is proposed to model the spectrogram correlations along both temporal and frequency dimensions. Moreover, a multi-scale attention is proposed to effectively capture the correlations for music signal. The experimental results on MUSDB18 dataset show that the proposed method outperforms the existing state-of-the-art systems with 9.51 dB signal-to-distortion ratio (SDR) on separating the vocal stems, which is the primary practical application of MSS.

show abstract

Music Demixing Challenge 2021

Cited by 37 publications

References 20 publications

Hybrid Transformers for Music Source Separation

Hybrid Transformers for Music Source Separation

Removing Distortion Effects in Music Using Deep Neural Networks

Multi-Scale Temporal-Frequency Attention for Music Source Separation

Contact Info

Product

Resources

About