Multi-channel U-Net for Music Source Separation

Kadandale, Venkatesh S.; Montesinos, Juan F.; Haro, Gloria

doi:10.48550/arxiv.2003.10414

Cited by 1 publication

(1 citation statement)

References 8 publications

(19 reference statements)

“…However, due to the high-computational cost, many of the current VAS studies perform downsampling in advance. For instance, the approach using M-U-Net [3] downsamples the audio to 10.88kHz before processing and Dense-Unet only works on 16kHz music in [4]. The downsampling process seriously affects the auditory quality to the separated vocal and accompaniment in practical applications.…”

Section: Introduction (mentioning)

confidence: 99%

Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

Liu, Xie, et al. 2020

Interspeech 2020

This paper presents a new input format, channel-wise subband input (CWS), for convolutional neural networks (CNN) based music source separation (MSS) models in the frequency domain. We aim to address the major issues in CNN-based high-resolution MSS model: high computational cost and weight sharing between distinctly different bands. Specifically, in this paper, we decompose the input mixture spectra into several bands and concatenate them channel-wise as the model input. The proposed approach enables effective weight sharing in each subband and introduces more flexibility between channels. For comparison purposes, we perform voice and accompaniment separation (VAS) on models with different scales, architectures, and CWS settings. Experiments show that the CWS input is beneficial in many aspects. We evaluate our method on musdb18hq test set, focusing on SDR, SIR and SAR metrics. Among all our experiments, CWS enables models to obtain 6.9% performance gain on the average metrics. With even a smaller number of parameters, less training data, and shorter training time, our MDenseNet with 8-bands CWS input still surpasses the original MMDenseNet with a large margin. Moreover, CWS also reduces computational cost and training time to a large extent.
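
The band decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the simplest reading, in which a magnitude spectrogram is cut into equal, contiguous frequency bands that are then stacked along the channel axis. The function names `channel_wise_subband` and `merge_subbands`, the `(channels, freq_bins, frames)` layout, and the requirement that the number of frequency bins be divisible by the number of bands are all assumptions made for illustration.

```python
import numpy as np


def channel_wise_subband(spec, num_bands):
    """Split a spectrogram into contiguous frequency bands and stack them as channels.

    spec: array of shape (channels, freq_bins, frames)  (assumed layout)
    returns: array of shape (channels * num_bands, freq_bins // num_bands, frames)
    """
    channels, freq_bins, frames = spec.shape
    if freq_bins % num_bands != 0:
        # Assumption: the caller trims or pads the frequency axis beforehand.
        raise ValueError("freq_bins must be divisible by num_bands")
    band_size = freq_bins // num_bands
    # Frequency bin f maps to (band b, offset k) with f = b * band_size + k.
    bands = spec.reshape(channels, num_bands, band_size, frames)
    # Output channel index is c * num_bands + b.
    return bands.reshape(channels * num_bands, band_size, frames)


def merge_subbands(cws, num_bands):
    """Inverse of channel_wise_subband: restore the full frequency axis."""
    total_channels, band_size, frames = cws.shape
    channels = total_channels // num_bands
    return cws.reshape(channels, num_bands * band_size, frames)


# Hypothetical example: a stereo spectrogram with 1024 bins and 100 frames.
spec = np.random.rand(2, 1024, 100)
cws = channel_wise_subband(spec, num_bands=8)
print(cws.shape)  # (16, 128, 100)
```

Under this reading, a convolution kernel sees each band as a separate input channel, so its weights are no longer forced to be identical across distant frequency regions, and the frequency axis the network processes is `num_bands` times shorter.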
