MUSDB18 - a corpus for music separation

Rafii, Zafar; Stöter, Fabian-Robert; Mimilakis, Stylianos Ioannis; Bittner, Rachel M.

doi:10.5281/zenodo.1117372

Cited by 59 publications

(51 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…These categories are: acappella, background music, beatboxing, choir, drum, lullaby, rapping, theremin, whistling and yodelling. In addition, we use MUSDB18 dataset [39] to have pop and rock examples as accompaniment. In order to generate an artificial mixture we ensure that all the samples from Acappella are used in each epoch.…”

Section: Methodsmentioning

confidence: 99%

A cappella: Audio-visual Singing Voice Separation

Montesinos¹,

Kadandale²,

Haro³

2021

Preprint

View full text Add to dashboard Cite

Music source separation can be interpreted as the estimation of the constituent music sources that a music clip is composed of. In this work, we explore the single-channel singing voice separation problem from a multimodal perspective, by jointly learning from audio and visual modalities. To do so, we present Acappella, a dataset spanning around 46 hours of a cappella solo singing videos sourced from YouTube. We propose Y-Net, an audio-visual convolutional neural network which achieves state-of-the-art singing voice separation results on the Acappella dataset and compare it against its audio-only counterpart, U-Net, and a state-of-the-art audio-visual speech separation model. Singing voice separation can be particularly challenging when the audio mixture also comprises of other accompaniment voices and background sounds along with the target voice of interest. We demonstrate that our model can outperform the baseline models in the singing voice separation task in such challenging scenarios. The code, the pretrained models and the dataset are publicly available at https: //ipcv.github.io/Acappella/

show abstract

Section: Methodsmentioning

confidence: 99%

A cappella: Audio-visual Singing Voice Separation

Montesinos¹,

Kadandale²,

Haro³

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Qualitative comparisons are provided in the supplement. We do not compare results on the popular MusDB dataset (Rafii et al, 2017) because this dataset has insufficient single-channel audio to train WaveNet generative models.…”

Section: Source Separationmentioning

confidence: 99%

Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics

Jayaram¹,

Thickstun²

2021

Preprint

View full text Add to dashboard Cite

This paper introduces an alternative approach to sampling from autoregressive models. Autoregressive models are typically sampled sequentially, according to the transition dynamics defined by the model. Instead, we propose a sampling procedure that initializes a sequence with white noise and follows a Markov chain defined by Langevin dynamics on the global loglikelihood of the sequence. This approach parallelizes the sampling process and generalizes to conditional sampling. Using an autoregressive model as a Bayesian prior, we can steer the output of a generative model using a conditional likelihood or constraints. We apply these techniques to autoregressive models in the visual and audio domains, with competitive results for audio source separation, super-resolution, and inpainting.

show abstract

“…The use of deep learning in MSS was accelerated ever since and led to improved SDR results year after year in the successive SiSEC editions, held in 2016 (Liutkus et al, 2017) and 2018 (Stöter et al, 2018). An important component of this success story was the release of publicly available datasets such as Rafii et al (2017) which, compared to previous datasets such as Bittner et al (2014), was created specifically for MSS tasks. MUSDB18 consists of 150 music tracks in four stems and is up until now widely used due to a lack of alternatives 1 .…”

Section: Introductionmentioning

confidence: 99%

Music Demixing Challenge 2021

Mitsufuji¹,

Fabbro²,

Uhlich³

et al. 2022

Front. Signal Process.

Self Cite

View full text Add to dashboard Cite

Music source separation has been intensively studied in the last decade and tremendous progress with the advent of deep learning could be observed. Evaluation campaigns such as MIREX or SiSEC connected state-of-the-art models and corresponding papers, which can help researchers integrate the best practices into their models. In recent years, the widely used MUSDB18 dataset played an important role in measuring the performance of music source separation. While the dataset made a considerable contribution to the advancement of the field, it is also subject to several biases resulting from a focus on Western pop music and a limited number of mixing engineers being involved. To address these issues, we designed the Music Demixing Challenge on a crowd-based machine learning competition platform where the task is to separate stereo songs into four instrument stems (Vocals, Drums, Bass, Other). The main differences compared with the past challenges are 1) the competition is designed to more easily allow machine learning practitioners from other disciplines to participate, 2) evaluation is done on a hidden test set created by music professionals dedicated exclusively to the challenge to assure the transparency of the challenge, i.e., the test set is not accessible from anyone except the challenge organizers, and 3) the dataset provides a wider range of music genres and involved a greater number of mixing engineers. In this paper, we provide the details of the datasets, baselines, evaluation metrics, evaluation results, and technical challenges for future competitions.

show abstract

MUSDB18 - a corpus for music separation

Cited by 59 publications

References 0 publications

A cappella: Audio-visual Singing Voice Separation

A cappella: Audio-visual Singing Voice Separation

Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics

Music Demixing Challenge 2021

Contact Info

Product

Resources

About