ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683555
Singing Voice Separation: A Study on Training Data

Abstract: In recent years, singing voice separation systems have shown increased performance due to the use of supervised training. The design of the training dataset is known to be a crucial factor in the performance of such systems. We investigate how the characteristics of the training dataset impact the separation performance of state-of-the-art singing voice separation algorithms. We show that separation quality and diversity are two important and complementary assets of a good training dataset. We also provide i…


Cited by 32 publications (26 citation statements)
References 17 publications
“…To compare the separation of singing voice with the state-of-the-art, we also include models that separate the mixture into four sources. It has been shown in [6] that these four-source models have vocal separation performance similar to two-source models, even though the four-source separation task is more challenging than its two-source counterpart, possibly because of the additional supervision provided by the different instrumental sources in the multi-task learning setup. Hence, we include the vocal SDR values of state-of-the-art four-source models [10,11] in our comparison.…”
Section: Comparison With Other Methods (mentioning, confidence: 99%)
“…Using the best combination of input length (10 seconds) and model size (8.3M), we experiment with different probabilities of applying random mixing. [6] shows that random mixing does not have a positive effect on test SDR; one possible explanation is that it creates mixtures with somewhat independent sources. Our experiments, however, indicate that random mixing alone significantly improves the results.…”
Section: Teacher Training (mentioning, confidence: 96%)
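The random-mixing augmentation discussed in the quote above pairs a vocal stem from one song with an accompaniment stem from another to form artificial training mixtures. A minimal sketch in NumPy, with illustrative function and variable names that are not taken from the cited papers:

```python
import numpy as np

def random_mix(vocals, accompaniments, rng=None):
    """Pair a randomly chosen vocal stem with a randomly chosen
    accompaniment stem (possibly from a different song) and sum them
    into an artificial training mixture.

    `vocals` and `accompaniments` are lists of equal-length 1-D
    waveform arrays; names are hypothetical, for illustration only.
    """
    rng = rng or np.random.default_rng()
    v = vocals[rng.integers(len(vocals))]
    a = accompaniments[rng.integers(len(accompaniments))]
    mix = v + a  # sources are summed to form the network input
    return mix, v, a  # mixture plus its ground-truth source targets
```

Because the stems come from independent songs, the resulting mixture has statistically independent sources, which is the possible downside noted in [6].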
“…The pre-trained models are U-nets (Jansson et al., 2017) and follow specifications similar to those in (Prétet, Hennequin, Royo-Letelier, & Vaglio, 2019). The U-net is an encoder/decoder Convolutional Neural Network (CNN) architecture with skip connections.…”
Section: Implementation Details (mentioning, confidence: 99%)
“…The training loss is an L1-norm between masked input mixture spectrograms and source-target spectrograms. The models were trained on Deezer's internal datasets (notably the Bean dataset that was used in (Prétet et al., 2019)) using Adam (Kingma & Ba, 2014). Training took approximately a full week on a single GPU.…”
Section: Implementation Details (mentioning, confidence: 99%)
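The loss described in the quote above — an L1-norm between the masked mixture spectrogram and the source-target spectrogram — can be sketched as follows. This is an assumed reading of the quoted description, not the cited papers' actual code; all names are illustrative:

```python
import numpy as np

def l1_mask_loss(mix_mag, mask, target_mag):
    """L1 loss between the masked mixture magnitude spectrogram and the
    target source's magnitude spectrogram.

    `mix_mag` and `target_mag` are magnitude spectrograms of the same
    shape; `mask` is a soft mask in [0, 1] predicted by the network.
    A sketch under stated assumptions, not a definitive implementation.
    """
    est = mask * mix_mag  # masked mixture = estimated source spectrogram
    return np.mean(np.abs(est - target_mag))  # mean absolute error (L1)
```

In a real training loop this quantity would be computed on network outputs and minimized with Adam, as the quote states.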