2020
DOI: 10.48550/arxiv.2005.11262
Preprint

LibriMix: An Open-Source Dataset for Generalizable Speech Separation

Abstract: In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown important performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix, and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two-or three-speaker m…

Cited by 47 publications (78 citation statements)
References 25 publications
“…S2VC includes several self-supervised representations, and here we adopted CPC [13] version since it was reported to perform the best. For SE, we chose off-the-shelf models pre-trained on different datasets: DEMUCS, on Valentini [14] and DNS [15]; MetricGAN+ [16], on VoiceBank-DEMAND [17]; and Conv-TasNet [18], on LibriMix [19].…”
Section: Models
confidence: 99%
“…Libri2Mix [2]. This dataset was constructed using train-100, train-360, dev, and test set in the LibriSpeech dataset [25].…”
Section: Dataset
confidence: 99%
“…Cross-domain SS and TSE tasks: the English Libri2Mix [24] and Mandarin Aishell2Mix are used as the supervised source domain and unsupervised target domain dataset, respectively. Each mixture in Aishell2Mix is generated by mixing two speakers' utterances from Aishell-1 [25].…”
Section: Task Construction
confidence: 99%
“…On the noisy and reverberant in-domain LibriSpeech dataset [23], the proposed DPCCN achieves more than 1.4 dB absolute SISNR improvement over all listed state-of-the-art time-domain speech separation methods. For the cross-domain speech separation and extraction tasks, we evaluate the proposed approaches on the clean Libri2Mix [24] and Aishell2Mix that created by ourselves from Aishell-1 [25] corpus. Extensive results show that the DPCCN-based systems are much more robust and achieve much better performance than baselines.…”
Section: Introduction
confidence: 99%
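The citation statements above describe building two-speaker mixtures (Libri2Mix, Aishell2Mix) by summing utterances from two different speakers. A minimal sketch of that idea is below; the per-source gains, the zero-padding policy, and the function name are illustrative assumptions, not the actual LibriMix recipe (which handles loudness normalization and min/max-length modes its own way).

```python
import numpy as np

def make_two_speaker_mixture(s1, s2, gain1_db=0.0, gain2_db=-2.0):
    """Mix two mono utterances into one single-channel mixture.

    The shorter source is zero-padded to the longer one's length
    (illustrative choice; real recipes also offer truncation modes).
    Returns the mixture and the scaled sources used to build it.
    """
    n = max(len(s1), len(s2))
    s1 = np.pad(s1, (0, n - len(s1)))
    s2 = np.pad(s2, (0, n - len(s2)))
    g1 = 10 ** (gain1_db / 20)  # dB -> linear amplitude gain
    g2 = 10 ** (gain2_db / 20)
    mixture = g1 * s1 + g2 * s2
    return mixture, np.stack([g1 * s1, g2 * s2])

# Toy usage with synthetic sine "utterances" of different lengths.
sr = 8000
t = np.arange(sr) / sr
u1 = 0.1 * np.sin(2 * np.pi * 220 * t)              # 1.0 s
u2 = 0.1 * np.sin(2 * np.pi * 330 * t[: sr // 2])   # 0.5 s
mix, sources = make_two_speaker_mixture(u1, u2)
```

Because the mixture is just the sum of the scaled sources, separation models trained on such data can be scored by how well their outputs reconstruct each row of `sources` from `mix` (e.g. with SI-SNR, as in the DPCCN results quoted above).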