2019
DOI: 10.48550/arxiv.1911.02182
Preprint

Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision

Fatemeh Pishdadian,
Gordon Wichern,
Jonathan Le Roux

Abstract: While there has been much recent progress using deep learning techniques to separate speech and music audio signals, these systems typically require large collections of isolated sources during the training process. When extending audio source separation algorithms to more general domains such as environmental monitoring, it may not be possible to obtain isolated signals for training. Here, we propose objective functions and network architectures that enable training a source separation system with weak labels…

Cited by 3 publications
(3 citation statements)
References 42 publications
“…However, these experiments relied on having sufficient supervised training data and were evaluated only on test sets with similar environmental conditions and sound distributions. In order to extend the reach of this approach, methods have been proposed to train separation models with no access to ground truth clean sources by utilizing weak class labels [28], the spatial separability of the sources [34,31,8] and self-supervision in the form of MixIT [41]. This makes it possible to learn separation of signals well outside the domains for which isolated source databases exist.…”
Section: Relation To Prior Work
Citation type: mentioning
confidence: 99%
“…To avoid these issues, a number of recent works have proposed replacing the strong supervision of reference source signals with weak supervision labels from related modalities such as sound class [25,18], visual input [8], or spatial location from multi-microphone recordings [31,27,4]. Most recently, [35] proposed mixture invariant training (MixIT), which provides a purely unsupervised source separation framework for a variable number of latent sources.…”
Section: Relation To Previous Work
Citation type: mentioning
confidence: 99%
“…To that end, weakly supervised training has been proposed to substitute the strong labels of source references with another modality such as class labels, visual features, or spatial information. In [32] class labels were used as a substitute for signal-level losses. The spatial locations of individual sources, which can be inferred from multichannel audio, has also been used to guide learning of single-channel separation [38,35,8].…”
Section: Introduction
Citation type: mentioning
confidence: 99%