Interspeech 2020
DOI: 10.21437/interspeech.2020-3115
Unsupervised Audio Source Separation Using Generative Priors

Abstract: State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating in either the time or the spectral domain. However, these methods are severely limited: they require access to expensive source-level labeled data, and they are specific to a given set of sources and a given mixing process, demanding complete retraining when those assumptions change. This strongly emphasizes the need for unsupervised methods …

Cited by 18 publications (11 citation statements)
References 26 publications
“…Mixture invariant training (MixIT) 53 represents an entirely unsupervised approach and requires only single-channel acoustic mixtures, though this technique may be limited in the bioacoustic domain as it necessitates large quantities of well-defined mixtures. Another unsupervised technique includes a Bayesian approach employing deep generative priors [54][55][56] , but this method may be limited to data within close bounds of the training sets since the distributions learned by acoustic deep generative models may not exhibit the right properties for probabilistic source separation 57 . We also suggest self-supervised pre-training 30,58,59 on relevant proxy tasks to enhance performance, especially in the low-data bioacoustic domain.…”
Section: Discussion
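The mixture invariant training (MixIT) objective mentioned above trains a separator from mixtures alone: two reference mixtures are summed into a "mixture of mixtures", the model estimates several sources, and each estimated source is assigned to one of the two reference mixtures so that the total reconstruction error is minimized. A minimal sketch of that loss, assuming a brute-force search over the 2^M binary assignments (practical only for small M) and mean-squared error as the reconstruction measure:

```python
import itertools
import numpy as np

def mixit_loss(est_sources, mix1, mix2):
    """MixIT loss sketch: est_sources has shape (M, T), the M sources
    estimated from the mixture-of-mixtures mix1 + mix2. Each source is
    assigned to exactly one reference mixture; we search all 2^M binary
    assignments for the one minimizing the summed MSE."""
    M = est_sources.shape[0]
    best = np.inf
    for assign in itertools.product([0, 1], repeat=M):
        a = np.array(assign, dtype=float)          # 1 -> mix1, 0 -> mix2
        rec1 = (a[:, None] * est_sources).sum(axis=0)
        rec2 = ((1.0 - a)[:, None] * est_sources).sum(axis=0)
        loss = np.mean((rec1 - mix1) ** 2) + np.mean((rec2 - mix2) ** 2)
        best = min(best, loss)
    return best

# If the estimates are exactly the true sources, some assignment
# reconstructs both mixtures perfectly and the loss is zero.
rng = np.random.default_rng(0)
s1, s2, s3 = rng.standard_normal((3, 100))
loss = mixit_loss(np.stack([s1, s2, s3]), s1 + s2, s3)
```

In the real method the separator network is trained by backpropagating through this loss; the brute-force assignment search here is just the simplest way to express the "mixture invariant" minimum over assignments.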
“…Although DNN-based methods show superior performance in the usual supervised source separation setting [9], in the score-informed setting this NMF-based method works better than DNN-based methods [15]. The audio source separation method presented in [21] uses pretrained instrument sound synthesizers based on generative adversarial networks (GANs). The GANs convert random vectors into audio signals.…”
Section: Audio Source Separation
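The generative-prior approach quoted above separates a mixture by searching the latent spaces of pretrained per-source generators: find latent vectors whose synthesized outputs sum to the observed mixture, then read off each generator's output as a separated source. A minimal sketch of that latent optimization, using stand-in linear "generators" (random matrices) in place of pretrained GAN synthesizers so the gradients are analytic; the learning rate and iteration count are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in linear "generators": in the paper these would be pretrained
# GAN synthesizers mapping a latent vector to an audio waveform.
T, D = 64, 8                       # waveform length, latent dimension
G1 = rng.standard_normal((T, D))
G2 = rng.standard_normal((T, D))

# Ground-truth latents and the observed two-source mixture.
z1_true = rng.standard_normal(D)
z2_true = rng.standard_normal(D)
mixture = G1 @ z1_true + G2 @ z2_true

# Separation = gradient descent on the latents, minimizing
# || mixture - G1(z1) - G2(z2) ||^2 with the generators held fixed.
z1, z2 = np.zeros(D), np.zeros(D)
lr = 0.003
for _ in range(5000):
    resid = G1 @ z1 + G2 @ z2 - mixture
    z1 -= lr * 2.0 * (G1.T @ resid)
    z2 -= lr * 2.0 * (G2.T @ resid)

source1, source2 = G1 @ z1, G2 @ z2   # separated source estimates
```

With nonlinear GAN generators the same objective is optimized with automatic differentiation (typically with a latent regularizer), which is exactly why the priors must model the target sources well, as the quoted limitation about data "within close bounds of the training sets" points out.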
“…Given this success, it was natural to also apply this method to audio data, leading to the deep audio prior approach by Tian et al [28]. Based on this, Narayanaswamy et al [29] used generative adversarial networks (GANs) trained on unlabeled training data as priors, further improving the quality of the output signals.…”
Section: Related Work