Orthogonality-Regularized Masked NMF for Learning on Weakly Labeled Audio Data

Sobieraj, Iwona; Rencker, Lucas; Plumbley, Mark D.

doi:10.1109/icassp.2018.8461293

Cited by 3 publications

(3 citation statements)

References 10 publications


“…Some approaches propagate the bag-level label to all instances and train against these directly [51,56], which can introduce instance-level label noise. Other approaches are based on source separation, and obtain dynamic labels by post-processing the separated sources (e.g., by computing the frame-wise energy of each separated source) [57,58].…”

Section: Sound Event Detection Using Weakly Labeled Data (mentioning)

confidence: 99%

Adaptive Pooling Operators for Weakly Labeled Sound Event Detection

McFee; Salamon; Bello (2018). IEEE/ACM Trans. Audio Speech Lang. Process.



Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for human annotators to produce, which limits the practical scalability of SED methods. In this work, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality. The models, however, must still produce temporally dynamic predictions, which must be aggregated (pooled) when comparing against static labels during training. To facilitate this aggregation, we develop a family of adaptive pooling operators, referred to as auto-pool, which smoothly interpolate between common pooling operators, such as min-, max-, or average-pooling, and automatically adapt to the characteristics of the sound sources in question. We evaluate the proposed pooling operators on three datasets, and demonstrate that in each case, the proposed methods outperform non-adaptive pooling operators for static prediction, and nearly match the performance of models trained with strong, dynamic annotations. The proposed method is evaluated in conjunction with convolutional neural networks, but can be readily applied to any differentiable model for time-series label prediction. While this article focuses on SED applications, the proposed methods are general, and could be applied widely to MIL problems in any domain.
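The interpolation between min-, max-, and average-pooling described in this abstract can be illustrated with a softmax-weighted average of frame-level predictions. The sketch below is a minimal illustration, not the authors' implementation: the function name `auto_pool`, the example frame values, and the use of a fixed scalar `alpha` are assumptions made here (in the cited work the corresponding parameter is adapted during training).

```python
import math

def auto_pool(probs, alpha):
    """Softmax-weighted pooling of frame-level probabilities.

    Hypothetical helper for illustration: `alpha` is passed in as a
    fixed number here, whereas an adaptive operator would learn it.
    """
    scaled = [alpha * p for p in probs]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    # Weighted average of the frame predictions under softmax weights.
    return sum(p * w for p, w in zip(probs, weights)) / total

# Illustrative frame-level predictions for one class in one excerpt.
frames = [0.1, 0.2, 0.9, 0.3]

mean_like = auto_pool(frames, 0.0)    # alpha = 0 gives uniform weights (average)
max_like = auto_pool(frames, 50.0)    # large positive alpha approaches the maximum
min_like = auto_pool(frames, -50.0)   # large negative alpha approaches the minimum
print(mean_like, max_like, min_like)
```

With `alpha = 0` every frame gets the same weight, so the result is the plain average; increasing `alpha` shifts weight toward the highest-scoring frames, and decreasing it below zero shifts weight toward the lowest-scoring ones.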



“…For the AED model we use a model from our previous work, where we adapted a standard NMF approach to learning on weakly labeled data [21].…”

Section: Orthogonality-Regularized NMF (mentioning)

confidence: 99%

Acoustic Event Detection from Weakly Labeled Data Using Auditory Salience

Podwinska; Sobieraj; Fazenda; et al. (2019). ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


Acoustic Event Detection (AED) is an important machine listening task which, in recent years, has been addressed using common machine learning methods such as Non-negative Matrix Factorization (NMF) or deep learning. However, most of these approaches do not take into consideration the way the human auditory system detects salient sounds. In this work, we propose a method for AED using weakly labeled data that combines a Non-negative Matrix Factorization model with a salience model based on predictive coding in the form of Kalman filters. We show that models of auditory perception, particularly auditory salience, can be successfully incorporated into existing AED methods and improve their performance on rare event detection. We evaluate the method on Task 2 of the DCASE 2017 Challenge.


“…In [9], class activity penalties and structured dropout are used for score-informed source separation by applying constraints to the latent units of an autoencoder (AE). In [10], an NMF method is proposed that is trained on weakly labeled data. Another work that utilizes class information is [11] where a conditional variational autoencoder (VAE) is trained as a universal generative model to represent known source classes.…”

Section: Introduction (mentioning)

confidence: 99%

Audio Source Separation Using Variational Autoencoders and Weak Class Supervision

Karamatli; Cemgil; Kirbiz (2019). IEEE Signal Process. Lett.


In this paper, we propose a source separation method that is trained by observing the mixtures and the class labels of the sources present in the mixture, without any access to isolated sources. Since our method does not require source class labels for every time-frequency bin but only a single label for each source constituting the mixture signal, we call this scenario weak class supervision. We associate a variational autoencoder (VAE) with each source class within a non-negative (compositional) model. Each VAE provides a prior model to identify the signal from its associated class in a sound mixture. After training the model on mixtures, we obtain a generative model for each source class and demonstrate our method on one-second mixtures of utterances of digits from 0 to 9. We show that the separation performance obtained by source class supervision is as good as the performance obtained by source signal supervision.

