A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling

Wang, Yun; Li, Juncheng; Metze, Florian

doi:10.1109/icassp.2019.8682847

Cited by 147 publications

(145 citation statements)

References 19 publications

Supporting

Mentioning

139

Contrasting


“…Recent research investigates memory dynamics and control in recurrent neural networks (RNNs) including LSTM [22]. There are comparisons on different pooling functions for AEC/AED [9], and spatio-temporal attention pooling proposed for audio scene classification [23].…”

Section: Introduction (mentioning)

confidence: 99%

“…Wang et al [9] did a thorough analysis theoretically and experimentally of five pooling functions on prediction. The analysis was done for multiple instance learning framework on AED with weak labeling, whose goal is to detect and localize events at the same time.…”

Section: Introduction (mentioning)

confidence: 99%

“…This dataset employs a subset of AudioSet [25], which includes 10-second clips containing 17 sound events from two categories: "Warning" and "Vehicle". Although audio tagging, which is similar to our utterance-level classification task, is covered in the experiments in [9], there are two fundamental differences between this work and theirs. 1) Different characteristics of events: Our work focuses on rare events, and we conducted experiments on DCASE 2017 task 2 [24]: detection of rare sound events.…”

Section: Introduction (mentioning)

confidence: 99%

“…2) Different focuses of analysis: In [9], the analysis focuses on the effect of pooling functions on both event localization and classification for weak labeling. Due to the requirement of event localization, only pooling functions on prediction were discussed in [9]. Our work analyzes the effect of pooling functions on LSTM based AEC mod-…”

Section: Introduction (mentioning)

confidence: 99%

“…Experiments are designed for understanding the memory dynamics for LSTM models, and looking for solutions to mitigate the sensitivity to event positions. Moreover, we further discuss four pooling functions on the feature side, which are not covered in [9]. In this paper, we investigate the dynamics of LSTM memory on AEC tasks, including an analysis on LSTM memory retaining, and a benchmarking for impacts of different pooling approaches on LSTM memory dynamics and AEC accuracies, using 1.7M synthesized clips (across 3 event types and 3 SNRs).…”

Section: Introduction (mentioning)

confidence: 99%


A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification

Kao, Sun, Wang, et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Acoustic event classification (AEC) and acoustic event detection (AED) refer to the task of detecting whether specific target events occur in audio. As long short-term memory (LSTM) leads to state-of-the-art results in various speech-related tasks, it is a popular solution for AEC as well. This paper investigates the dynamics of LSTM models on AEC tasks. It includes a detailed analysis of LSTM memory retention and a benchmark of nine different pooling methods on LSTM models, using 1.7M generated mixture clips of multiple events with different signal-to-noise ratios. The paper focuses on understanding: 1) utterance-level classification accuracy; 2) sensitivity to event position within an utterance. The analysis is done on the dataset for the detection of rare sound events from the DCASE 2017 Challenge. We find max pooling on the prediction level to perform the best among the nine pooling approaches in terms of classification accuracy and insensitivity to event position within an utterance. To the best of the authors' knowledge, this is the first work of its kind focused on LSTM dynamics for AEC tasks.
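As a rough illustration of what pooling "on the prediction level" means, the sketch below assumes a model has already produced one probability per frame for a single event class, and collapses those frame-level predictions into one clip-level score. The function names and the example probabilities are hypothetical and are not taken from either paper; only max pooling and plain averaging are shown, as two generic examples of pooling functions.

```python
from typing import Sequence


def max_pool(frame_probs: Sequence[float]) -> float:
    """Clip-level score is the largest frame-level probability."""
    return max(frame_probs)


def average_pool(frame_probs: Sequence[float]) -> float:
    """Clip-level score is the mean of the frame-level probabilities."""
    return sum(frame_probs) / len(frame_probs)


# Hypothetical frame-level probabilities for a short (rare) event:
# only the first two of eight frames are confidently positive.
frame_probs = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

clip_score_max = max_pool(frame_probs)      # driven by the single best frame
clip_score_avg = average_pool(frame_probs)  # diluted by the many quiet frames

print(clip_score_max, clip_score_avg)
```

In this toy case the max-pooled score stays high while the averaged score is pulled down by the frames where the event is absent, which is one intuition for why the choice of pooling function matters when events are short.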
