ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682772
Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection

Abstract: Sound event detection is a challenging task, especially for scenes with multiple simultaneous events. While event classification methods tend to be fairly accurate, event localization presents additional challenges, especially when large amounts of labeled data are not available. Task 4 of the 2018 DCASE challenge presents an event detection task that requires accuracy in both segmentation and recognition of events while providing only weakly labeled training data. Supervised methods can produce accurate event …

Cited by 17 publications (26 citation statements) · References 23 publications
“…Recently, much research attention has been devoted to improving CRNN performance [5,6]. Kothinti [7] presented an interesting approach that separates SED into an unsupervised onset and offset estimation problem using conditional restricted Boltzmann machines (c-RBM) and a supervised label prediction using a CRNN. The results of 30% F1 on the development set and 25% F1 on the evaluation set of the DCASE 2018 Task 4 dataset indicate the robustness of this approach.…”
Section: Introduction (mentioning)
confidence: 99%
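As a rough illustration of how such a two-stage system can be glued together, the sketch below pairs boundary candidates from an unsupervised front end with class posteriors from a supervised classifier. The function name, inputs, and toy numbers are hypothetical assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fuse_boundaries_and_classes(onsets, offsets, class_probs, class_names, threshold=0.5):
    """Pair unsupervised boundary estimates with supervised class posteriors.

    onsets, offsets : 1-D arrays of candidate event boundaries (seconds),
                      e.g. from an unsupervised front end such as a cRBM.
    class_probs     : (n_segments, n_classes) posteriors for each candidate
                      segment, e.g. from a CRNN aggregated over the segment.
    Returns (onset, offset, label) tuples for posteriors above the threshold.
    """
    events = []
    for i, (t_on, t_off) in enumerate(zip(onsets, offsets)):
        for c, p in enumerate(class_probs[i]):
            if p >= threshold:
                events.append((float(t_on), float(t_off), class_names[c]))
    return events

# Toy usage with made-up numbers.
onsets = np.array([0.4, 3.1])
offsets = np.array([1.9, 5.0])
probs = np.array([[0.82, 0.10], [0.05, 0.71]])
print(fuse_boundaries_and_classes(onsets, offsets, probs, ["Speech", "Dog"]))
```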
“…This system was also found to perform rather poorly on event-based evaluations for a similar task in DCASE 2018 [106]. In addition, experiments found that such an architecture cannot be simplified without a loss of detection accuracy [107], and it therefore incurs a large computational cost due to the sophisticated network.…”
Section: E Models Utilizing Weakly Labeled Data (mentioning)
confidence: 98%
“…The approach of Kothinti et al. [106] was to develop two separate models for different purposes, as illustrated in Figure 15. The first model, a combination of a Restricted Boltzmann Machine (RBM), a conditional RBM (cRBM), and Principal Component Analysis (PCA), was responsible for event boundary detection.…”
Section: Figure 14 Flowchart of Lin et al. Framework [114] (mentioning)
confidence: 99%
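The cited boundary model learns representations with an RBM/cRBM and PCA; the minimal numpy sketch below only mimics the general idea of unsupervised change-point detection on acoustic features. The function name, smoothing, and thresholding rule are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def boundary_candidates(features, hop_seconds=0.02, smooth=5, k=1.5):
    """Crude stand-in for an unsupervised event-boundary detector.

    features : (n_frames, n_dims) acoustic features (e.g. log-mel frames).
    Returns frame times where a smoothed feature-change signal exceeds
    mean + k * std; these serve as onset/offset candidates.
    Note: the cited system uses learned RBM/cRBM representations and PCA;
    this sketch only illustrates the change-point idea.
    """
    # Frame-to-frame change magnitude as a novelty curve.
    diff = np.linalg.norm(np.diff(features, axis=0), axis=1)
    # Simple moving-average smoothing.
    kernel = np.ones(smooth) / smooth
    novelty = np.convolve(diff, kernel, mode="same")
    thresh = novelty.mean() + k * novelty.std()
    peaks = np.where(novelty > thresh)[0]
    return peaks * hop_seconds
```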
“…The CNN architecture allows feature extraction that is robust against the time and frequency shifts that often occur in environmental sound analysis. RNNs have also been applied to SED in some works [14]-[16] to explicitly model the temporal correlations of sound events. In particular, it has been reported that neural networks combining a CNN with a bidirectional gated recurrent unit (BiGRU) [20], [21], which can capture forward and backward temporal correlations of sound events, successfully detect sound events.…”
Section: Conventional Sound Event Detection Based on Convolutional Recurrent Neural Networks (mentioning)
confidence: 99%
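A minimal PyTorch sketch of a CNN + BiGRU (CRNN) detector of the kind described above follows; the layer sizes, pooling choices, and class count are illustrative assumptions rather than the configurations used in the cited works.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CNN + BiGRU sound event detector (a sketch, not a cited system).

    Input : (batch, 1, time, n_mels) log-mel spectrograms.
    Output: (batch, time, n_classes) frame-wise event probabilities.
    """
    def __init__(self, n_mels=64, n_classes=10, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                      # pool frequency only
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * (n_mels // 16), hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        z = self.cnn(x)                                # (B, C, T, F')
        z = z.permute(0, 2, 1, 3).flatten(2)           # (B, T, C*F')
        z, _ = self.gru(z)                             # BiGRU over time
        return torch.sigmoid(self.head(z))             # multi-label per frame

probs = CRNN()(torch.randn(2, 1, 500, 64))             # -> (2, 500, 10)
```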
“…For example, convolutional neural network (CNN)-based approaches, which can detect sound events robustly against time and frequency shifts in the input acoustic features, have been used in many works [12], [13]. Recurrent neural network (RNN)- or convolutional recurrent neural network (CRNN)-based approaches, which can capture the temporal information of sound events, have also been utilized in some works [14]-[16]. These methods successfully analyze overlapping sound events with reasonable performance.…”
Section: Introduction (mentioning)
confidence: 99%
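Frame-wise multi-label outputs like those produced by such CNN/RNN/CRNN models are typically binarized per class and merged into event segments. The sketch below shows one common decoding step; the threshold and hop size are assumptions, not values taken from the cited papers. Because each class is thresholded independently, overlapping events fall out naturally.

```python
import numpy as np

def decode_events(frame_probs, class_names, hop_seconds=0.02, threshold=0.5):
    """Turn frame-wise multi-label probabilities into (onset, offset, label) events.

    frame_probs : (n_frames, n_classes) sigmoid outputs of a frame-level model.
    Each class is binarized independently, so overlapping events are allowed.
    (A common post-processing step; not a detail from the cited papers.)
    """
    events = []
    active = frame_probs >= threshold               # (n_frames, n_classes) booleans
    for c, name in enumerate(class_names):
        col = active[:, c].astype(int)
        # Pad with zeros so events touching the clip edges are closed properly.
        changes = np.diff(np.concatenate(([0], col, [0])))
        onsets = np.where(changes == 1)[0]
        offsets = np.where(changes == -1)[0]
        events += [(on * hop_seconds, off * hop_seconds, name)
                   for on, off in zip(onsets, offsets)]
    return sorted(events)
```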