Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection

Kothinti, Sandeep; Imoto, Keisuke; Chakrabarty, Debmalya; Sell, Gregory; Watanabe, Shinji; Elhilali, Mounya

doi:10.1109/icassp.2019.8682772

Cited by 17 publications

(26 citation statements)

References 23 publications


“…Recently, much research attention has been brought up in order to improve CRNN performance [5,6]. Kothinti [7] presented an interesting approach by separating SED into an unsupervised onset and offset estimation problem using conditional restricted Boltzmann machines (c-RBM) along with a supervised label prediction using CRNN. The results of 30% F1 development and 25% F1 evaluation performance on the DCASE2018 task4 dataset indicate the robustness of this approach.…”

Section: Introduction (mentioning)

confidence: 99%

Duration Robust Weakly Supervised Sound Event Detection

Dinkel

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Task 4 of the DCASE2018 challenge demonstrated that substantially more research is needed for a real-world application of sound event detection. Analyzing the challenge results it can be seen that most successful models are biased towards predicting long (e.g., over 5s) clips. This work aims to investigate the performance impact of fixed-sized window median filter post-processing and advocate the use of double thresholding as a more robust and predictable post-processing method. Further, four different temporal subsampling methods within the CRNN framework are proposed: mean-max, α-mean-max, Lp-norm and convolutional. We show that for this task subsampling the temporal resolution by a neural network enhances the F1 score as well as its robustness towards short, sporadic sound events. Our best single model achieves 30.1% F1 on the evaluation set and the best fusion model 32.5%, while being robust to event length variations.
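The double thresholding mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual hysteresis formulation: a run of consecutive frames above a low threshold is kept as an event only if at least one frame in the run also exceeds a high threshold. The function name `double_threshold` and the threshold values 0.75 and 0.2 are illustrative assumptions, not values taken from the text above.

```python
import numpy as np

def double_threshold(probs, hi=0.75, lo=0.2):
    """Hysteresis-style post-processing of frame-level probabilities.

    hi and lo are assumed example values. Returns a boolean mask
    marking the frames that belong to a detected event.
    """
    probs = np.asarray(probs, dtype=float)
    above_lo = probs > lo
    active = np.zeros(len(probs), dtype=bool)
    start = None
    # Append a False sentinel so a run reaching the last frame is closed.
    for i, flag in enumerate(np.append(above_lo, False)):
        if flag and start is None:
            start = i                      # a run above `lo` begins
        elif not flag and start is not None:
            # Keep the run only if some frame in it exceeds `hi`.
            if probs[start:i].max() > hi:
                active[start:i] = True
            start = None
    return active

frame_probs = [0.1, 0.3, 0.8, 0.4, 0.1, 0.3, 0.3, 0.1]
mask = double_threshold(frame_probs)
```

In this example the run at frames 1 to 3 is kept because it contains a frame above the high threshold, while the run at frames 5 to 6 is discarded. Unlike a fixed-size median filter, the result does not depend on a window length.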


“…This system was also found to perform rather poorly on event-based evaluations for a similar task in DCASE 2018 [106]. In addition, experiments also found that such architecture cannot be simplified and will result in poor detection accuracy [107] and thus requiring a large computational cost due to the sophisticated network.…”

Section: E Models Utilizing Weakly Labeled Data (mentioning)

confidence: 98%

“…Kothinti et al [106] approach was to develop two separate models for different purposes, as illustrated in Figure 15. The first model, which was a combination of Restricted Boltzmann Machine (RBM), conditional RBM (cRBM) and Principle Component Analysis (PCA), was in charge of the event boundary detection.…”

Section: Figure 14 Flowchart Of Lin Et Al Framework [114] (mentioning)

confidence: 99%

A Comprehensive Review of Polyphonic Sound Event Detection

Chan

Chin

2020

IEEE Access


One of the most amazing functions of the human auditory system is the ability to detect all kinds of sound events in the environment. With the technologies and hardware advances, polyphonic Sound Event Detection (SED) can be developed to mimic the ability of the human auditory system. However, the development of a SED system is no trivial task, and several different factors often hinder accuracy. Although there are several overview papers available, most of them only provide a theoretical overview of algorithms used with little discussion. Thus, to the best of the authors' knowledge, there is no comprehensive review that covers this particular domain. Therefore, this paper aims to provide an in-depth discussion of different methodologies proposed by various authors that include the features used, detection algorithms, and their corresponding accuracy and limitations. Additional information on possible trends is also discussed that can be useful for future development works.


“…The CNN architecture allows feature extraction robust against time and frequency shifts, which often occur in environmental sound analysis. An RNN has also been applied to SED in some works [14]-[16] to explicitly model time correlations of sound events. In particular, it has been reported that neural networks combining the CNN and a bidirectional gated recurrent unit (BiGRU) [20], [21], which can capture forward and backward temporal correlations of sound events, successfully detected sound events.…”

Section: Conventional Sound Event Detection Based On Convolutional Re (mentioning)

confidence: 99%

“…For example, a convolutional neural network (CNN)-based approach, which can detect sound events robustly against time and frequency shifts in the input acoustic feature, has been used in many works [12], [13]. Recurrent neural network (RNN)- or convolutional recurrent neural network (CRNN)-based approaches, which can capture temporal information of sound events, have also been utilized in some works [14]-[16]. These methods successfully analyze overlapping sound events with reasonable performance.…”

Section: Introduction (mentioning)

confidence: 99%

Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-Occurrence

Imoto

Kyochi

2020

IEICE Trans. Inf. & Syst.

Self Cite


A limited number of types of sound event occur in an acoustic scene and some sound events tend to co-occur in the scene; for example, the sound events "dishes" and "glass jingling" are likely to co-occur in the acoustic scene "cooking." In this paper, we propose a method of sound event detection using graph Laplacian regularization with sound event co-occurrence taken into account. In the proposed method, the occurrences of sound events are expressed as a graph whose nodes indicate the frequencies of event occurrence and whose edges indicate the sound event co-occurrences. This graph representation is then utilized for the model training of sound event detection, which is optimized under an objective function with a regularization term considering the graph structure of sound event occurrence and co-occurrence. Evaluation experiments using the TUT Sound Events 2016 and 2017 datasets, and the TUT Acoustic Scenes 2016 dataset show that the proposed method improves the performance of sound event detection by 7.9 percentage points compared with the conventional CNN-BiGRU-based detection method in terms of the segment-based F1 score. In particular, the experimental results indicate that the proposed method enables the detection of co-occurring sound events more accurately than the conventional method.
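As a rough illustration of the idea in this abstract, the sketch below builds a co-occurrence graph from binary event labels and evaluates a standard graph Laplacian penalty on a set of predictions. This is a generic textbook form (L = D - A, penalty = sum over segments of yᵀLy), not necessarily the formulation used in the paper; in particular it ignores the node weights (event frequencies) the abstract mentions. The function names and the toy labels are assumptions made for illustration.

```python
import numpy as np

def cooccurrence_laplacian(labels):
    """Build L = D - A from a binary label matrix of shape (segments, events).

    A[i, j] counts the segments in which events i and j are both active
    (assumed edge weighting; the diagonal is zeroed).
    """
    labels = np.asarray(labels, dtype=float)
    adjacency = labels.T @ labels
    np.fill_diagonal(adjacency, 0.0)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def laplacian_penalty(pred, laplacian):
    """Sum over segments of y^T L y, which equals
    0.5 * sum_ij A_ij * (y_i - y_j)^2 for each segment."""
    pred = np.asarray(pred, dtype=float)
    return float(np.sum((pred @ laplacian) * pred))

# Toy labels: events 0 and 1 always co-occur, event 2 occurs alone.
labels = [[1, 1, 0],
          [1, 1, 0],
          [0, 0, 1]]
L = cooccurrence_laplacian(labels)
consistent = laplacian_penalty([[1, 1, 0]], L)    # both co-occurring events predicted
inconsistent = laplacian_penalty([[1, 0, 0]], L)  # only one of the pair predicted
```

With these toy labels the penalty is zero when the two co-occurring events are predicted together and positive when only one of them is predicted. In training, such a term would typically be added to the main detection loss with a weighting coefficient.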

