2020
DOI: 10.1109/taslp.2019.2953350
Adaptive Multi-Scale Detection of Acoustic Events

Abstract: The goal of acoustic (or sound) event detection (AED or SED) is to predict the temporal positions of target events in given audio segments. This task plays a significant role in safety monitoring, acoustic early warning, and other scenarios. However, the scarcity of data and the diversity of acoustic event sources make AED a difficult problem, especially for the prevalent data-driven methods. In this paper, we start by analyzing acoustic events according to their time-frequency domain properties, showing that di…

Cited by 14 publications (9 citation statements)
References 57 publications
“…This section details the architecture and evaluation metrics for rare SED using a CNN and visual object detectors. A simple CNN architecture was selected for this experiment; it did not involve any CNN variant such as guided learning [2], a recurrent neural network [5,6], or weakly-supervised learning [21,22]. Those architectures have produced the best results in previous challenges on pre-existing, annotated data.…”
Section: Methods
confidence: 99%
“…Among these, the recurrent neural network (RNN)-based SED methods [3,4] adopt a large temporal context and perform relatively better than a basic CNN structure. Hybrid structures containing both CNN and RNN layers, known as convolutional recurrent neural networks (CRNNs) [5,6], are being developed for SED; they integrate both the spatial and the temporal properties of the audio signal. A CRNN for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space has been proposed by Adavanne et al. [7].…”
Section: A Literature Review
confidence: 99%
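The CRNN structure described in the excerpt above — convolutional layers for spatial (spectral) features followed by a recurrent layer for temporal context — can be sketched as follows. This is a minimal illustrative sketch in PyTorch; all layer sizes, the number of mel bands, and the class count are assumptions for demonstration, not the configuration of any cited paper.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch for frame-wise sound event detection.

    Input:  (batch, time, n_mels) mel-spectrogram frames.
    Output: (batch, time, n_classes) per-frame event probabilities.
    Dimensions are illustrative assumptions.
    """

    def __init__(self, n_mels=40, n_classes=6, hidden=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            # Pool only along frequency so the time resolution is preserved
            # for frame-level predictions.
            nn.MaxPool2d(kernel_size=(1, 4)),
        )
        self.gru = nn.GRU(16 * (n_mels // 4), hidden,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        x = x.unsqueeze(1)                  # (B, 1, T, F)
        x = self.conv(x)                    # (B, 16, T, F // 4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.gru(x)                  # (B, T, 2 * hidden)
        # Sigmoid (not softmax): overlapping events make SED multi-label.
        return torch.sigmoid(self.fc(x))
```

The frequency-only pooling and the sigmoid output are common SED design choices: the former keeps one prediction per input frame, and the latter allows several events to be active simultaneously.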
“…Ding and He [95] proposed an adaptive multi-scale detection method that combines the idea of an hourglass network with a bidirectional GRU (BGRU). It is a CRNN, but a considerably more sophisticated one.…”
Section: Figure 11: Flowchart of a CRNN
confidence: 99%
“…The resulting values were then summed and sent to the output layer. In their study, Ding and He [95] proposed using a 4-layer hourglass network with a 3-layer bidirectional GRU at each scale. With this architecture, Ding and He [95] achieved a single-second F1-score of 48.7% with an ER of 0.7821 on the TUT-SED 2016 evaluation dataset, and a single-second F1-score of 43.6% with an ER of 0.7723 on the TUT-SED 2017 evaluation dataset.…”
Section: Figure 11: Flowchart of a CRNN
confidence: 99%
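The hourglass-with-BGRU idea in the excerpt above — processing the signal at several time resolutions, each with its own recurrent layer, then fusing the scales back to full resolution — can be illustrated with a deliberately simplified two-scale sketch. This is not the architecture of Ding and He [95] (which uses a 4-layer hourglass with 3-layer BGRUs); every dimension and the fusion scheme here are assumptions chosen to keep the sketch short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleBlock(nn.Module):
    """One time scale: a bidirectional GRU projected back to the feature dim."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, dim)

    def forward(self, x):               # x: (B, T, dim)
        h, _ = self.gru(x)
        return self.proj(h)

class MultiScaleSED(nn.Module):
    """Hypothetical two-scale hourglass-style detector (illustrative only)."""

    def __init__(self, n_mels=40, dim=64, hidden=32, n_classes=6):
        super().__init__()
        self.inp = nn.Linear(n_mels, dim)
        self.fine = ScaleBlock(dim, hidden)    # full time resolution
        self.coarse = ScaleBlock(dim, hidden)  # half time resolution
        self.out = nn.Linear(dim, n_classes)

    def forward(self, x):                      # x: (B, T, n_mels), T even
        x = torch.relu(self.inp(x))
        fine = self.fine(x)
        # Downsample time by 2, run the coarse-scale BGRU, upsample back,
        # and fuse the scales with an additive skip connection.
        down = F.avg_pool1d(x.transpose(1, 2), 2).transpose(1, 2)
        up = F.interpolate(self.coarse(down).transpose(1, 2),
                           scale_factor=2).transpose(1, 2)
        return torch.sigmoid(self.out(fine + up))
```

The point of the multi-scale design is that short, impulsive events are resolved at the fine scale while long, stationary events benefit from the larger effective context of the coarse scale.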
“…RNNs have been successfully applied to various tasks, such as data-driven modelling [39], image captioning [40], sentiment analysis [41], and speech recognition [42]. For example, Ergen and Kozat [43] studied online training of the LSTM architecture in a distributed network of nodes for regression and introduced online distributed training algorithms for variable-length data sequences. Zhao et al. [44] proposed a new approach, the CAM-RNN, to extract the most correlated visual and text features for video captioning; it is composed of three parts: a visual attention module, a text attention module, and a balancing gate.…”
Section: Model Building
confidence: 99%