2017 International Conference on Orange Technologies (ICOT) 2017
DOI: 10.1109/icot.2017.8336092

A survey of deep learning for polyphonic sound event detection

Cited by 22 publications
(16 citation statements)
References 5 publications
“…The long short-term memory (LSTM) [24] and gated recurrent units (GRU) [25] are two improved models of RNNs. Although RNNs are powerful, it is difficult to train a long-range sequence of data due to vanishing or exploding gradient problem [26]. To solve this issue, LSTM and GRU use gate units to decide what information to keep or remove from the previous state.…”
Section: Category 3: Recurrent Neural Network (mentioning)
confidence: 99%
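
The gating idea in the statement above can be illustrated with a minimal sketch of a single GRU step on scalar values. This is not code from the cited paper: the function name `gru_step`, the weight names in the dictionary `p`, and the numbers in the usage example are all hypothetical. The sketch also assumes the convention in which an update gate near 0 keeps the previous state and a gate near 1 overwrites it; some references write the gate the other way round.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One scalar GRU step.

    x      : current input (float)
    h_prev : previous hidden state (float)
    p      : dict of hypothetical scalar weights and biases
    """
    # Update gate: how much of the state is replaced by the new candidate.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state computed from the input and the (reset) previous state.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend: z near 0 keeps h_prev, z near 1 takes the candidate.
    return (1.0 - z) * h_prev + z * h_cand


# Illustrative weights only: a strongly negative update-gate bias
# closes the gate, so the previous state passes through almost unchanged.
keep = dict(wz=0.0, uz=0.0, bz=-50.0, wr=0.0, ur=0.0, br=0.0,
            wh=1.0, uh=1.0, bh=0.0)
overwrite = dict(keep, bz=50.0)

h_kept = gru_step(0.3, 0.8, keep)
h_new = gru_step(0.3, 0.8, overwrite)
```

With the gate closed, `h_kept` stays at the previous state; with the gate open, `h_new` equals the candidate state. Because the kept state is carried forward by a near-identity blend rather than by repeated multiplication through a nonlinearity, gated units are less prone to the vanishing gradient problem the statement mentions.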
“…As a complementary read to this article, Barchiesi et al published an in-depth overview of ASC methods using "traditional" feature extraction and classification techniques prior to the general transition to deep learning based methods in [3]. Other related survey articles focus on deep learning methods for AED [4,5] or summarize algorithms submitted for various machine listening tasks including ASC for a particular year of the DCASE challenge such as [6]. Methodologies and common datasets for evaluating ASC algorithms are not further addressed in this article.…”
Section: Introduction (mentioning)
confidence: 99%
“…Naturally, a polyphonic SED system is more appropriate in a real-life application because a real-life environment is more likely to contain multiple sound sources [16][17][18][19]. But this would also indicate that a polyphonic SED system is much more challenging because the different sound event can coincide [15][16][17], [20], [21] and features extracted from the mixture may not match any of the features extracted from sounds in isolation [18], [19]. Besides, it is not known a priori how many events can be present in a recording.…”
Section: Introduction (mentioning)
confidence: 99%
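
One common way to handle an unknown number of overlapping events is to treat detection as a multi-label problem and threshold each class independently in every frame. The sketch below illustrates that idea only; it is not taken from the cited paper, and the function name `active_events`, the labels, the probabilities, and the 0.5 threshold are made-up values for illustration.

```python
def active_events(frame_probs, labels, threshold=0.5):
    """Return the labels judged active in each frame.

    frame_probs : list of frames, each a list of per-class probabilities
                  (assumed to come from independent sigmoid outputs)
    labels      : class names, in the same order as the probabilities
    threshold   : a class counts as active when its probability >= threshold
    """
    return [
        [label for label, prob in zip(labels, probs) if prob >= threshold]
        for probs in frame_probs
    ]


# Hypothetical per-frame outputs for three classes.
labels = ["speech", "car", "dog"]
frame_probs = [
    [0.9, 0.7, 0.1],   # two events overlap
    [0.2, 0.1, 0.3],   # no event is active
    [0.6, 0.4, 0.95],  # a different pair overlaps
]

detected = active_events(frame_probs, labels)
```

Because each class is decided on its own, a frame can contain zero, one, or several events, so the number of simultaneous events does not have to be fixed in advance.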
“…On the other hand, reviews by Dang et al. [20] and Xia et al [28] only covered a brief theoretical aspect of several deep learning models while Bui et al [29] cover Non-negative Matrix Factorization (NMF). Rex [30] provided software recommendations for SED.…”
Section: Introduction (mentioning)
confidence: 99%