A survey of deep learning for polyphonic sound event detection

Dang, An; Vu, Toan H.; Wang, Jia-Ching

doi:10.1109/icot.2017.8336092

Cited by 22 publications

(16 citation statements)

References 5 publications


“…The long short-term memory (LSTM) [24] and gated recurrent units (GRU) [25] are two improved models of RNNs. Although RNNs are powerful, it is difficult to train a long-range sequence of data due to vanishing or exploding gradient problem [26]. To solve this issue, LSTM and GRU use gate units to decide what information to keep or remove from the previous state.…”

Section: Category 3: Recurrent Neural Network (mentioning)

confidence: 99%

Recent advances in deep learning

Wang

Zhao

Pourpanah

2020

Int. J. Mach. Learn. & Cyber.

221


“…As a complementary read to this article, Barchiesi et al published an in-depth overview of ASC methods using "traditional" feature extraction and classification techniques prior to the general transition to deep learning based methods in [3]. Other related survey articles focus on deep learning methods for AED [4,5] or summarize algorithms submitted for various machine listening tasks including ASC for a particular year of the DCASE challenge such as [6]. Methodologies and common datasets for evaluating ASC algorithms are not further addressed in this article.…”

Section: Introduction (mentioning)

confidence: 99%

A Review of Deep Learning Based Methods for Acoustic Scene Classification

Abeßer

2020

Applied Sciences

129


The number of publications on acoustic scene classification (ASC) in environmental audio recordings has constantly increased over the last few years. This was mainly stimulated by the annual Detection and Classification of Acoustic Scenes and Events (DCASE) competition with its first edition in 2013. All competitions so far involved one or multiple ASC tasks. With a focus on deep learning based ASC algorithms, this article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation, and for data modeling, i.e., neural network architectures and learning paradigms. Finally, the paper discusses current algorithmic limitations and open challenges in order to preview possible future developments towards the real-life application of ASC systems.


“…Naturally, a polyphonic SED system is more appropriate in a real-life application because a real-life environment is more likely to contain multiple sound sources [16][17][18][19]. But this would also indicate that a polyphonic SED system is much more challenging because the different sound event can coincide [15][16][17], [20], [21] and features extracted from the mixture may not match any of the features extracted from sounds in isolation [18], [19]. Besides, it is not known a priori how many events can be present in a recording.…”

Section: Introduction (mentioning)

confidence: 99%

“…On the other hand, reviews by Dang et al. [20] and Xia et al. [28] only covered a brief theoretical aspect of several deep learning models while Bui et al. [29] cover Non-negative Matrix Factorization (NMF). Rex [30] provided software recommendations for SED.…”

Section: Introduction (mentioning)

confidence: 99%

A Comprehensive Review of Polyphonic Sound Event Detection

Chan

Chin

2020

IEEE Access


One of the most amazing functions of the human auditory system is the ability to detect all kinds of sound events in the environment. With the technologies and hardware advances, polyphonic Sound Event Detection (SED) can be developed to mimic the ability of the human auditory system. However, the development of a SED system is no trivial task, and several different factors often hinder accuracy. Although there are several overview papers available, most of them only provide a theoretical overview of algorithms used with little discussion. Thus, to the best of the authors' knowledge, there is no comprehensive review that covers this particular domain. Therefore, this paper aims to provide an in-depth discussion of different methodologies proposed by various authors that include the features used, detection algorithms, and their corresponding accuracy and limitations. Additional information on possible trends is also discussed that can be useful for future development works.
