ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682909
Polyphonic Sound Event Detection Using Convolutional Bidirectional LSTM and Synthetic Data-based Transfer Learning

Cited by 27 publications (29 citation statements)
References 17 publications
“…Based on this architecture, Jung et al. [90] achieved a single-frame segment F1-score of 49.9, much higher than the 27.5 produced by the architecture of Cakir et al. [14]. With the application of transfer learning, Jung et al. [90] achieved an even higher single-frame segment F1-score of 55.9. The system of Jung et al. [90] also has a lower single-frame ER of 0.56, compared with 0.98 for the architecture of Cakir et al. [14].…”
Section: Figure 11: Flowchart of a CRNN
confidence: 98%
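The transfer learning described in this statement can be sketched in two stages: pretrain on synthetic data, then reuse the weights and fine-tune on the real target set. The sketch below is illustrative, not the authors' code; the tiny stand-in model, learning rates, and random tensors standing in for synthetic and real features are all assumptions.

```python
# Hedged sketch of synthetic-data-based transfer learning: pretrain a
# frame-wise multi-label classifier on synthetic data, then fine-tune the
# same weights on real data. Model size and data are illustrative only.
import torch
import torch.nn as nn

def make_model():
    # Stand-in for the full CBRNN: a tiny frame-wise event classifier
    # (40 mel features in, 6 independent event activities out).
    return nn.Sequential(nn.Linear(40, 32), nn.ReLU(), nn.Linear(32, 6))

loss_fn = nn.BCEWithLogitsLoss()  # multi-label: events can co-occur

# Stage 1: pretrain on (synthetic) data -- random tensors stand in here.
model = make_model()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_syn = torch.randn(64, 40)
y_syn = torch.randint(0, 2, (64, 6)).float()
for _ in range(5):
    opt.zero_grad()
    loss_fn(model(x_syn), y_syn).backward()
    opt.step()

# Stage 2: transfer -- copy the pretrained weights and fine-tune on the
# (real) target data with a smaller learning rate.
finetuned = make_model()
finetuned.load_state_dict(model.state_dict())
opt_ft = torch.optim.Adam(finetuned.parameters(), lr=1e-4)
x_real = torch.randn(16, 40)
y_real = torch.randint(0, 2, (16, 6)).float()
opt_ft.zero_grad()
loss_fn(finetuned(x_real), y_real).backward()
opt_ft.step()
```

The smaller fine-tuning learning rate is a common choice so that the real data adjusts, rather than overwrites, what was learned from the synthetic pretraining.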
“…On the other hand, Jung et al. [90] proposed stacking a BLSTM on top of the CNN. The architecture and training scheme remain largely similar to those of [14].…”
Section: Figure 11: Flowchart of a CRNN
confidence: 99%
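The CNN-plus-BiLSTM stacking described in this statement can be sketched as below. This is a minimal illustration, not the authors' architecture: the layer sizes, mel-bin count, and class count are assumptions; only the overall pattern (CNN front end pooling along frequency, BiLSTM over frames, per-class sigmoid outputs) reflects the cited design.

```python
# Hedged sketch of a CBRNN (CNN + BiLSTM) for polyphonic sound event
# detection. Layer widths and class count are illustrative assumptions.
import torch
import torch.nn as nn

class CBRNN(nn.Module):
    def __init__(self, n_mels=40, n_classes=6, hidden=64):
        super().__init__()
        # CNN front end: learns local spectro-temporal patterns; pooling
        # only along frequency so the frame rate is preserved.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),                       # pool frequency only
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        # BiLSTM stacked on the CNN, modeling temporal context both ways.
        self.blstm = nn.LSTM(64 * (n_mels // 4), hidden,
                             batch_first=True, bidirectional=True)
        # Independent sigmoid per class: several events can be active in
        # the same frame, which is what makes the task polyphonic.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                     # x: (batch, frames, mels)
        z = self.cnn(x.unsqueeze(1))          # (batch, ch, frames, mels//4)
        z = z.permute(0, 2, 1, 3).flatten(2)  # (batch, frames, ch*mels//4)
        z, _ = self.blstm(z)
        return torch.sigmoid(self.head(z))    # frame-wise event activities

model = CBRNN()
probs = model(torch.randn(2, 100, 40))  # 2 clips, 100 frames, 40 mel bins
print(probs.shape)                      # torch.Size([2, 100, 6])
```

Pooling only along the frequency axis keeps one output per input frame, so the sigmoid activities can be thresholded directly into frame-level event decisions.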
“…Zhao et al. [22] constructed one-dimensional and two-dimensional CNN-LSTM networks to learn local and global emotion-related features from speech and log-Mel spectrograms. Jung et al. [23] proposed a new method to improve the performance of polyphonic sound event detection, which combines a convolutional bidirectional recurrent neural network (CBRNN) with transfer learning. Passricha et al. [24] proposed a CNN-BiLSTM hybrid structure to extract the spatiotemporal features of speech, which improves the performance of continuous speech recognition.…”
Section: Related Work
confidence: 99%