ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054628

Polyphonic Sound Event Detection Using Transposed Convolutional Recurrent Neural Network

Cited by 6 publications (4 citation statements)
References 10 publications
“…The RNN is a type of artificial neural network that is especially suitable for processing sequential information such as natural languages or time series data such as videos [ 56 , 57 ]. Applications of RNNs include handwriting recognition [ 58 ], speech recognition [ 59 ], gesture recognition [ 60 ], image captioning [ 61 ], natural language processing [ 62 ] and understanding [ 63 ], sound event prediction [ 64 ], tracking and monitoring [ 65 , 66 , 67 , 68 , 69 ], etc.…”
Section: Deep Learning Models and Methods
Citation type: mentioning
confidence: 99%
“…Ding et al [30] proposed a Convolutional Recurrent Neural Network (CRNN) able to identify short-term dependencies of audio patterns by applying a multiscale detection method. With a similar approach, Chatterjee et al [31] introduced a Transposed Convolutional recurrent Neural Network (TCRNN) that incorporates Mel Instantaneous Frequency spectrogram (mel-IFgram) features. However, recurrent layers in neural networks are subject to the vanishing gradient problem, leading to lack of generalization capabilities.…”
Section: Related Work A. Deep Learning Models
Citation type: mentioning
confidence: 99%
“…Many works [24], [25], [26] have proven that temporal dependency is significant in detecting sound events. In this research, an RNN is used to add the unique position relationship of sound events by learning the temporal context information.…”
Section: ) Recurrent Layers
Citation type: mentioning
confidence: 99%