ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054433

Sound Event Detection Via Dilated Convolutional Recurrent Neural Networks

Abstract: Convolutional recurrent neural networks (CRNNs) have achieved state-of-the-art performance for sound event detection (SED). In this paper, we propose to use a dilated CRNN, namely a CRNN with a dilated convolutional kernel, as the classifier for the task of SED. We investigate the effectiveness of dilation operations, which provide a CRNN with expanded receptive fields to capture long temporal context without increasing the number of CRNN parameters. Compared to the classifier of the baseline CRNN, the classi…
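
The abstract's central claim, that dilation widens the receptive field without adding parameters, can be checked with simple arithmetic. The sketch below is illustrative only: the kernel size, dilation rates, channel width, and the helper names `receptive_field` and `num_weights` are assumptions chosen for the example, not values or code from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 1-D convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def num_weights(kernel_size, channels, num_layers):
    """Weight count (biases ignored) when every layer maps channels -> channels.

    Dilation does not appear here: it spaces the taps apart, it does not add taps.
    """
    return num_layers * kernel_size * channels * channels


# Hypothetical three-layer stacks with kernel size 3.
plain = receptive_field(3, [1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4])  # assumed dilation rates

print(plain, dilated)          # 7 15
print(num_weights(3, 64, 3))   # same for both stacks
```

Under these assumptions the dilated stack sees roughly twice as many frames as the plain one, while the weight count is identical for both.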

Cited by 46 publications (31 citation statements)
References 20 publications

“…It would also be interesting to implement the system on hardware to test its performance under real-life scenarios. Finally, more sophisticated signal-processing deep learning models [26], [27], albeit being more computationally expensive for real-time applications, are also worth being explored as well.…”
Section: Results (mentioning)
confidence: 99%
“…The kernel dilation could be used in any combination (for example, dilation in time dimension or feature dimension only) or all combinations of its dimensions. Li et al provided a method to combine dilated convolution with RNN in audio classification task [ 36 ], which clearly focused on the exploration and learning of long-term patterns. Drossos et al proposed an improved Convolutional Recursive Neural Network (CRNN) structure [ 31 ] which used DWS and dilated convolution with dilation in the time dimension only, i.e., time-dilated convolution.…”
Section: Related Work (mentioning)
confidence: 99%
“…1D CNN was chosen as a shallow benchmark learner as it enables frame-level investigation, and its use had been explored for audio recognition and Natural Language Processing (NLP). 1D CNN has been used with raw waveform and usually combined with a Recurrent Neural Network (RNN) in audio applications [ 75 ]. The convolution layer’s kernel size in our benchmark 1D CNN is set to 3, and 24 filters were used with a ReLU activation.…”
Section: Performance Comparison (mentioning)
confidence: 99%
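
The benchmark layer described in the last statement (kernel size 3, 24 filters, ReLU) can be sketched in plain Python. This is a minimal illustration, not the citing authors' implementation: the single-channel input, "valid" padding, stride of 1, random weights, and the name `conv1d_relu` are all assumptions made for the example.

```python
import random


def conv1d_relu(x, weights, bias):
    """Valid, stride-1 1-D convolution followed by ReLU.

    x:       list of floats (single-channel signal, an assumption)
    weights: one list of k taps per filter
    bias:    one float per filter
    Returns a list of frames, each holding one activation per filter.
    """
    k = len(weights[0])
    out = []
    for t in range(len(x) - k + 1):
        frame = []
        for taps, b in zip(weights, bias):
            s = b + sum(w * x[t + i] for i, w in enumerate(taps))
            frame.append(max(s, 0.0))  # ReLU
        out.append(frame)
    return out


random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(100)]                  # hypothetical input
filters = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(24)]  # 24 filters, kernel size 3
biases = [0.0] * 24

features = conv1d_relu(signal, filters, biases)
print(len(features), len(features[0]))  # 98 24
```

With a 100-sample input and no padding, the output has 100 - 3 + 1 = 98 frames of 24 activations each; a framework layer with padding or stride would give a different length.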