2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9207532

Sound Event Detection with Depthwise Separable and Dilated Convolutions

Cited by 32 publications (60 citation statements); references 20 publications.

“…In a paper previously published on audio classification, CNN was replaced by depthwise separable (DWS) convolution [29, 30]. DWS was a decomposition form of standard convolution, which decomposed a standard convolution into one [depthwise] convolution and one 1 × 1 convolution (called pointwise convolution) [31].…”
Section: Related Work (mentioning, confidence: 99%)
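The decomposition described in this statement (a per-channel "depthwise" convolution followed by a 1 × 1 "pointwise" convolution that mixes channels) can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: all function names are hypothetical, it assumes a single (C, H, W) input stored as nested lists, valid padding and stride 1, and the optional `dilation` argument is included only to mirror the dilated convolutions named in the paper title.

```python
def depthwise_conv2d(x, kernels, dilation=1):
    """One k x k filter per input channel, with no cross-channel mixing.

    x: C channels, each an H x W list of lists; kernels: C kernels, each k x k.
    Valid padding, stride 1; `dilation` spaces the kernel taps apart.
    """
    out = []
    for channel, kern in zip(x, kernels):
        k = len(kern)
        span = dilation * (k - 1) + 1  # extent covered by one (dilated) kernel
        h, w = len(channel), len(channel[0])
        out.append([
            [sum(channel[i + a * dilation][j + b * dilation] * kern[a][b]
                 for a in range(k) for b in range(k))
             for j in range(w - span + 1)]
            for i in range(h - span + 1)
        ])
    return out


def pointwise_conv2d(x, weights):
    """1 x 1 convolution: each output channel is a weighted sum of input channels.

    weights: C_out rows, each holding C_in coefficients.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wc * x[c][i][j] for c, wc in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def depthwise_separable_conv2d(x, dw_kernels, pw_weights, dilation=1):
    """Depthwise step (spatial filtering) followed by pointwise step (channel mixing)."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_kernels, dilation), pw_weights)


def param_counts(k, c_in, c_out):
    """Weights (biases ignored) of a standard k x k conv vs. its DWS factorization."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable


if __name__ == "__main__":
    print(param_counts(3, 64, 128))
```

With k = 3, 64 input channels and 128 output channels, `param_counts` gives 73,728 weights for the standard layer versus 8,768 for the separable pair, roughly an 8.4× reduction, which is the source of the efficiency gains the citing papers describe.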
“…To have an idea of the effect of depthwise convolutions, it is possible to compare one of the early CNNs such as AlexNet [22] with YAMNet. The number of parameters contained in AlexNet amounts to 61 million in 8 layers, whereas YAMNet contains 3.7 million parameters [41] in 30 layers. Fewer learnable weights further reduce overfitting risks in such architectures.…”
Section: YAMNet: An Efficient CNN for Sound Event Detection (mentioning, confidence: 99%)
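The parameter gap quoted in this statement can be reproduced approximately with a simple weight count. The sketch below assumes that YAMNet follows the MobileNet v1 layout (a 3 × 3 stem convolution on a one-channel log-mel input, 13 depthwise separable blocks, and a dense layer over 521 AudioSet classes) and ignores batch-normalization parameters; the layer list and the names `SEPARABLE_BLOCKS` and `count_weights` are assumptions for illustration, not taken from the cited papers.

```python
# Assumed MobileNet v1 style layer plan: (in_channels, out_channels)
# for each of the 13 depthwise separable blocks.
SEPARABLE_BLOCKS = [
    (32, 64), (64, 128), (128, 128), (128, 256), (256, 256), (256, 512),
    (512, 512), (512, 512), (512, 512), (512, 512), (512, 512),
    (512, 1024), (1024, 1024),
]
K = 3                       # 3 x 3 kernels throughout (assumption)
STEM = K * K * 1 * 32       # standard conv: 1-channel log-mel input -> 32 channels
HEAD = 1024 * 521 + 521     # dense classifier over 521 classes (weights + biases)


def count_weights(separable: bool) -> int:
    """Count conv and dense weights, ignoring batch-norm parameters."""
    total = STEM + HEAD
    for c_in, c_out in SEPARABLE_BLOCKS:
        if separable:
            total += K * K * c_in + c_in * c_out  # depthwise + pointwise
        else:
            total += K * K * c_in * c_out         # one standard k x k conv
    return total


if __name__ == "__main__":
    print(f"depthwise separable: {count_weights(True):,}")
    print(f"standard convolutions: {count_weights(False):,}")
```

Under these assumptions the separable network has about 3.7 million weights, consistent with the figure quoted above, while replacing every separable block with a standard 3 × 3 convolution of the same width gives roughly 28.8 million, about a 7.7× increase.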
“…However, no previous studies have investigated knowledge transfer from sound recognition tasks to machine diagnosis. The present work is motivated by the idea that networks pre-trained on audios for sound event detection (SED) [40][41][42] may encapsulate the necessary knowledge to classify REB spectrograms. The main goal of SED is to identify instances of sound events in audio recordings [40,43].…”
Section: Introduction (mentioning, confidence: 99%)
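Because SED must report when each event occurs, not just which classes are present, a common post-processing step is to threshold frame-level class probabilities and merge runs of active frames into (onset, offset) pairs. The sketch below is hypothetical: the function name, the 0.5 threshold and the 20 ms hop are assumptions for illustration, and it handles one sound class at a time.

```python
from typing import List, Tuple


def frames_to_events(probs: List[float], threshold: float = 0.5,
                     hop_seconds: float = 0.02) -> List[Tuple[float, float]]:
    """Turn per-frame probabilities for ONE class into (onset, offset) in seconds.

    Frame i is assumed to cover [i * hop_seconds, (i + 1) * hop_seconds).
    """
    events = []
    onset = None  # index of the first frame of the current active run
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i                      # event starts
        elif not active and onset is not None:
            events.append((onset * hop_seconds, i * hop_seconds))
            onset = None                   # event ends
    if onset is not None:                  # event still active at the last frame
        events.append((onset * hop_seconds, len(probs) * hop_seconds))
    return events


if __name__ == "__main__":
    print(frames_to_events([0.1, 0.7, 0.9, 0.2, 0.6], hop_seconds=1.0))
```

In practice, systems often add median filtering or minimum-duration rules before this step to suppress single-frame flickers; those refinements are omitted here.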
“…The aforementioned observation was proven by several researchers [3][4][5][6][7][8][9][10][11][12][13][14]. DL is beneficial in other fields, including target recognition [15], speech recognition [16,17], image recognition [18][19][20], image restoration [21][22][23], audio classification [24,25], object detection [26][27][28][29][30], scene recognition [31], etc., but it has been considered "bad news" in text-based CAPTCHAs, by penetrating their security and making them vulnerable.…”
Section: Introduction (mentioning, confidence: 98%)