2021
DOI: 10.1016/j.neucom.2021.06.031

Environment sound classification using an attention-based residual neural network

Cited by 31 publications (12 citation statements)
References 24 publications
“…To the best of our knowledge, global pooling layers have not been investigated in the architectures developed for ESC, and simple average pooling is used in most networks [20], [34], [35]. To gain insights into other pooling strategies, here we review some of the pooling methods proposed in the visual domain.…”
Section: B. Feature Pooling Methods
confidence: 99%
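As a concrete illustration of the pooling choices this statement contrasts, here is a minimal PyTorch sketch (not taken from any of the cited papers) comparing global average pooling, the default in most ESC networks, with global max pooling; the tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical conv feature maps: (batch, channels, mel_bands, frames)
feats = torch.randn(8, 128, 16, 32)

gap = nn.AdaptiveAvgPool2d(1)  # global average pooling (the common default)
gmp = nn.AdaptiveMaxPool2d(1)  # global max pooling (an alternative)

avg_vec = gap(feats).flatten(1)  # (8, 128) clip-level embedding
max_vec = gmp(feats).flatten(1)  # (8, 128)

# A fixed 50/50 average-max mix; the mixing weight could be made a
# learnable parameter inside a module to explore other strategies.
mixed = 0.5 * avg_vec + 0.5 * max_vec
```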
“…However, the proposed CNN-D model equipped with SSRP-MS/T achieves higher or comparable accuracies without using multiple input feature channels, attention mechanisms, recurrent networks, or an excessive number of learnable parameters.

| Model | Acc. (ESC-10) | Acc. (ESC-50) | Layers | Input | Params |
|---|---|---|---|---|---|
| PiczakCNN [17] | 80.5% | 64.9% | 4 | Log Mel-Delta | 31.5 M |
| SoundNet [56] | 92.1% | 74.2% | 8 | Raw Data | 3.2 M |
| EnvNet-v1 [27] | 87.2% | 70.8% | 7 | Raw Data | 48.0 M |
| EnvNet-v2 [57] | 91.4% | 84.9% | 13 | Raw Data | 101.2 M |
| DS-CNN [58] | 92.6% | 83.1% | 9 | Raw Data-Log Mel | 2.3 M |
| Residual Network [35] | 87.3% | - | 19 | Log Mel | 11.7 M |
| Attention-based Residual Network [35] | 92.0% | - | 19 | Log Mel | 11.9 M |
| Multi-Stream CNN [18] | 93.7% | 83.5% | 16 | Raw Data-Log Mel | - |
| Attention-based CNN-GRU [54] | 94.2% | 86.5% | 11 | Log Mel-Delta | - |
| Attention-based CNN-GRU [25] | 93.7% | 86.1% | | | |

Figure 6 shows the normalized confusion matrix generated by the proposed CNN-D+SSRP-T model for the ESC-50 dataset. It is seen that most classes achieve classification accuracy higher than 80%.…”
Section: B. Performance Comparison With State-of-the-Art
confidence: 99%
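For readers unfamiliar with the normalized confusion matrix referenced above, the following sketch shows the standard row-normalization computation that yields per-class accuracy on the diagonal; the labels and predictions here are synthetic placeholders, not the paper's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic stand-ins for ESC-50 ground truth and model predictions
y_true = np.random.randint(0, 50, size=400)
y_pred = np.random.randint(0, 50, size=400)

cm = confusion_matrix(y_true, y_pred, labels=np.arange(50))
# Normalize each row so the diagonal gives per-class recall (accuracy);
# the maximum() guard avoids dividing by zero for absent classes.
cm_norm = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
per_class_acc = np.diag(cm_norm)
print((per_class_acc > 0.80).sum(), "classes above 80% accuracy")
```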
“…Fan et al. [9] use the trunk branch of the residual structure to extract features, while mask branches imitate the attention mechanism by adding soft weights to the features extracted from the trunk branch, refining those features and obtaining better performance. Tripathi and Mishra [10] built their network from four two-layer residual blocks; after the fourth block, they applied attention modules to handle intra-class inconsistencies, which improved feature compactness and raised accuracy by 11.50% and 19.50% over the benchmark model on the two datasets, respectively.…”
Section: Introduction
confidence: 99%
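The trunk/mask design described in this statement can be sketched as follows. This is a minimal PyTorch interpretation in the spirit of the description attributed to Fan et al. [9]; the layer sizes, the 1x1-conv mask branch, and the (1 + mask) weighting are assumptions for illustration, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class SoftMaskResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Trunk branch: plain convolutional feature extraction.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # Mask branch: per-position soft attention weights in (0, 1).
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.trunk(x)
        m = self.mask(x)
        # Soft weighting plus a residual connection: (1 + m) * t keeps
        # the trunk signal even where the mask is near zero.
        return torch.relu(x + (1 + m) * t)

block = SoftMaskResidualBlock(64)
out = block(torch.randn(2, 64, 16, 32))  # e.g. (batch, ch, mel, time)
```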