2019 IEEE 13th International Conference on Semantic Computing (ICSC) 2019
DOI: 10.1109/icosc.2019.8665547

Acoustic Scene Classification Using Spatial Pyramid Pooling with Convolutional Neural Networks

Cited by 23 publications (9 citation statements)
References 8 publications
“…Several extensions to the common CNN architecture were proposed to improve the feature learning. Basbug and Sert adapted the spatial pyramid pooling strategy from computer vision, where feature maps are pooled and combined on different spatial resolutions [59]. In order to learn frequency-aware filters in the convolutional layers, Koutini et al proposed to encode the frequency position of each input feature bin within an additional channel dimension (frequency-aware CNNs) [44].…”
Section: Convolutional Neural Network | mentioning
confidence: 99%
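To make the pooling idea in the statement above concrete, here is a minimal sketch of spatial pyramid pooling on a single 2-D feature map. It is only an illustration, not the configuration used by Basbug and Sert: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the choice of max pooling are all assumptions. The point it shows is that feature maps of different sizes are reduced to a vector of the same fixed length.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map on an n x n grid for each pyramid level
    and concatenate the pooled values into one flat list.

    NOTE: the levels and the use of max pooling are illustrative
    assumptions, not values taken from the cited paper.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row bin: floor for the start, ceiling for the end, so the
            # bins cover the whole map and are never empty.
            r0 = (i * rows) // n
            r1 = ((i + 1) * rows + n - 1) // n
            for j in range(n):
                c0 = (j * cols) // n
                c1 = ((j + 1) * cols + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two maps of different sizes produce vectors of the same length:
# 1*1 + 2*2 + 4*4 = 21 values for the default levels.
small = [[float(r * 3 + c) for c in range(3)] for r in range(5)]
large = [[float(r * 9 + c) for c in range(9)] for r in range(7)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

In a CNN the same pooling would typically be applied to every channel of the last convolutional layer, with the per-channel vectors concatenated before the fully connected layers.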
“…In the process of object detection and recognition, SPP (Basbug and Sert 2019; Grauman and Darrell 2005; Lazebnik et al 2006) has been surprisingly effective. Despite its simplicity, SPP is competitive with methods that use more complex spatial paradigms.…”
Section: Spatial Pyramid Pooling (SPP) Network | mentioning
confidence: 99%
“…Previous studies show that VGGish embeddings achieve good results compared with hand-crafted audio features in audio classification tasks [24,27]. In order to extract audio embedding, we first extract the log Mel spectrograms from audio clips.…”
Section: Audio Embeddings | mentioning
confidence: 99%