ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
DOI: 10.1109/icassp49357.2023.10096211

Simple Pooling Front-Ends for Efficient Audio Classification

Abstract: Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios. Most existing approaches are designed to reduce the size of audio neural networks using methods such as model pruning. In this work, we show that instead of reducing model size using complex methods, eliminating the temporal redundancy in the input audio features (e.g., mel-spectrogram) could be an effective approach for efficient audio classification. To do so, we propose a family of simple pooli…
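The abstract's central idea, shrinking the time axis of the input features before they reach the classifier, can be illustrated with a minimal NumPy sketch. Because the abstract is truncated, the specific pooling operators the paper proposes are not shown here; the non-overlapping average and max pooling below, the hypothetical helpers `avg_pool_time` and `max_pool_time`, the pooling factor `k`, and the (64, 1001) input shape are all illustrative assumptions, not the authors' front-end.

```python
import numpy as np


def _pool_time(mel: np.ndarray, k: int, reduce) -> np.ndarray:
    """Pool a (n_mels, n_frames) mel-spectrogram over non-overlapping windows of k frames.

    Trailing frames that do not fill a complete window are dropped.
    """
    if k < 1:
        raise ValueError("pooling factor k must be >= 1")
    n_mels, n_frames = mel.shape
    n_out = n_frames // k
    # Keep only whole windows, then expose each window of k frames as its own axis.
    windows = mel[:, : n_out * k].reshape(n_mels, n_out, k)
    return reduce(windows, axis=2)


def avg_pool_time(mel: np.ndarray, k: int) -> np.ndarray:
    # Hypothetical helper: mean of each window of k frames.
    return _pool_time(mel, k, np.mean)


def max_pool_time(mel: np.ndarray, k: int) -> np.ndarray:
    # Hypothetical helper: maximum of each window of k frames.
    return _pool_time(mel, k, np.max)


if __name__ == "__main__":
    # Random stand-in for a 64-bin mel-spectrogram with 1001 frames (assumed shape).
    mel = np.random.rand(64, 1001)
    pooled = avg_pool_time(mel, 4)
    print(pooled.shape)  # (64, 250)
```

With `k = 4` the downstream network sees a quarter of the frames, so any layer whose cost grows linearly with the number of frames does roughly a quarter of the work.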

Cited by 5 publications (3 citation statements)
References 23 publications
“…Compression of mel-spectrogram temporal dimension can lead to a considerable speed up on training and inference (Liu et al 2023), which has significant promise in on-device scenarios. In this section, we evaluate the effectiveness of DiffRes in compressing temporal dimensions and classification performance.…”
Section: Adaptive Temporal Dimension Compression (mentioning; confidence: 99%)
“…There are plenty of studies on learning a suitable frequency resolution with a similar spirit (Stevens, Volkmann, and Newman 1937; Sainath et al 2013; Ravanelli and Bengio 2018b; Zeghidour et al 2021). Most previous studies focus on investigating the effect of different temporal resolutions (Kekre et al 2012; Huzaifah 2017; Ilyashenko et al 2019; Liu et al 2023). Huzaifah (2017) observe the optimal temporal resolution for audio classification is class dependent.…”
Section: Introduction (mentioning; confidence: 99%)
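The temporal resolution discussed in this excerpt is set by the hop between analysis frames, and pooling k consecutive frames into one coarsens it by the same factor k. A small worked sketch follows; the hypothetical function `frame_duration_ms` and the 16 kHz sample rate with a 160-sample hop are assumed values for illustration, not settings taken from any of the cited papers.

```python
def frame_duration_ms(hop_length: int, sample_rate: int, pool_factor: int = 1) -> float:
    """Time between consecutive output frames, in milliseconds.

    hop_length: STFT hop in samples; sample_rate: in Hz;
    pool_factor: number of input frames merged into one by temporal pooling.
    """
    return 1000.0 * hop_length * pool_factor / sample_rate


if __name__ == "__main__":
    # Assumed settings: 16 kHz audio with a 160-sample hop.
    print(frame_duration_ms(160, 16000))     # 10.0
    print(frame_duration_ms(160, 16000, 4))  # 40.0
```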