2021
DOI: 10.1016/j.apacoust.2021.108258
Acoustic scene classification based on Mel spectrogram decomposition and model merging

Cited by 40 publications (14 citation statements)
References 8 publications
“…The mel spectrogram [ 36 ] is a combination of the mel scale and a spectrogram. The acquisition process is simple; the input signal is preprocessed (framing, windowing), fast Fourier transform (FFT) is applied and the signal is passed through a mel filter bank.…”
Section: Methodologies (mentioning)
confidence: 99%
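The pipeline described in this statement (framing, windowing, FFT, mel filter bank) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used by the cited or citing papers: the function names, the Hann window, the HTK-style mel formula, and the default frame, hop, and band sizes are all assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other variants such as Slaney exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(sample_rate, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, centre, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=1024, hop=512, n_mels=40):
    # Hypothetical helper: returns a log-mel spectrogram of shape (n_mels, n_frames).
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    # Framing: overlapping frames of n_fft samples, advanced by hop samples.
    n_frames = 1 + (signal.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    # Windowing: taper each frame with a Hann window.
    frames = signal[idx] * np.hanning(n_fft)
    # FFT: power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel filter bank: pool FFT bins into mel bands, then convert to decibels.
    mel_power = power @ mel_filter_bank(sample_rate, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10).T
```

Calling `mel_spectrogram(x, 16000)` on a one-second clip `x` sampled at 16 kHz yields a 40-band representation with one column per frame; production code would more likely rely on an established audio library than on a hand-rolled filter bank.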
“…A Mel spectrogram translates acoustic signals into a visual representation that depicts sound intensity in Mel-frequency bands over time, and it is widely used in audio data processing. In essence, in other words, with the Mel spectrogram transforms applied, SED is shifted from a problem of distinguishing the inherent acoustic characteristics in sound to a challenge of distinguishing the visual features of the Mel spectrogram [ 16 , 17 , 18 ].…”
Section: Methods (mentioning)
confidence: 99%
“…The model obtained an accuracy of 87.88 %, and the classification results for COVID-19 and pneumonia were visually confirmed using Grad-CAM. Zhang et al [34] converted the voice data of the 2019 TAU Urban Acoustic Scenes into a Mel spectrogram, classified the voice using ResNet20, and applied Grad-CAM to interpret the results. CAM was applied and analyzed to the Mel spectrogram and MFCC, which converts 1D-based sound into a 2D image.…”
Section: Related Work (mentioning)
confidence: 99%