2018 IEEE International Symposium on Multimedia (ISM)
DOI: 10.1109/ism.2018.00038

Acoustic Scene Classification Using Reduced MobileNet Architecture

Cited by 11 publications (5 citation statements) · References 4 publications
“…Convolutional neural networks (CNNs) are widely used in image recognition, speech recognition, and other fields, because they can learn different scales of interrelated features from input data based on mechanisms similar to the human brain. Among all available CNN models, ResNet, EfficientNet, MobileNet, and DenseNet are very representative and have been widely used in sound recognition [25, 41–43]; ResNet18, ResNet34, DenseNet_BC_34, MobileNet_v2, and EfficientNet_b3 were selected for performance comparison.…”
Section: Deep Learning Methods
confidence: 99%
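
The indexed paper itself centers on a width-reduced MobileNet for acoustic scene classification. A minimal PyTorch/torchvision sketch of that general setup (the width multiplier of 0.5, the 10 output classes, and the single-channel spectrogram input are illustrative assumptions, not the authors' exact configuration):

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

# MobileNetV2 with a reduced width multiplier, standing in for the
# "reduced MobileNet" idea; 0.5 and the 10 scene classes are assumptions.
model = mobilenet_v2(width_mult=0.5, num_classes=10)

# Acoustic scene inputs are typically single-channel log-mel spectrograms,
# so swap the stem convolution from 3 input channels to 1.
stem = model.features[0][0]
model.features[0][0] = nn.Conv2d(
    1, stem.out_channels,
    kernel_size=stem.kernel_size, stride=stem.stride,
    padding=stem.padding, bias=False,
)

x = torch.randn(8, 1, 128, 431)  # batch of log-mel spectrogram "images"
print(model(x).shape)            # -> torch.Size([8, 10])
```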
“…On this basis, we used different deep learning models to learn these acoustic scene samples and compared the classification performance of different models, and we further analyzed the requirements of different models on the amount of training data and the number of training epochs. In terms of models, since deep learning models such as ResNet, DenseNet, MobileNet, and EfficientNet are very representative and have been widely used in the field of sound recognition [25, 40–42], we used ResNet18, ResNet34, DenseNet_BC_34, MobileNet_v2, and EfficientNet_b3 to classify the acoustic scenes. The specific contributions and innovations of this paper are summarized as follows: (1) by converting the classification of different acoustic scenes into an image recognition problem, this study proposes the DenseNet_BC_34 model to achieve accurate recognition of seven types of acoustic scene categories; (2) we constructed an acoustic scene dataset for analyzing the correlation between human and animal sounds, containing seven types of acoustic scene data with a total of 7000 samples; (3) we analyzed and compared the classification performance of the ResNet18, ResNet34, DenseNet_BC_34, MobileNet_v2, and EfficientNet_b3 models on the proposed acoustic scene categories under different amounts of training data and different numbers of training epochs, and explored the generalization performance of the different models on new data.…”
Section: Introduction
confidence: 99%
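
A sketch of how such a five-model comparison could be wired up with torchvision (a recent torchvision is assumed; DenseNet_BC_34 is the citing paper's own compact variant and is not shipped with torchvision, so densenet121 appears below purely as a placeholder):

```python
from torchvision import models

NUM_CLASSES = 7  # the seven acoustic scene categories in the citing study

# densenet121 is only a placeholder: torchvision has no 34-layer DenseNet-BC.
candidates = {
    "ResNet18":               models.resnet18(num_classes=NUM_CLASSES),
    "ResNet34":               models.resnet34(num_classes=NUM_CLASSES),
    "DenseNet (placeholder)": models.densenet121(num_classes=NUM_CLASSES),
    "MobileNet_v2":           models.mobilenet_v2(num_classes=NUM_CLASSES),
    "EfficientNet_b3":        models.efficientnet_b3(num_classes=NUM_CLASSES),
}

# Compare model sizes, one axis of the efficiency/accuracy trade-off studied.
for name, model in candidates.items():
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```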
“…Specifically, L-mHP is a three-channel feature composed of the Log-Mel spectrogram, the harmonic spectrogram, and the percussive spectrogram. The harmonic and percussive spectrograms are obtained by harmonic-percussive source separation (HPSS) [23] of the Log-Mel spectrogram. Each channel of the feature focuses on different spectral characteristics of the sound, and the channels are complementary to one another.…”
Section: L-mHP Feature Extraction Scheme
confidence: 99%
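
A minimal librosa sketch of one plausible reading of the L-mHP feature, applying HPSS to the STFT and mel-projecting each component (the HPSS/mel ordering, the file name, and all parameter values are assumptions):

```python
import librosa
import numpy as np

y, sr = librosa.load("scene.wav", sr=44100)  # hypothetical input file

# Complex STFT, then harmonic-percussive source separation (HPSS).
S = librosa.stft(y, n_fft=2048, hop_length=1024)
H, P = librosa.decompose.hpss(S)

def log_mel(spec):
    """Project a power spectrogram onto a mel filter bank, in dB."""
    mel = librosa.feature.melspectrogram(S=np.abs(spec) ** 2, sr=sr, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)

# Stack log-mel, harmonic, and percussive channels into the 3-channel feature.
L_mHP = np.stack([log_mel(S), log_mel(H), log_mel(P)], axis=0)
print(L_mHP.shape)  # (3, 128, n_frames)
```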
“…The first group of data augmentation algorithms generates new training instances from existing ones by applying various signal transformations. Basic audio signal transformations include time stretching, pitch shifting, dynamic range compression, and adding random noise [41–43]. Koutini et al. applied spectral rolling by randomly shifting spectrogram excerpts over time [44].…”
Section: Data Augmentation Techniques
confidence: 99%
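
A sketch of a few of these augmentations with librosa and NumPy (librosa's keyword-argument API from recent versions is assumed; the stretch rate, pitch steps, and noise level are illustrative, and dynamic range compression is omitted):

```python
import librosa
import numpy as np

y, sr = librosa.load("scene.wav", sr=22050)  # hypothetical input file
rng = np.random.default_rng(0)

# Waveform-level transformations.
y_stretch = librosa.effects.time_stretch(y, rate=1.1)         # time stretching
y_shift   = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # pitch shifting
y_noisy   = y + 0.005 * rng.standard_normal(len(y))           # additive noise

# Spectrogram-level rolling in the spirit of Koutini et al.: shift a
# mel-spectrogram excerpt along the time axis with wrap-around.
mel = librosa.feature.melspectrogram(y=y, sr=sr)
shift = int(rng.integers(-mel.shape[1] // 4, mel.shape[1] // 4))
mel_rolled = np.roll(mel, shift, axis=1)
```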