ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414835

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Abstract: To improve device robustness, a highly desirable feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination based on two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architecture…
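
The abstract does not spell out the score combination rule, so the following is only a minimal sketch of one plausible way to fuse a three-class and a ten-class classifier. It assumes each fine class belongs to exactly one broad class and that the fused score is the product of the two posteriors. The mapping FINE_TO_BROAD, the product rule, and the function names are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence

# Hypothetical mapping: index = fine class (0..9), value = its broad class (0..2).
# The real grouping used in the paper is not given in the abstract.
FINE_TO_BROAD: List[int] = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]


def combine_scores(
    broad_probs: Sequence[float],
    fine_probs: Sequence[float],
    fine_to_broad: Sequence[int] = FINE_TO_BROAD,
) -> List[float]:
    """Weight each fine-class score by the score of its parent broad class.

    Assumed rule (not from the source): combined[i] = fine[i] * broad[parent(i)],
    renormalised so the combined scores sum to 1.
    """
    if len(fine_probs) != len(fine_to_broad):
        raise ValueError("fine_probs and fine_to_broad must have the same length")
    combined = [p * broad_probs[fine_to_broad[i]] for i, p in enumerate(fine_probs)]
    total = sum(combined)
    if total <= 0:
        raise ValueError("combined scores must have a positive sum")
    return [c / total for c in combined]


def predict(broad_probs: Sequence[float], fine_probs: Sequence[float]) -> int:
    """Return the index of the fine class with the highest combined score."""
    scores = combine_scores(broad_probs, fine_probs)
    return max(range(len(scores)), key=scores.__getitem__)


# Example: the fine classifier alone slightly prefers class 0, but the broad
# classifier is confident about broad class 1, so the fused decision moves
# to a fine class inside that broad class.
broad = [0.1, 0.8, 0.1]
fine = [0.30, 0.05, 0.05, 0.25, 0.05, 0.05, 0.05, 0.10, 0.05, 0.05]
fused_prediction = predict(broad, fine)
```

In this toy example the coarse classifier acts as a prior over groups of fine classes, which is one way a two-stage design could correct fine-grained errors made on unfamiliar devices.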

Cited by 33 publications (7 citation statements)
References: 23 publications
“…Our baseline model was a smaller version of the ResNet model from [24]. The key elements of the Resnet structure are the Residual Blocks.…”
Section: Resnet Baseline Model | Citation type: mentioning
confidence: 99%
“…For the DCASE 2020 task 1 dataset, we extend the baseline network from the DCASE 2020 [8] to work with binaural audio as our baseline for comparison. Meanwhile, in the DCASE 2021 dataset, as our baseline we selected the much more complex Residual network solution [24] that had a high performance on the dataset. To make the trade-off clear we limited the architecture changes, and our solutions are mainly achieved by replacing 2D-convolution operations in the baseline networks with our proposed decomposition.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…The primary goal has been to improve generalization on the underrepresented devices. Supervised machine learning algorithms have been proposed to account for the data imbalance problem and are often combined with data augmentation, regularization and fine tuning approaches [5][6][7]. As the dataset contains recordings captured…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…Focusing on the frequency normalization, authors in [20] proposed a novel Residual Normalization method and a residual-based network architecture, which showed effective to improve the ASC performance and achieved the top-1 on DCASE 2021 Task 1A blind Test set and the top-4 on DCASE 2021 Task 1A Development set. However, to achieve the best performance, some papers from the second approach have still applied ensemble methods of multiple models [21], [22], [23], [24], which increases the model complexity.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…To deal with the issue of large footprint models as using complex network architectures, ensemble of multiple models, or ensemble of multiple spectrogram inputs, pruning [21], [25], [22], [26] and quantization [25], [23] techniques have been widely applied. While quantization techniques feasibly help the model reduce to 1/4 of the original size (i.e, 32 bit with floating point format presenting for 1 trainable parameter is quantized to 8 bit with integer format [27]), pruning techniques prove that models can be reduced to 1/10 of the original sizes [25], [26].…”
Section: Introduction | Citation type: mentioning
confidence: 99%
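
The size figures quoted in the last statement (8-bit quantization giving 1/4 of the original size, pruning giving 1/10) can be checked with simple arithmetic. The sketch below assumes that model size scales linearly with the number of stored parameters and their bit width, and it ignores any overhead for sparse indices or quantization scales. The parameter count is a made-up example value.

```python
def model_size_bytes(num_params: int, bits_per_param: int) -> int:
    """Storage needed for the parameters alone, in bytes."""
    return num_params * bits_per_param // 8


# Hypothetical model with one million trainable parameters.
NUM_PARAMS = 1_000_000

fp32_size = model_size_bytes(NUM_PARAMS, 32)  # 32-bit floating point
int8_size = model_size_bytes(NUM_PARAMS, 8)   # 8-bit integer after quantization

# Quantizing 32-bit floats to 8-bit integers keeps 8/32 = 1/4 of the size.
quantized_fraction = int8_size / fp32_size

# Pruning to 1/10 of the parameters (assumed to shrink storage proportionally).
pruned_size = model_size_bytes(NUM_PARAMS // 10, 32)
pruned_fraction = pruned_size / fp32_size
```

Under these assumptions the two techniques are independent, so applying both to the same model would leave roughly 1/40 of the original parameter storage.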