2020
DOI: 10.1109/access.2020.3021711
Knowledge Distillation in Acoustic Scene Classification

Abstract: Common acoustic properties shared across different classes degrade the performance of acoustic scene classification systems. This results in a phenomenon where a few confusing pairs of acoustic scenes dominate a significant proportion of all misclassified audio segments. In this paper, we propose adopting a knowledge distillation framework that trains deep neural networks using soft labels. Soft labels, extracted from another pre-trained deep neural network, are used to reflect the similarity between different c…
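The abstract describes training with soft labels produced by a pre-trained teacher network. The paper's exact objective is not shown here, so the following is a minimal sketch of the standard temperature-based distillation loss (in the style of Hinton et al.); the function name and the hyperparameters `T` and `alpha` are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hypothetical distillation objective: cross-entropy on hard scene
    labels plus KL divergence toward the teacher's softened outputs."""
    # The teacher's temperature-scaled distribution encodes inter-class
    # similarity (e.g., 'airport' vs. 'shopping mall' sharing properties).
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

The T² factor is a common convention in the distillation literature to keep gradient magnitudes comparable as the temperature changes.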

Cited by 30 publications (17 citation statements)
References 20 publications

“…Previous studies have already shown that deep convolutional neural networks trained to classify words, musical genres (Kell et al., 2018; Kumar et al., 2020) or natural sounds (Koumura et al., 2019), generate brain-like representations, in the sense that one can find a linear correspondence between the activation of the neural networks and the activations of the brain (Figure 1). This similarity can be quantified with a "brain score" (Jung et al., 2019), a correlation between the brain measurements and a linear projection of the model's activations, under the assumption that representations are defined as linearly exploitable information (Hung et al., 2005; Kamitani & Tong, 2005; Kriegeskorte & Kievit, 2013; King et al., 2018). We thus hypothesize that the nature of speech representations in the brain can be elucidated by comparing them to those of random, sound-generic and speech-specific neural networks (Figure 1).…”
Section: Introduction (mentioning, confidence: 99%)
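The quoted statement defines a "brain score" as a correlation between brain measurements and a linear projection of model activations. A minimal sketch of one plausible reading, assuming a ridge-regression projection fit on training stimuli and Pearson correlation on held-out stimuli; the function name, split, and regularization grid are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

def brain_score(activations, brain_data, alphas=(0.1, 1.0, 10.0)):
    """Correlate held-out brain measurements with a linear projection
    of network activations (one possible 'brain score' definition)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        activations, brain_data, test_size=0.2, random_state=0)
    proj = RidgeCV(alphas=alphas).fit(X_tr, y_tr)  # linear correspondence
    pred = proj.predict(X_te)
    # Mean Pearson r across voxels/sensors on unseen stimuli.
    rs = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1]
          for v in range(y_te.shape[1])]
    return float(np.nanmean(rs))
```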
“…ASC is a multiclass classification task that identifies a segment as being one of predefined scenes (i.e., classes). Acoustic scenes have an abstract (i.e., ambiguous) definition, and thus various characteristics may coincide across different scenes [20,21]. For example, 'airport' and 'shopping mall', which are both predefined scenes in the DCASE ASC challenge, may contain people talking and the acoustic properties of large indoor spaces.…”
Section: Related Work (mentioning, confidence: 99%)
“…Network pruning reduces the size of the network by deleting some unnecessary branches [33], [34], but it still requires training a large network and designing methods to evaluate the importance of branches. Knowledge distillation requires a large teacher network to guide smaller distilled network training [35], [36]. It also needs to calculate the gap between the two network feature maps, which requires more complex techniques.…”
Section: B. Model Compression (mentioning, confidence: 99%)
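The statement above mentions computing "the gap between the two network feature maps" when distilling from a large teacher into a smaller network. A sketch of one common hint-style scheme, assuming a 1x1 convolutional adapter to reconcile channel counts and an MSE gap; the class name and interpolation choice are illustrative, not from the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureGap(nn.Module):
    """Illustrative hint-style loss: project the student's feature map to
    the teacher's channel count, then measure the mean-squared gap."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv adapter so maps of different widths can be compared.
        self.adapt = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student, f_teacher):
        f_s = self.adapt(f_student)
        if f_s.shape[-2:] != f_teacher.shape[-2:]:
            # Match spatial size before comparing.
            f_s = F.interpolate(f_s, size=f_teacher.shape[-2:],
                                mode="bilinear", align_corners=False)
        return F.mse_loss(f_s, f_teacher.detach())  # teacher stays frozen
```

This extra alignment machinery is the "more complex technique" the quoted passage alludes to, compared with pruning, which only removes branches from an already-trained network.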