Interspeech 2020
DOI: 10.21437/interspeech.2020-2038

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification

Abstract: In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification, leveraging neural label embedding (NLE) and relational teacher-student learning (RTSL). Taking into account the structural relationships between acoustic scene classes, our proposed framework captures such relationships, which are intrinsically device-independent. In the training stage, transferable knowledge is condensed in NLE from the source domain. Next, in the adaptation stage, …

Cited by 10 publications (5 citation statements)
References 25 publications
“…DAT is performed based on these newly generated soft labels. Previous studies found that soft labels correlate with structural relationship among accents [20,21] so that we expect them to encode more detailed accent information. Although one-hot labels are replaced by soft labels, our theoretical equivalence of performing gradient reversal and minimizing JSD still holds, but this JSD is accessed between each utterance distribution.…”
Section: Relabeling With Soft Labels (mentioning)
confidence: 91%
“…The semantic embedding module (b) handles the contexts of not only pre-defined (seen) but also unseen scenes. In this work, to extract the semantic contexts of scenes, we utilize label texts of acoustic scenes, e.g., "city center" or "home," using a procedure similar to that in [21,22]. The label texts of scenes are input to pre-trained language models, then a vector of the semantic embedding e ∈ R^E is produced, where the number of dimensions E depends on the pretrained language models.…”
Section: Proposed Methods (mentioning)
confidence: 99%
“…Mezza et al [10] proposed to project source and target domain features to a lower-dimensional subspace spanned by the eigenvectors of the source domain feature covariance matrix. Hu et al [15] propose to use neural label embedding (NLE) to encode structural relationships between different acoustic scene classes from source domain data. In order to mitigate microphone mismatch, this knowledge is then transferred to target domain data using relational teacher-student learning.…”
Section: Related Work (mentioning)
confidence: 99%