2022
DOI: 10.1007/s10115-022-01736-y

Knowledge distillation for BERT unsupervised domain adaptation


Cited by 21 publications (14 citation statements)
References 12 publications
“…Given training data from a labeled source domain D_S^train and an unlabeled target domain D_T^train, our approach for DA in hate speech involves two steps: (i) extraction of source-specific terms and (ii) reducing the importance of these terms. Our setting is similar to Ben-David et al. (2020) and Ryu and Lee (2020).…”
Section: Proposed Approach
confidence: 99%
“…Our method, through penalization of these terms, automatically enforces the source domain classifier to focus on domain-invariant content. Compared to approaches that transform high-dimensional intermediate representations to reduce the domain discrepancy, such as domain adversarial learning (Ryu and Lee, 2020; Tzeng et al., 2017), our approach makes the adaptation more explainable, while improving the overall cross-domain performance compared to prior approaches.…”
Section: Introduction
confidence: 99%
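To make the two quoted steps concrete (extracting source-specific terms, then reducing their importance), here is a minimal sketch in Python. The smoothed log-frequency-ratio scoring, the [MASK]-based down-weighting, and every function name are illustrative assumptions, not the cited paper's actual procedure.

```python
# Illustrative sketch only: score tokens that are much more frequent in the
# source corpus than in the target corpus, then mask them so the classifier
# cannot rely on them. The scoring rule and masking scheme are assumptions.
from collections import Counter
import math

def source_specific_terms(source_docs, target_docs, top_k=50):
    """Score tokens by a smoothed log-ratio of source vs. target relative frequency."""
    src = Counter(tok for doc in source_docs for tok in doc.lower().split())
    tgt = Counter(tok for doc in target_docs for tok in doc.lower().split())
    src_total, tgt_total = sum(src.values()) or 1, sum(tgt.values()) or 1
    score = {
        tok: math.log((src[tok] + 1) / src_total) - math.log((tgt[tok] + 1) / tgt_total)
        for tok in src
    }
    return [tok for tok, _ in sorted(score.items(), key=lambda kv: -kv[1])[:top_k]]

def mask_source_terms(doc, terms, mask_token="[MASK]"):
    """Reduce the importance of source-specific terms by masking them in the input."""
    term_set = set(terms)
    return " ".join(mask_token if tok.lower() in term_set else tok for tok in doc.split())

# Toy usage: "referee" is frequent in the source corpus but absent from the target one.
source = ["the referee ruined the match", "awful referee decisions again"]
target = ["the hotel service was awful", "great hotel but terrible service"]
terms = source_specific_terms(source, target, top_k=3)
print(terms)
print(mask_source_terms("the referee was awful", terms))
```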
“…Studies in [22], [47] found that a catastrophic forgetting problem occurs when the ADDA framework is applied to the BERT model. [47] propose a look-ahead optimization strategy to accommodate the adversarial domain discrimination loss and the task-specific classification loss when optimizing BERT representations.…”
Section: B. Adversarial Learning
confidence: 99%
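For readers unfamiliar with the ADDA framework discussed in these statements, the sketch below shows one generic adversarial update that balances the domain discrimination loss against the task-specific classification loss. The small feed-forward modules stand in for BERT encoders, and the simple joint update is an assumption made for illustration; it is not the look-ahead optimization strategy of [47].

```python
# Generic ADDA-style update (sketch, not the method of [47]): the discriminator
# learns to tell source features from target features, and the target encoder
# learns to fool it while a source-domain task loss keeps it classifiable.
import torch
import torch.nn as nn

HIDDEN = 768
src_enc = nn.Sequential(nn.Linear(300, HIDDEN), nn.ReLU())  # stand-in for the frozen source encoder
tgt_enc = nn.Sequential(nn.Linear(300, HIDDEN), nn.ReLU())  # stand-in for the adapted target encoder
clf = nn.Linear(HIDDEN, 2)                                  # task classifier trained on the source domain
disc = nn.Sequential(nn.Linear(HIDDEN, 256), nn.ReLU(), nn.Linear(256, 1))  # domain discriminator

bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)
opt_tgt = torch.optim.Adam(tgt_enc.parameters(), lr=1e-5)

def adda_step(x_src, y_src, x_tgt):
    # 1) Update the discriminator to separate source features from target features.
    with torch.no_grad():
        f_src, f_tgt = src_enc(x_src), tgt_enc(x_tgt)
    opt_disc.zero_grad()
    d_loss = bce(disc(f_src), torch.ones(len(x_src), 1)) + \
             bce(disc(f_tgt), torch.zeros(len(x_tgt), 1))
    d_loss.backward()
    opt_disc.step()

    # 2) Update the target encoder to fool the discriminator, while a task loss on
    #    labeled source data keeps the representation classifiable; one simple way
    #    to push back against the catastrophic forgetting noted above.
    opt_tgt.zero_grad()
    adv_loss = bce(disc(tgt_enc(x_tgt)), torch.ones(len(x_tgt), 1))
    task_loss = ce(clf(tgt_enc(x_src)), y_src)
    (adv_loss + task_loss).backward()
    opt_tgt.step()
    return d_loss.item(), adv_loss.item(), task_loss.item()

print(adda_step(torch.randn(8, 300), torch.randint(0, 2, (8,)), torch.randn(8, 300)))
```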
“…[47] propose a look-ahead optimization strategy to accommodate the adversarial domain discrimination loss and the task-specific classification loss when optimizing BERT representations. [22] propose to use knowledge distillation [119] to distill knowledge from the source encoder to the target encoder, thereby regularizing ADDA for unsupervised domain adaptation of BERT. Adversarial Robustness and Consistency Training.…”
Section: B. Adversarial Learning
confidence: 99%
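Below is a minimal sketch of how a knowledge-distillation term can regularize this kind of adversarial adaptation, assuming the frozen source encoder plus classifier serves as the teacher and the target encoder plus the same classifier as the student. The temperature T, the weighting factor alpha, and the function names are assumptions, not the configuration reported in [22].

```python
# Sketch of a distillation regularizer added to the adversarial objective.
# The teacher logits come from the frozen source encoder + classifier on
# target-domain text; the student logits come from the target encoder.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (T * T)

def target_encoder_loss(adv_loss, teacher_logits, student_logits, alpha=0.5):
    """Total target-encoder objective: adversarial term + distillation regularizer."""
    kd = distillation_loss(student_logits, teacher_logits.detach())
    return adv_loss + alpha * kd

# Toy usage with random logits standing in for classifier outputs on target-domain text.
teacher = torch.randn(8, 2)                      # source encoder -> classifier (frozen teacher)
student = torch.randn(8, 2, requires_grad=True)  # target encoder -> classifier (student)
total = target_encoder_loss(torch.tensor(0.7), teacher, student)
total.backward()
print(total.item())
```

Detaching the teacher logits ensures that gradients flow only into the student side, which is the only component being adapted in this sketch.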