Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.309

Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax

Abstract: In Natural Language Processing (NLP), finding data augmentation techniques that can produce high-quality human-interpretable examples has always been challenging. Recently, leveraging kNN such that augmented examples are retrieved from large repositories of unlabelled sentences has made a step toward interpretable augmentation. Inspired by this paradigm, we introduce MiniMax-kNN, a sample efficient data augmentation strategy tailored for Knowledge Distillation (KD). We exploit a semi-supervised approach based …
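The retrieval ingredient described in the abstract, fetching augmented examples as nearest neighbours from a large unlabelled pool, can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `embed`, `knn_augment`, and the 128-dimensional random encoder are placeholders standing in for a real sentence encoder.

```python
import torch
import torch.nn.functional as F

# `embed` is a placeholder for any sentence encoder (e.g. a frozen transformer);
# the random 128-dim projection only keeps the sketch self-contained.
embed = lambda sentences: torch.randn(len(sentences), 128)

def knn_augment(train_sentences, unlabelled_pool, k=8):
    """For each training sentence, retrieve its k nearest unlabelled sentences
    (by cosine similarity) to serve as human-readable augmentation candidates."""
    q = F.normalize(embed(train_sentences), dim=-1)
    p = F.normalize(embed(unlabelled_pool), dim=-1)
    idx = (q @ p.T).topk(min(k, len(unlabelled_pool)), dim=-1).indices
    return [[unlabelled_pool[j] for j in row] for row in idx]

# Usage sketch:
print(knn_augment(["great movie"], ["loved this film", "terrible plot", "fine acting"], k=2))
```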


Citations: cited by 10 publications (8 citation statements)
References: 26 publications
“…Knowledge distillation (Hinton et al., 2015; Buciluǎ et al., 2006; Gao et al., 2018; Kamalloo et al., 2021; Rashid et al., 2020) has emerged as an important algorithm in language model compression (Jiao et al., 2020; Sanh et al., 2020; …).…”
Section: Knowledge Distillation (KD)
confidence: 99%
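For context, the vanilla soft-target distillation objective (Hinton et al., 2015) that these works build on can be sketched as follows; the temperature `T` and mixing weight `alpha` are generic hyperparameters, not values taken from any of the cited papers.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target distillation: temperature-scaled KL against the teacher's
    distribution plus the usual cross-entropy against the gold labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```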
“…2) Employing data-augmentation (Jiao et al., 2020; Fu et al., 2020; Rashid et al., 2021; Kamalloo et al., 2021) to improve KD by using more diverse training data. It is difficult to compare these methods since, typically, the teachers and students are initialized differently.…”
Section: Introduction
confidence: 99%
“…In recent years, Knowledge Distillation for BERT-like models (Devlin et al., 2019; Liu et al., 2019) has been extensively studied, leveraging intermediate layer matching (Ji et al., 2021; …), data augmentation (Fu et al., 2020; Jiao et al., 2020; Kamalloo et al., 2021), adversarial training (Zaharia et al., 2021; Rashid et al., 2020), and lately loss terms re-weighting (Clark et al., 2019; Zhou et al., 2021; Jafari et al., 2021). In this work, we explore the latter direction with a meta learning approach (Li et al., 2019; Fan et al., 2020).…”
Section: Related Work
confidence: 99%
“…Jiao et al. (2019) show a two-stage KD with intermediate layer mapping, attention distillation and embedding distillation for BERT-based models. Mate-KD (Rashid et al., 2021) and MiniMax-KNN KD (Kamalloo et al., 2021) tailor data augmentation for KD, in which augmented samples are generated or selected based on maximum divergence loss between the student and teacher networks. Rashid et al. (2020) propose a zero-shot KD technique in NLP in which the student does not need to access the teacher training data for its training.…”
Section: Introduction
confidence: 99%
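The divergence-based selection attributed above to Mate-KD and MiniMax-kNN KD can be illustrated with the rough sketch below: among a set of augmented candidates, keep those on which teacher and student disagree most. The `teacher`, `student`, and `select_by_divergence` names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins for a fine-tuned teacher and a smaller student head;
# real models would be full transformer classifiers over 3 example classes.
teacher = torch.nn.Linear(128, 3)
student = torch.nn.Linear(128, 3)

def select_by_divergence(candidate_features, m=2):
    """Keep the m augmented candidates on which teacher and student disagree most,
    measured by KL(teacher || student) over their output distributions."""
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(candidate_features), dim=-1)
        s_logp = F.log_softmax(student(candidate_features), dim=-1)
        kl = F.kl_div(s_logp, t_logp, log_target=True, reduction="none").sum(-1)
    return kl.topk(min(m, candidate_features.size(0))).indices

# Usage sketch: features of 8 retrieved neighbours, keep the top-2 by divergence.
print(select_by_divergence(torch.randn(8, 128), m=2))
```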