Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.86

Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

Abstract: Task-adaptive pre-training (TAPT) and self-training (ST) have emerged as the major semi-supervised approaches for improving natural language understanding (NLU) tasks with massive amounts of unlabeled data. However, it is unclear whether they learn similar representations or whether they can be effectively combined. In this paper, we show that TAPT and ST can be complementary with a simple protocol that follows the TAPT → Fine-tuning → Self-training (TFS) process. Experimental results show that the TFS protocol can effectively utili…
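As a rough illustration of the TFS recipe described in the abstract, the sketch below shows one possible TAPT → Fine-tuning → Self-training loop built with Hugging Face Transformers. It is not the paper's code: the roberta-base backbone, the toy texts, the 0.9 confidence threshold, and all hyper-parameters are assumptions made only for this example.

import torch
from transformers import (AutoModelForMaskedLM, AutoModelForSequenceClassification,
                          AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

MODEL = "roberta-base"  # assumed backbone, not necessarily the paper's choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Toy data for illustration only.
unlabeled_texts = ["an unlabeled in-domain sentence ...", "another unlabeled task sentence ..."]
labeled_texts, labeled_ys = ["a labeled task example ..."], [1]

class TextDataset(torch.utils.data.Dataset):
    """Tokenized texts, optionally paired with classification labels."""
    def __init__(self, texts, labels=None):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.enc["input_ids"])
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        if self.labels is not None:
            item["labels"] = torch.tensor(self.labels[i])
        return item

def train_args(out_dir):
    return TrainingArguments(out_dir, num_train_epochs=1, per_device_train_batch_size=2)

# Stage 1 -- TAPT: continue masked-LM pre-training on unlabeled task text.
mlm = AutoModelForMaskedLM.from_pretrained(MODEL)
Trainer(model=mlm, args=train_args("tapt_out"),
        train_dataset=TextDataset(unlabeled_texts),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)).train()
mlm.save_pretrained("tapt_out")

# Stage 2 -- Fine-tuning: train a classifier initialized from the TAPT checkpoint.
clf = AutoModelForSequenceClassification.from_pretrained("tapt_out", num_labels=2)
Trainer(model=clf, args=train_args("ft_out"),
        train_dataset=TextDataset(labeled_texts, labeled_ys)).train()

# Stage 3 -- Self-training: pseudo-label the unlabeled pool, keep confident
# predictions, and continue training on labeled + pseudo-labeled examples.
clf.eval()
with torch.no_grad():
    enc = tokenizer(unlabeled_texts, truncation=True, padding=True, return_tensors="pt")
    probs = clf(**{k: v.to(clf.device) for k, v in enc.items()}).logits.softmax(-1)
conf, pseudo = probs.max(-1)
keep = (conf > 0.9).tolist()  # confidence threshold is an assumption
st_texts = [t for t, k in zip(unlabeled_texts, keep) if k]
st_ys = [p for p, k in zip(pseudo.tolist(), keep) if k]
if st_texts:
    Trainer(model=clf, args=train_args("st_out"),
            train_dataset=TextDataset(labeled_texts + st_texts, labeled_ys + st_ys)).train()

In this sketch the TAPT stage and the self-training stage consume the same unlabeled pool in different ways (a masked-LM signal versus pseudo-labels), which gives one intuition for why the two techniques can be complementary.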

Cited by 3 publications (4 citation statements)
References 10 publications
“…The pseudo-labeled data are either used for self-training (Schick and Schütze, 2021) or consistency training (Xie et al., 2020). All these approaches can also be combined to enable more efficient use of unlabeled data (Li et al., 2021b; Chen et al., 2021; Zhao and Yao, 2022). In our experiments, we found that simple self-training using the same data for DAPT can further improve the performance of AdaSent.…”
Section: Semi-supervised Text Classification (mentioning)
confidence: 99%
“…Besides DAPT, another major way to utilize the unlabeled data is self-training, which has been shown to be complementary to DAPT (Li et al., 2021b).…”
Section: Combining DAPT and Self-training (mentioning)
confidence: 99%
“…Then, individual sentence descriptions are generated separately for each group, and combined to form the final description for the input. In contrast to previous studies that primarily rely on self-training to improve CG in DTG (He et al., 2020; Heidari et al., 2021; Li et al., 2021; Mehta et al., 2022), as well as using data augmentation to improve CG in various tasks such as semantic parsing (Andreas, 2020; Qiu et al., 2022; Fang et al., 2023), our method solely relies on small training sets and does not require any additional human-annotated, automatically labeled, or unlabeled data. In the CG-centric testing scenario, we observe significant improvements across all metrics in the benchmark compared to the vanilla T5 model (Raffel et al., 2020; Kale and Rastogi, 2020b).…”
Section: Introduction (mentioning)
confidence: 98%
“…Unlike common teacher-student training for knowledge distillation (Hinton et al., 2015; Gou et al., 2021), here the teacher does not train the student on all the available instances (in our case, all the incoming customer inputs). Also, unlike teacher-student approaches to self-training (Mi et al., 2021; Li et al., 2021), the teacher is already reasonably effective (but expensive). In that sense, our work is closer to active learning (Settles, 2012; Monarch, 2021), but OCaTS trains the student on labels provided by a teacher LLM, not humans, and there is initially no large pool of unlabeled instances (customer inputs) to select from, as instances arrive online.…”
Section: Introduction (mentioning)
confidence: 99%