Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), 2021
DOI: 10.18653/v1/2021.repl4nlp-1.11
Revisiting Pretraining with Adapters

Abstract: Pretrained language models have served as the backbone for many state-of-the-art NLP results. These models are large and expensive to train. Recent work suggests that continued pretraining on task-specific data is worth the effort as pretraining leads to improved performance on downstream tasks. We explore alternatives to full-scale task-specific pretraining of language models through the use of adapter modules, a parameter-efficient approach to transfer learning. We find that adapter-based pretraining is able…
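The adapter modules referred to in the abstract are small bottleneck layers inserted into an otherwise frozen pretrained transformer (Houlsby et al., 2019). As a rough sketch of the idea rather than the paper's actual implementation, a minimal bottleneck adapter in plain PyTorch could look like the following; the hidden size, reduction factor, and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter: down-project, nonlinearity, up-project,
    then add a residual connection. Only these parameters are trained;
    the surrounding pretrained transformer stays frozen."""

    def __init__(self, hidden_size: int = 1024, reduction_factor: int = 8):
        super().__init__()
        bottleneck = hidden_size // reduction_factor  # e.g. 1024 // 8 = 128
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Illustrative usage on a dummy batch of hidden states.
adapter = BottleneckAdapter(hidden_size=1024, reduction_factor=8)
x = torch.randn(2, 16, 1024)   # (batch, sequence, hidden)
print(adapter(x).shape)        # torch.Size([2, 16, 1024])
```

Because only the down- and up-projections are updated, the number of trainable parameters per layer is a small fraction of the full model's.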

Cited by 7 publications (5 citation statements)
References 15 publications

Citation statements (ordered by relevance):
“…TAPT could alternatively be performed by freezing the base model and training the adapters with an MLM objective. Despite being a faster approach, this has been found to sometimes decrease performance (Kim et al., 2021). Adapters: Our multilingual system used the Pfeiffer bottleneck adapter configuration, with a reduction factor of 8, which for XLM-R LARGE corresponds to a bottleneck hidden size of 128.…”
Section: Multilingual System
Mentioning confidence: 99%
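The adapter-based TAPT setup described in this excerpt can be sketched with the adapter-transformers library: freeze the base XLM-R model, add a Pfeiffer bottleneck adapter with reduction factor 8, and continue masked-language-model training of the adapter alone. The class and method names below (PfeifferConfig, add_adapter, train_adapter) follow that library's documented interface but may differ across versions; this is a hedged illustration, not the cited systems' code.

```python
# Sketch of adapter-based TAPT with the adapter-transformers library.
# Assumptions: PfeifferConfig / add_adapter / train_adapter as exposed by
# adapter-transformers; the adapter name "tapt_adapter" is arbitrary.
from transformers import AutoModelForMaskedLM, AutoTokenizer
from transformers.adapters import PfeifferConfig

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-large")

# Pfeiffer bottleneck adapter; reduction factor 8 gives a 128-dim bottleneck
# for XLM-R large's 1024-dim hidden states.
adapter_config = PfeifferConfig(reduction_factor=8)
model.add_adapter("tapt_adapter", config=adapter_config)

# Freeze the base model and mark only the adapter parameters as trainable.
model.train_adapter("tapt_adapter")
model.set_active_adapters("tapt_adapter")

# From here, continue pretraining with a standard masked-language-modeling
# loop (e.g. Trainer + DataCollatorForLanguageModeling) on task-specific text.
```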
“…A study by He et al. (2021) demonstrated that adapter-based tuning exhibits enhanced stability and generalization capabilities by virtue of being less sensitive to learning rates than traditional fine-tuning methods. While incorporating task adaptation techniques, such as TAPT, has been shown to match or even improve performance over FFT in low-resource setups, Kim et al. (2021) noted an interesting caveat: the benefits of integrating TAPT with adapters tend to taper off as the amount of data increases.…”
Section: Related Work
Mentioning confidence: 99%
“…Thereby we aim to bring the models closer to the target domains of the task and induce increased task performance compared to the base models. We also apply adapter-based (Houlsby et al., 2019; Pfeiffer et al., 2020) intermediate pretraining to compare full TAPT against a more parameter-efficient approach (Kim et al., 2021).…”
Section: Domain Adaptation and Generalization
Mentioning confidence: 99%
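The parameter-efficiency motivation for comparing full TAPT against adapter-based pretraining can be made concrete by counting trainable parameters. The toy encoder and adapter below are placeholders with illustrative sizes, not models from the cited work.

```python
import torch.nn as nn

def count_trainable(module: nn.Module) -> int:
    """Number of parameters that would receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Toy stand-in for a pretrained encoder (a real comparison would use
# e.g. XLM-R large); layer count and sizes are illustrative only.
encoder = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(24)])
adapter = nn.Sequential(nn.Linear(1024, 128), nn.ReLU(), nn.Linear(128, 1024))

# Full TAPT: every encoder weight is trainable.
full_tapt = count_trainable(encoder)

# Adapter-based pretraining: freeze the encoder, train only the adapter.
for p in encoder.parameters():
    p.requires_grad = False
adapter_only = count_trainable(encoder) + count_trainable(adapter)

print(f"full TAPT trainable params:    {full_tapt:,}")
print(f"adapter-only trainable params: {adapter_only:,}")
```

The adapter-only count is roughly two orders of magnitude smaller in this toy setup, which is the trade-off the cited comparison between full TAPT and the more parameter-efficient approach is probing.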