Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.199

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Abstract: Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks with only a vector and a linear layer. Extensive experiments are conducted …
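Taken at face value, the abstract describes the adapter as a single task vector combined with a per-token weight produced by a linear layer. A hedged sketch of that shift (the symbols h_i, v, α_i, and L_α are our notation, not necessarily the paper's):

\[
  h_i' = h_i + \alpha_i \, v, \qquad \alpha_i = L_\alpha(h_i),
\]

where h_i is the hidden output of a transformer layer at token i, v is the task-specific shift vector, and L_α is the linear layer producing the token-dependent weight α_i.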

Cited by 14 publications (5 citation statements) · References 0 publications

Citation statements (ordered by relevance):
“…BitFit (Ben Zaken, Goldberg, and Ravfogel 2022) is a competitive method among them, where only the bias terms of the model are modified during fine-tuning and the other parameters are frozen. Based on this, AdapterBias (Fu et al. 2022) is proposed, which assigns different representation shifts to task-related tokens according to the importance of the tokens, so as to obtain better fine-tuning effects. Inspired by this advance, we investigate the value of the bias term for domain incremental learning, and introduce the concept of domain bias in our framework to achieve state-of-the-art performance.…”
Section: Bias Tuning
confidence: 99%
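The bias-only recipe quoted above (BitFit) amounts to freezing every parameter except the bias terms before fine-tuning. A minimal sketch, assuming a PyTorch / Hugging Face setup (the checkpoint name and optimizer settings are illustrative, not taken from the cited work):

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # illustrative checkpoint

# BitFit-style tuning: only bias terms receive gradients; all other weights stay frozen.
bias_params = []
for name, param in model.named_parameters():
    if name.endswith(".bias"):
        param.requires_grad = True
        bias_params.append(param)
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(bias_params, lr=1e-4)  # optimize only the bias parameters

In practice a small task-specific head is usually trained alongside the biases; the point of the recipe is that the number of updated parameters stays a tiny fraction of the full model.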
“…In our experiments, we did a short pre-training of 10,000 steps with a learning rate equal to 4e-4; after that, all α_j lower than 0.5 were pushed to 0 and the … (Devlin et al., 2018); [b] are from (Fu et al., 2022); [c] are from (Glover and Hokamp, 2019); [d] are from (Pilault et al., 2021); † are results obtained with the best checkpoint in our settings.…”
Section: Implementation Details
confidence: 99%
“…More recently, LoRA [12] learns low-rank matrices to approximate the parameter updates. AdapterBias [13] adds a token-dependent parameter shift to transfer from PLMs in a more parameter-efficient manner. Beyond its parameter efficiency, adapter tuning is also shown to be more robust due to its ability to preserve the pre-trained knowledge [14], and often exhibits robustness in out-of-distribution evaluation [5].…”
Section: Adapter Approach
confidence: 99%
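For comparison with the low-rank update mentioned in this excerpt, a LoRA-style wrapper around a single linear layer could look as follows; this is a sketch of the usual formulation W x + (α/r)·BAx with only A and B trainable, and the class and argument names are ours, not from any cited implementation:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False               # keep the pre-trained W frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # r x d_in
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # d_out x r, starts at zero
        self.scaling = lora_alpha / r

    def forward(self, x):
        # base(x) + scaling * (B A) x: only A and B are updated during fine-tuning
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

Initializing B at zero makes the wrapped layer behave exactly like the frozen layer at the start of fine-tuning, which is the standard LoRA design choice.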
“…AdapterBias [13] adds frame-dependent biases to the representation shifts by using a vector (v) and a linear layer (L_α). v represents the task-specific shift, and L_α produces the weights (α) for input frames.…”
Section: AdapterBias
confidence: 99%
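A minimal sketch of this mechanism, assuming a PyTorch implementation and taking the description above at face value (one vector v and one linear layer L_α producing a scalar weight per token/frame; module and dimension names are ours):

import torch
import torch.nn as nn

class AdapterBiasShift(nn.Module):
    """Token-dependent representation shift: hidden + alpha * v."""
    def __init__(self, d_model: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(d_model))  # task-specific shift vector v
        self.L_alpha = nn.Linear(d_model, 1)         # produces one weight alpha per token/frame

    def forward(self, hidden):                       # hidden: [batch, seq_len, d_model]
        alpha = self.L_alpha(hidden)                 # [batch, seq_len, 1]
        return hidden + alpha * self.v               # each position gets its own shift alpha_i * v

Only v and L_α would be trained, so each inserted module adds roughly 2·d_model parameters, which is what the quoted statements mean by a more parameter-efficient shift.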