2020
DOI: 10.48550/arxiv.2010.11918
Preprint

AdapterDrop: On the Efficiency of Adapters in Transformers

Cited by 10 publications (16 citation statements)
References 0 publications

“…More on Fine-tuning: There exist other parameter-efficient tuning methods which we did not evaluate in our work. Some of these include random subspace projection (exploiting intrinsic dimensionality (Aghajanyan et al., 2020)), prefix and prompt tuning (Lester et al., 2021), tuning only biases (Cai et al., 2020; Ben Zaken et al., 2021), and other architecture variants including Adapters (Pfeiffer et al., 2021; Rücklé et al., 2020). An interesting direction for future work is to see whether parameter-efficient tuning approaches specifically designed for the private setting can achieve higher utility.…”
Section: Related Work (mentioning)
confidence: 99%
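As a concrete illustration of the "tuning only biases" approach mentioned in this excerpt, here is a minimal PyTorch sketch. It is not the exact recipe of the cited papers; the model, learning rate, and optimizer are placeholders.

```python
# Minimal sketch of bias-only fine-tuning (BitFit-style); `model` stands for
# any pretrained Transformer and the settings below are illustrative.
import torch

def freeze_all_but_biases(model: torch.nn.Module) -> None:
    """Freeze every parameter whose name does not end in 'bias'."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")

# Usage: only bias terms receive gradients, so the optimizer state and the
# per-task checkpoint delta stay tiny compared to full fine-tuning.
# freeze_all_but_biases(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```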
“…Specifically, one is to train a subset of the model parameters, where the most common approach is to use a linear probe on top of pretrained features [12]. The other alternative is to insert new parameters in between the network's layers [15,14,6,7,37,38]. Nevertheless, two problems arise when adopting these methods for fine-tuning Vision Transformers.…”
Section: Efficient Fine-tuning in NLP (mentioning)
confidence: 99%
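The "new parameters in between the network's layers" family referred to above is typically a small bottleneck module inserted after a (frozen) Transformer sublayer. A minimal sketch follows; the hidden size and reduction factor are illustrative assumptions, not values from any of the cited papers.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. Sizes are illustrative assumptions."""

    def __init__(self, hidden_size: int = 768, reduction: int = 16):
        super().__init__()
        bottleneck = hidden_size // reduction
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()
        # Near-identity start: zero-initialize the up-projection so the adapter
        # initially passes the frozen layer's output through unchanged.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```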
“…• AdapterDrop [14]: AdapterDrop is an extension of Adapter-tuning methods in which Adapters are dropped from lower Transformer layers during training and inference. In our experiments, we dropped Adapters from all layers except for the last layer in ViT.…”
Section: Baselines (mentioning)
confidence: 99%
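The baseline described in this excerpt keeps an adapter only in the last layer and drops it everywhere else. Below is a hedged sketch of how such per-layer dropping could be wired up; the wrapper and helper names are hypothetical and do not come from the authors' code.

```python
import torch
import torch.nn as nn
from typing import Callable, List, Optional

class LayerWithOptionalAdapter(nn.Module):
    """Wraps one Transformer layer plus an optional adapter; a None adapter
    means the adapter has been dropped from this layer."""

    def __init__(self, layer: nn.Module, adapter: Optional[nn.Module]):
        super().__init__()
        self.layer = layer
        self.adapter = adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.layer(x)
        if self.adapter is not None:
            x = self.adapter(x)  # lower layers without adapters skip this step
        return x

def attach_adapters(layers: List[nn.Module],
                    make_adapter: Callable[[], nn.Module],
                    keep_last_only: bool = True) -> nn.ModuleList:
    """Setup matching the baseline above: keep the adapter only in the last
    layer; keep_last_only=False attaches adapters to every layer instead."""
    wrapped = []
    for i, layer in enumerate(layers):
        keep = (i == len(layers) - 1) or not keep_last_only
        wrapped.append(LayerWithOptionalAdapter(layer, make_adapter() if keep else None))
    return nn.ModuleList(wrapped)
```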
“…The rationale behind adapters: Why is the adapter able to achieve comparable accuracy with far fewer parameters than freezing the bottom Transformer layers, without revising the model structure? We explain this with two insights from our experiments and the related literature [54,63].…”
Section: Design 3.1 Pluggable Adapters (mentioning)
confidence: 99%
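As a rough illustration of the "far fewer parameters" point, here is a back-of-the-envelope count that compares a bottleneck adapter with one full Transformer layer. The sizes assume a BERT-base-like layer (hidden size 768, FFN size 3072) and the illustrative reduction factor 16 from the sketch above; LayerNorm parameters are ignored.

```python
# Approximate parameter counts; sizes are assumptions, not figures from the
# cited works.
hidden, ffn, reduction = 768, 3072, 16
bottleneck = hidden // reduction  # 48

adapter_params = 2 * hidden * bottleneck + bottleneck + hidden   # down + up projections (+ biases)
layer_params = (4 * hidden * hidden + 4 * hidden                 # attention projections
                + 2 * hidden * ffn + ffn + hidden)               # feed-forward block

print(f"adapter: ~{adapter_params / 1e3:.0f}K params per layer")   # ~75K
print(f"full layer: ~{layer_params / 1e6:.1f}M params per layer")  # ~7.1M
```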