2023
DOI: 10.48550/arxiv.2301.12132
Preprint

AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning

Cited by 3 publications (5 citation statements). References 0 publications.
“…However, as we will show with an analysis of the search spaces of discrete prompts, such a practice is actually suboptimal and has made the optimization unnecessarily difficult. Similar to the phenomenon observed in related discrete optimization problems such as neural architecture search (Ru et al., 2020; Zhou et al., 2023b), we find the influence exerted by different tokens on the LLM when prepended to the text queries as discrete prompts to be highly non-uniform, with a small number of tokens (e.g., 0.1-1% of all tokens) exerting a disproportionate amount of influence. Meanwhile, the models are insensitive to or even harmed by the vast majority of the other, 'non-influential' tokens, which nevertheless act as nuisance variables during the search and substantially increase the optimization difficulty and resources required.…”
Section: Introduction (supporting)
confidence: 83%
“…The first and second best results are in bold and underlined, respectively. All baseline results for BERT-base and RoBERTa-large are from [47] and [16], respectively. We report Spearman's correlation for STS-B and matched accuracy for MNLI on BERT-base.…”
Section: Results (mentioning)
confidence: 99%
“…Houlsby Adapter (Adapter-H) [13], Pfeiffer Adapter (Adapter-P) [15], Prefix-Tuning [14] and LoRA [16] are chosen as PEFT baselines. In addition, two unified PEFT methods, MAM [17] and AutoPEFT [47], which combine multiple PEFT methods, are also chosen as PEFT baselines. Lastly, two feature-based tuning methods, Y-Tuning [19] and LST [20], which aim to reduce training memory, serve as memory-efficient baselines.…”
Section: Methods (mentioning)
confidence: 99%
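
The excerpt above names LoRA among the PEFT baselines compared against AutoPEFT. As a point of reference only, here is a minimal sketch of a LoRA-style linear layer in PyTorch; the class name, rank r, and scaling alpha are illustrative assumptions rather than settings taken from any of the cited papers.

```python
# Minimal sketch (assumed, not the papers' code) of a LoRA-style linear layer:
# the pretrained weight is frozen and only a low-rank update B @ A is trained,
# adding just r * (d_in + d_out) trainable parameters per layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)            # stands in for the pretrained weight
        self.base.weight.requires_grad_(False)        # frozen during fine-tuning
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable down-projection
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))        # zero-init: starts as the base model
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the scaled low-rank update
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: only lora_A and lora_B receive gradients.
layer = LoRALinear(768, 768, r=8)
out = layer(torch.randn(4, 768))
```
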
“…Another line of PET methods selects existing parameters in a PLM (Ben Zaken et al., 2022; Guo et al., 2021) as the tunable module to optimize. To further enhance the performance of PET methods, some works propose automatic selection strategies (Hu et al., 2022c; Lawton et al., 2023; Zhou et al., 2023) for tunable parameters.…”
Section: Related Work (mentioning)
confidence: 99%