2023
DOI: 10.48550/arxiv.2301.12132
Preprint

AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning

Cited by 3 publications (5 citation statements). References 0 publications.
“…However, as we will show with an analysis of the search spaces of discrete prompts, such a practice is actually suboptimal and has made the optimization unnecessarily difficult. Similar to the phenomenon observed in related discrete optimization problems such as neural architecture search (Ru et al., 2020; Zhou et al., 2023b), we find the influence exerted by different tokens on the LLM when prepended to the text queries as discrete prompts to be highly non-uniform, with a small number of tokens (e.g., 0.1-1% of all tokens) exerting a disproportionate amount of influence. Meanwhile, the models are insensitive to or even harmed by the vast majority of the other, 'non-influential' tokens, which nevertheless act as nuisance variables during the search and substantially increase the optimization difficulty and resources required.…”
Section: Introduction (supporting)
confidence: 83%
“…The first and second best results are in bold and underlined, respectively. All baseline results for BERT-base and RoBERTa-large are from [47] and [16], respectively. We report Spearman's correlation for STS-B and matched accuracy for MNLI on BERT-base.…”
Section: Results (mentioning)
confidence: 99%
“…Houlsby Adapter (Adapter-H) [13], Pfeiffer Adapter (Adapter-P) [15], Prefix-Tuning [14] and LoRA [16] are chosen as PEFT baselines. In addition, two unified PEFT methods, MAM [17] and AutoPEFT [47], which combine multiple PEFT methods, are also chosen as PEFT baselines. Lastly, two feature-based tuning methods, Y-Tuning [19] and LST [20], which aim to reduce training memory, serve as memory-efficient baselines.…”
Section: Methods (mentioning)
confidence: 99%
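
The excerpt above names LoRA among the PEFT baselines compared against AutoPEFT. As a point of reference only, here is a minimal sketch of a LoRA-style linear layer in PyTorch; the class name, rank r, and scaling alpha are illustrative assumptions rather than settings taken from any of the cited papers.

```python
# Minimal sketch (assumed, not the papers' code) of a LoRA-style linear layer:
# the pretrained weight is frozen and only a low-rank update B @ A is trained,
# adding just r * (d_in + d_out) trainable parameters per layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)            # stands in for the pretrained weight
        self.base.weight.requires_grad_(False)        # frozen during fine-tuning
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable down-projection
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))        # zero-init: starts as the base model
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the scaled low-rank update
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: only lora_A and lora_B receive gradients.
layer = LoRALinear(768, 768, r=8)
out = layer(torch.randn(4, 768))
```
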
“…Another line of PET methods selects existing parameters in a PLM (Ben Zaken et al., 2022; Guo et al., 2021) as the tunable module to optimize. To further enhance the performance of PET methods, some works propose automatic selection strategies (Hu et al., 2022c; Lawton et al., 2023; Zhou et al., 2023) for tunable parameters.…”
Section: Related Work (mentioning)
confidence: 99%