2022
DOI: 10.48550/arxiv.2205.12410
Preprint

AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models

Abstract: Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters. This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation. Parameter-efficient techniques have been developed that tune small trainable components (e.g., adapters) injected in the large model while keeping most of the model weights frozen. The prevalent mechanism to increase adapter ca…
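
The abstract describes injecting small trainable components (adapters) into an otherwise frozen pre-trained model. As a rough, hypothetical sketch of that general idea (not the AdaMix implementation itself; the class name and dimensions below are illustrative assumptions), a standard bottleneck adapter in PyTorch could look like this:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small trainable bottleneck added on top of a frozen pre-trained layer (illustrative sketch)."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project to a small bottleneck
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back to model width
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's output as the base.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage sketch: freeze the backbone, train only the adapter parameters.
# for p in pretrained_model.parameters():   # 'pretrained_model' is a hypothetical name
#     p.requires_grad = False
# adapter = BottleneckAdapter()             # only these few parameters get updated
```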

Cited by 2 publications (2 citation statements)
References 15 publications
“…The adapters are taught how to pick up knowledge appropriate to a given task. PEFT of pre-trained language models has recently demonstrated remarkable results, effectively matching the performance of full fine-tuning while utilizing significantly fewer trainable parameters (Fu et al., 2023; Liu et al., 2022; Wang et al., 2022), thereby addressing storage and communication constraints. Such approaches include prefix-tuning (Li and Liang, 2021), prompt-tuning (Hu et al., 2021b), soft-prompting (Lester et al., 2021) and LoRA (Hu et al., 2021a).…”
Section: Introduction (mentioning)
confidence: 99%
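
Among the approaches listed in the statement above, LoRA adds a trainable low-rank update to frozen weight matrices. The following is a minimal sketch of that idea under my own assumptions (the class name, rank, and scaling are illustrative choices, not any particular library's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x (sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights frozen
        # Low-rank factors: A is small random, B starts at zero so training begins from W.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```

Only A and B are trained, so the number of tunable parameters per layer is r * (in_features + out_features) rather than in_features * out_features.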
“…Based on this, other improvements to the method have been proposed. For instance, inspired by previous work on mixture-of-experts (MoE), AdaMix [160] uses multiple experts in each adapter layer. Additional techniques such as random expert selection and consistency regularization are employed in order to reduce computational cost and stabilize training.…”
Section: Additive Methods (mentioning)
confidence: 99%
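
The quoted description of AdaMix (multiple experts per adapter layer, random expert selection, consistency regularization) can be illustrated with a simplified sketch. This is my own approximation under stated assumptions, not the authors' code; all names and dimensions are hypothetical:

```python
import random
import torch
import torch.nn as nn

class MixtureOfAdapters(nn.Module):
    """Several adapter 'experts' in one layer; one is chosen at random during training (sketch)."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 16, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, bottleneck_dim),
                nn.GELU(),
                nn.Linear(bottleneck_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Stochastic routing: pick one expert at random, so no router has to be trained.
            expert = random.choice(self.experts)
            return hidden_states + expert(hidden_states)
        # Illustrative inference-time behavior: average the experts' outputs.
        # (The actual method merges expert weights; this averaging is only a stand-in.)
        mixed = torch.stack([e(hidden_states) for e in self.experts]).mean(dim=0)
        return hidden_states + mixed
```

Consistency regularization, as the statement notes, would then penalize the divergence between predictions from two such stochastic forward passes, which is what the citing survey credits with stabilizing training.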