2021
DOI: 10.48550/arxiv.2110.07577
Preprint

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

Abstract: Conventional fine-tuning of pre-trained language models tunes all model parameters and stores a full model copy for each downstream task, which has become increasingly infeasible as the model size grows larger. Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with much fewer trainable parameters and perform especially well when the training data is limited. However, different PELT methods may perform rather differently on the same task, making it no…
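To make the contrast in the abstract concrete, here is a minimal PyTorch sketch (not the authors' code) of the difference between conventional fine-tuning, where every backbone parameter is trainable and a full model copy is stored per task, and parameter-efficient tuning, where the pre-trained backbone is frozen and only a small added module is updated. The toy encoder and the linear task head are hypothetical placeholders chosen for illustration.

```python
import torch.nn as nn

hidden_size, num_layers, num_labels = 768, 12, 2

# Toy stand-in for a pre-trained language model encoder; in practice this would
# be a checkpoint such as BERT-base loaded from disk (hypothetical placeholder).
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden_size, nhead=12, batch_first=True),
    num_layers=num_layers,
)

# Conventional fine-tuning: every backbone parameter is trainable, so a full
# model copy must be stored for each downstream task.
full_ft_params = sum(p.numel() for p in backbone.parameters())

# Parameter-efficient tuning: freeze the backbone and train only a small
# task-specific module (a linear head here, purely as a placeholder).
for p in backbone.parameters():
    p.requires_grad = False
task_head = nn.Linear(hidden_size, num_labels)
pelt_params = sum(p.numel() for p in task_head.parameters() if p.requires_grad)

print(f"full fine-tuning:    {full_ft_params:,} trainable parameters")
print(f"parameter-efficient: {pelt_params:,} trainable parameters")
```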

Cited by 6 publications (8 citation statements)
References 11 publications
“…We implement our framework in PyTorch and use Tesla V100 GPUs for experiments. AdaMix uses adapter dimension sizes of 16 and 48 with BERT-base and RoBERTa-large encoders, respectively, following the setup of existing works Hu et al. (2021); Mao et al. (2021) for a fair comparison. The number of adapters in AdaMix is set to 4 for all the tasks and encoders unless otherwise specified.…”
Section: Methods
Citation type: mentioning (confidence: 99%)
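The quoted setup specifies only the adapter bottleneck dimension (16 for BERT-base, 48 for RoBERTa-large) and the number of adapters (4). As a rough illustration, here is a generic bottleneck-adapter module in PyTorch using the BERT-base setting; it is a sketch under those assumptions, does not reproduce AdaMix's mixture-of-adapters routing, and all names are illustrative.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic down-project / nonlinearity / up-project adapter with a residual
    connection. Dimensions follow the quoted setup (hidden size 768, bottleneck
    16 for BERT-base); the routing over 4 adapter copies used by AdaMix is omitted."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
x = torch.randn(2, 128, 768)            # (batch, sequence, hidden)
print(adapter(x).shape)                 # torch.Size([2, 128, 768])
print(sum(p.numel() for p in adapter.parameters()), "trainable parameters")
```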
“…The best result on each task is in bold and "-" denotes the missing measure. † and denote that the reported results are taken from Mao et al. (2021); Zaken et al. (2021). The average performance is calculated based on F1 of QQP and MRPC.…”
Citation type: mentioning (confidence: 99%)
“…Although each of these three approaches has its own focus, the central idea is to keep the pre-trained parameters constant while training lightweight alternatives to achieve adaptation for downstream tasks. There have also been some recent attempts to grasp the internal connection of these strategies and build a unified parameter-efficient tuning framework [333, 334].…”
Section: Parameter-efficient Tuning
Citation type: mentioning (confidence: 99%)
“…Recently, prompt tuning has been proposed, which freezes big models and tunes only task-specific prompts for downstream tasks [328, 571, 333]. Building on prompt tuning, we can update and correct outdated knowledge during continual learning.…”
Section: Continual Learning
Citation type: mentioning (confidence: 99%)
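A minimal sketch of the prompt-tuning idea described above, assuming a frozen encoder and a short sequence of learnable soft-prompt vectors prepended to the token embeddings; the prompt length, dimensions, and module names are illustrative, not taken from the cited works.

```python
import torch
import torch.nn as nn

hidden_size, prompt_length, vocab_size = 768, 20, 30522

# Frozen components of a hypothetical pre-trained model.
embeddings = nn.Embedding(vocab_size, hidden_size)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden_size, nhead=12, batch_first=True),
    num_layers=12,
)
for module in (embeddings, encoder):
    for p in module.parameters():
        p.requires_grad = False

# The only trainable parameters: a task-specific sequence of soft prompts.
soft_prompt = nn.Parameter(torch.randn(prompt_length, hidden_size) * 0.02)

def forward(input_ids: torch.Tensor) -> torch.Tensor:
    """Prepend the learned prompt vectors to the token embeddings."""
    token_embeds = embeddings(input_ids)                        # (B, L, H)
    prompts = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    return encoder(torch.cat([prompts, token_embeds], dim=1))   # (B, P+L, H)

out = forward(torch.randint(0, vocab_size, (2, 16)))
print(out.shape)  # torch.Size([2, 36, 768])
```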
“…Moreover, as the ratio of model parameters to labeled data increases, the fine-tuning process becomes more prone to overfitting (Karimi Mahabadi et al., 2021). There are two categories of solutions: first, model compression (Jafari et al., 2021; Chen et al., 2021); second, parameter-efficient tuning (PET) (Houlsby et al., 2019a; Karimi Mahabadi et al., 2021; Mao et al., 2021).…”
Section: Introduction
Citation type: mentioning (confidence: 99%)