2021
DOI: 10.48550/arxiv.2110.07602
Preprint

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Abstract: Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pretrained models. We also find that existing methods of prompt tuning cannot handle hard sequence tagging tasks, indicating a lack of universality. We present a novel empirical finding that properly optimized prompt tuning can be universally effective acr…

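The abstract describes tuning only continuous prompts while the language model stays frozen, with P-Tuning v2's key change being that prompts are applied at every layer rather than only at the input. The sketch below illustrates that idea under simplifying assumptions: the backbone is assumed to expose its Transformer blocks as `.layers`, prompt length and initialization are illustrative, and the authors' released implementation may differ in detail (e.g., injecting prompts as per-layer key/value prefixes).

```python
import torch
import torch.nn as nn

class DeepPromptModel(nn.Module):
    """Sketch: trainable prompts at every layer of a frozen backbone.
    Only the prompts and a small task head receive gradients."""

    def __init__(self, backbone: nn.Module, num_layers: int, hidden: int,
                 prompt_len: int = 20, num_labels: int = 2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                          # freeze the pretrained model
        # one learned prompt matrix per Transformer layer
        self.prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)
             for _ in range(num_layers)]
        )
        self.head = nn.Linear(hidden, num_labels)            # small trainable task head

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, hidden) from the frozen embedding table
        h = token_embeds
        n = self.prompts[0].size(0)
        # assumes the backbone exposes its blocks as `.layers`, each mapping
        # (batch, seq, hidden) -> (batch, seq, hidden)
        for i, layer in enumerate(self.backbone.layers):
            p = self.prompts[i].unsqueeze(0).expand(h.size(0), -1, -1)
            if i == 0:
                h = torch.cat([p, h], dim=1)                 # prepend prompts at layer 0
            else:
                h = torch.cat([p, h[:, n:]], dim=1)          # refresh prompt slots per layer
            h = layer(h)
        return self.head(h[:, n])                            # first real token, [CLS]-style
```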
Cited by 64 publications (93 citation statements)
References 42 publications

“…We have showcased how to attach them to a single MSA layer in Section 4.1. Most existing prompt-related work simply places prompts only at the first MSA [60,24], or at every MSA layer [25,29]. However, we argue that it is crucial to explore where and how to attach both types of prompts under the continual visual learning setting.…”
Section: Prompt Attaching: Where and How? (mentioning)
confidence: 99%
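The excerpt above contrasts attaching prompts only at the first MSA block with attaching fresh prompts at every block. The following is a minimal sketch of the two strategies, assuming `blocks` is an ordered container of Transformer blocks operating on (batch, seq, hidden) tensors; the names are illustrative, not the cited papers' code.

```python
import torch
import torch.nn as nn

def forward_with_prompts(x: torch.Tensor, blocks: nn.ModuleList,
                         prompts: nn.ParameterList, deep: bool) -> torch.Tensor:
    """x: (batch, seq, hidden) token/patch embeddings.
    deep=False: prompts prepended once before the first block (shallow).
    deep=True:  each block replaces the prompt slots with its own learned prompts."""
    n = prompts[0].size(0)                                   # prompt length
    p0 = prompts[0].unsqueeze(0).expand(x.size(0), -1, -1)
    h = torch.cat([p0, x], dim=1)                            # attach at the first block
    for i, block in enumerate(blocks):
        if deep and i > 0:
            pi = prompts[i].unsqueeze(0).expand(h.size(0), -1, -1)
            h = torch.cat([pi, h[:, n:]], dim=1)             # overwrite prompt positions
        h = block(h)
    return h[:, n:]                                          # drop prompt positions at the end
```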
“…For a given task, a fixed number of continuous token embeddings is optimized when concatenated to the input embeddings of each training example (illustrated in Figure 1a). When trained on a single dataset (and when given access to a large-enough model), prompt tuning has been shown to yield performance competitive with fine-tuning (Liu et al., 2021). This is an inspiring finding, since the fraction of parameters trained during prompt tuning is tiny relative to full model size (∼ 0.001%).…”
Section: The Id-pt Architecture (mentioning)
confidence: 94%
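The statement above describes input-level prompt tuning: one learned prompt matrix per task, concatenated to each example's input embeddings while the model stays frozen. Below is a minimal sketch under assumed shapes; `frozen_lm` is a hypothetical module that accepts `inputs_embeds`, as common Transformer APIs do, and attention masks are omitted for brevity.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    def __init__(self, frozen_lm: nn.Module, hidden: int, prompt_len: int = 20):
        super().__init__()
        self.lm = frozen_lm
        for p in self.lm.parameters():
            p.requires_grad = False                  # only the soft prompt is trained
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)

    def forward(self, input_embeds: torch.Tensor):
        # input_embeds: (batch, seq_len, hidden) from the frozen embedding table
        prompt = self.soft_prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return self.lm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```

As a rough check on the quoted figure: 20 prompt tokens at hidden size 4096 is about 8×10⁴ trainable parameters against roughly 10¹⁰ for an 11B-parameter model, which is on the order of the ∼0.001% the excerpt cites.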
“…Currently, prompt tuning is one of the most parameter-efficient methods for large language models [45,14,53]. Liu et al. [54] introduce several tricks to improve prompt tuning, An et al. [55] tune prompts along with input embeddings for a boost in performance, and Chen et al. [56] improve prompt embeddings through continued pre-training. Given optimization difficulties when training prompt embeddings, Diao et al. [57] recently used black-box optimization to train prompt embeddings without requiring gradients.…”
Section: Related Work (mentioning)
confidence: 99%
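The last sentence of the excerpt points to gradient-free prompt optimization. The loop below is only a hedged stand-in (plain random search) for the black-box optimizers the cited work actually uses; `evaluate_loss` is a hypothetical function that scores a candidate prompt using forward passes only.

```python
import torch

def black_box_prompt_search(evaluate_loss, prompt_len: int = 20, hidden: int = 768,
                            iters: int = 200, sigma: float = 0.1) -> torch.Tensor:
    """Optimize a soft prompt without gradients: propose a perturbed prompt,
    keep it if the (forward-pass-only) loss improves."""
    best = torch.randn(prompt_len, hidden) * 0.02
    best_loss = evaluate_loss(best)                  # no backprop through the model
    for _ in range(iters):
        candidate = best + sigma * torch.randn_like(best)
        loss = evaluate_loss(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss        # keep the better prompt
    return best
```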