2021
DOI: 10.48550/arxiv.2112.08348
Preprint

Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

Abstract: Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of extracting a discrete (textual) interpretation of continuous prompts that is faithful to the problem they solve. In practice, we observe a "wayward" behavior between the task solved by continuous prompts and their nearest neighbor discrete projections: We can find continuous prompts that solve a task while being projecte…
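
The abstract's central object is the nearest-neighbor discrete projection of a tuned continuous prompt. The sketch below is a minimal illustration of that operation, not the authors' code; the RoBERTa backbone, the prompt length, and the cosine-similarity metric are assumptions. Each continuous prompt vector is mapped to the vocabulary token whose input embedding is closest.

```python
# Minimal sketch (not the authors' implementation): project a tuned continuous
# prompt back to its nearest-neighbor discrete tokens. Backbone, prompt length,
# and the cosine-similarity choice are assumptions for illustration.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "roberta-base"                      # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

embed = model.get_input_embeddings().weight      # (vocab_size, hidden_dim)
prompt_len, hidden_dim = 5, embed.shape[1]

# Stand-in for a prompt tuned by gradient descent on a downstream task.
continuous_prompt = torch.randn(prompt_len, hidden_dim, requires_grad=True)

# Nearest-neighbor projection: for each prompt vector, pick the vocabulary
# token whose embedding is most similar.
with torch.no_grad():
    sims = torch.nn.functional.normalize(continuous_prompt, dim=-1) @ \
           torch.nn.functional.normalize(embed, dim=-1).T
    nearest_ids = sims.argmax(dim=-1)

print(tokenizer.convert_ids_to_tokens(nearest_ids.tolist()))
```

Euclidean distance over the embedding matrix is an equally common projection choice; cosine similarity is used here only to keep the sketch short.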

Cited by 7 publications (12 citation statements)
References 10 publications
“…Last, our interpretation of FFN outputs as updates to the output distribution relates to recent works that interpreted groups of LM parameters in the discrete vocabulary space (Geva et al, 2021;Khashabi et al, 2021), or viewed the representation as an information stream (Elhage et al, 2021).…”
Section: Related Work
confidence: 90%
“…Given optimization difficulties when training prompt embeddings, Diao et al [57] recently used black-box optimization to train prompt embeddings without requiring gradients. Several works have analyzed prompt tuning from the perspective of interpretability Khashabi et al [58] and its similarity to other PEFT methods He et al [29]. Prompt tuning has been applied to various applications for NLP including continual learning [59], model robustness [60,61], summarization [62], machine translation [63], co-training [64], probing language models [65,65], inverse prompting [66], and transfer learning [67].…”
Section: Related Work
confidence: 99%
“…Recent work (Brown et al, 2020;Jiang et al, 2020;Khashabi et al, 2021;Gao et al, 2021) shows it's possible to combine discrete text prompt z with input x to directly perform various NLP tasks using a pre-trained LM's generative distribution P LM (y|z, x), without needing to fine-tune the model. For instance, in classification, the LM can be a masked language model (MLM) such as BERT (Devlin et al, 2019), and y is the class-label token (a.k.a.…”
Section: The Discrete Prompt Optimization Problem
confidence: 99%
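
The quoted passage describes scoring P_LM(y|z, x) with a frozen masked LM: a discrete prompt z is combined with the input x and the probability of the class-label token is read off at the mask position. The sketch below follows that recipe under assumptions of mine (BERT backbone, a hand-written prompt, and a "great"/"terrible" verbalizer), not anything taken from the cited works.

```python
# Minimal sketch: classify an input x by prepending it to a discrete prompt z
# and reading P_LM(y | z, x) at the [MASK] position of a frozen masked LM.
# The backbone, prompt text, and verbalizer tokens are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

x = "The movie was a complete waste of two hours."
z = "Overall, it was [MASK]."                      # discrete prompt with a mask slot
label_tokens = ["great", "terrible"]               # assumed verbalizer for pos/neg

inputs = tokenizer(x + " " + z, return_tensors="pt")
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]   # distribution over vocabulary

label_ids = tokenizer.convert_tokens_to_ids(label_tokens)
probs = logits[label_ids].softmax(dim=-1)
print(dict(zip(label_tokens, probs.tolist())))     # relative class probabilities
```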
“…One of the most popular schemes of prompt optimization is to tune soft prompts (i.e., continuous embedding vectors) as they are amenable to gradient descent Li and Liang, 2021;Vu et al, 2021;Gu et al, 2021;Liu et al, 2021d;Mokady et al, 2021;An et al, 2022, etc.). However, the resulting continuous embedding learned with an LM is, by its nature, hard for humans to understand (Khashabi et al, 2021;Hambardzumyan et al, 2021;Mokady et al, 2021) and incompatible for use with other LMs. Besides, the required LM internal gradients are often expensive to compute, or simply unavailable for LMs deployed with only inference APIs (e.g., .…”
Section: Introduction
confidence: 99%
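
The quoted passage contrasts discrete prompts with soft prompts tuned by gradient descent while the LM stays frozen, which is what makes the learned vectors hard for humans to interpret and dependent on internal gradients. The sketch below illustrates that setup under my own assumptions (GPT-2 backbone, prompt length, and a toy single-example objective); it is not any cited paper's training code.

```python
# Minimal sketch of soft-prompt tuning: the LM is frozen and only the prepended
# embedding vectors receive gradients. Backbone, prompt length, and the toy
# objective are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad_(False)                        # freeze the language model

embed = model.get_input_embeddings()
prompt_len = 5
soft_prompt = torch.nn.Parameter(
    torch.randn(prompt_len, embed.embedding_dim) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

# Toy objective: make the frozen LM more likely to emit a target continuation.
ids = tokenizer("positive", return_tensors="pt")["input_ids"]
for _ in range(10):
    token_embeds = embed(ids)                                  # (1, T, D)
    inputs_embeds = torch.cat(
        [soft_prompt.unsqueeze(0), token_embeds], dim=1)       # prepend soft prompt
    labels = torch.cat(
        [torch.full((1, prompt_len), -100), ids], dim=1)       # ignore prompt positions
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    loss.backward()                                            # gradients hit only soft_prompt
    optimizer.step()
    optimizer.zero_grad()
```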