2021 | Preprint
DOI: 10.48550/arXiv.2102.01017
Measuring and Improving Consistency in Pretrained Language Models

Abstract: Consistency of a model, that is, the invariance of its behavior under meaning-preserving alternations in its input, is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create PARAREL, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for thirty-eight relations. Using PARAREL, we show that the consistency of a…

Cited by 14 publications (9 citation statements) | References 64 publications
“…Instruction-based learning has also been used in few-shot settings; popular variants include in-context learning, where the model's parameters are fixed and examples are provided as additional context (Brown et al., 2020; Lu et al., 2021; Kumar and Talukdar, 2021; Min et al., 2021), finetuning the entire model (Schick and Schütze, 2021a,c; Gao et al., 2021; Tam et al., 2021), and prompt tuning, where only the instruction itself is optimized (Shin et al., 2020; Hambardzumyan et al., 2021; Li and Liang, 2021). Several works investigating the limitations and drawbacks of instruction-based few-shot approaches find that current LMs are mostly unable to understand complex instructions that go beyond short prompts or simple questions (Efrat and Levy, 2020; Weller et al., 2020; Webson and Pavlick, 2021) and that they are highly sensitive to the exact wording of the instructions provided (Jiang et al., 2020; Schick and Schütze, 2021a; Elazar et al., 2021). In a similar vein, Perez et al. (2021) and Logan IV et al. (2021) argue that prior work overestimates few-shot performance, as manual prompt tuning is required to achieve good performance.…”
Section: Related Work
confidence: 99%
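To make the distinction between these variants concrete, here is a minimal sketch of the in-context learning setup described in the excerpt above: the pretrained model's parameters stay frozen, and "learning" happens only through demonstrations prepended to the query. The model choice (gpt2) and the toy demonstrations are illustrative assumptions, not taken from the cited works.

```python
# A minimal sketch of in-context learning with a frozen pretrained LM.
# Assumes the Hugging Face `transformers` library; the model and the
# demonstrations below are illustrative, not from the cited papers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # parameters stay fixed; no gradient updates are made

# Demonstrations are supplied purely as additional context.
prompt = (
    "Paris is the capital of France.\n"
    "Rome is the capital of Italy.\n"
    "Ottawa is the capital of"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=2, do_sample=False)
# Decode only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```

Finetuning would instead update all of the model's weights on the few labeled examples, and prompt tuning would optimize only the (discrete or continuous) instruction while keeping the model frozen.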
“…Subsequent work argues that this approach of using a single natural language prompt per relation underestimates the ability of language models to predict factual information, given that language models are very sensitive to the input prompts. To this end, Jiang et al. (2020) and Elazar et al. (2021) generate paraphrases of prompts in the LAMA dataset and ensemble them to get a score for each relation. Other methods argue that natural language prompts might not be optimal and instead optimize a discrete (Shin et al., 2020; Haviv et al., 2021) or continuous (Qin & Eisner, 2021; Zhong et al., 2021; Liu et al., 2021b) prompt for each relation.…”
Section: Related Work
confidence: 99%
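The paraphrase-ensembling idea from Jiang et al. (2020) and Elazar et al. (2021) can be sketched as follows: score the same candidate answer under several meaning-preserving phrasings of a cloze prompt and combine the resulting probabilities. The model, the templates, and the simple mean used here are illustrative assumptions rather than the papers' exact setup.

```python
# A minimal sketch of ensembling paraphrased cloze prompts for factual
# probing. Model, templates, and averaging scheme are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

# Several meaning-preserving phrasings of the same capital-of query.
paraphrases = [
    "The capital of France is [MASK].",
    "France's capital is [MASK].",
    "[MASK] is the capital of France.",
]

candidate = tokenizer.convert_tokens_to_ids("Paris")
scores = []
with torch.no_grad():
    for template in paraphrases:
        inputs = tokenizer(template, return_tensors="pt")
        # Locate the [MASK] position and read its probability distribution.
        mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
        logits = model(**inputs).logits[0, mask_pos]
        scores.append(logits.softmax(-1)[candidate].item())

# Ensemble by averaging the per-paraphrase probabilities.
print(sum(scores) / len(scores))
```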
“…We focus on the issue of consistency. Elazar et al. (2021) thoroughly investigated this issue, finding that models often predict different entities for semantically equivalent prompts. Further, Elazar et al. (2021) propose to continue training LLMs with a consistency loss function to improve their robustness.…”
Section: Related Work
confidence: 99%
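A consistency loss of the kind Elazar et al. (2021) propose can be sketched as a symmetric KL term that pushes the [MASK] distributions of two paraphrases toward each other; in their training setup this term is combined with a standard MLM objective. The model, templates, and lack of weighting below are illustrative assumptions.

```python
# A minimal sketch of a consistency loss over a paraphrase pair, in the
# spirit of Elazar et al. (2021). Model and templates are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

def mask_log_probs(template: str) -> torch.Tensor:
    """Log-probabilities over the vocabulary at the [MASK] position."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    logits = model(**inputs).logits[0, mask_pos]
    return F.log_softmax(logits, dim=-1)

log_p = mask_log_probs("The capital of France is [MASK].")
log_q = mask_log_probs("France's capital is [MASK].")

# Symmetric KL divergence: penalize disagreement between paraphrases.
consistency_loss = (
    F.kl_div(log_q, log_p, log_target=True, reduction="sum")
    + F.kl_div(log_p, log_q, log_target=True, reduction="sum")
)
consistency_loss.backward()  # in practice, added to the usual MLM loss
```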