2022
DOI: 10.48550/arxiv.2201.11903
Preprint
Chain of Thought Prompting Elicits Reasoning in Large Language Models

Cited by 362 publications (532 citation statements)
References 0 publications
“…We evaluate self-consistency on a range of arithmetic reasoning and commonsense reasoning tasks, and find that it improves the reasoning ability of language models by a striking margin. Compared to generating a single chain of thought via greedy decoding (Wei et al, 2022), self-consistency contributes additional absolute improvements of +10.6% on the recent grade-school-math dataset (GSM8K; Cobbe et al, 2021), +14.4% on a recently-compiled challenge dataset over math word problems (SVAMP; Patel et al, 2021), and +23.9% on MultiArith (Roy & Roth, 2015). For commonsense reasoning, we also observe significant gains in CommonsenseQA (Talmor et al, 2019) (+5%), and the AI2 Reasoning Challenge (ARC) dataset (Clark et al, 2018), with +4% and +4.7% absolute accuracy improvement in the easy and challenge sets, respectively.…”
Section: Majority Vote
confidence: 99%
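The majority-vote aggregation this citation refers to can be sketched in a few lines. This is a minimal illustration, not the authors' code; the sampled answer strings are hypothetical:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers parsed from five sampled chains of thought:
sampled = ["18", "18", "26", "18", "26"]
print(majority_vote(sampled))  # → 18
```

The vote is taken over the final answers only, so two reasoning paths that differ in wording but reach the same number count as agreement.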
“…We leverage this intuition by proposing the following self-consistency method. First, a language model is prompted with a set of manually written chain of thought exemplars (Wei et al, 2022). Next, we sample a set of candidate outputs from the language model's decoder (Ackley et al, 1985;Ficler & Goldberg, 2017;Fan et al, 2018;Holtzman et al, 2018;Radford et al, 2019;Holtzman et al, 2020), which produces diversity in the set of generated reasoning paths.…”
Section: Self-consistency Over Diverse Reasoning Paths
confidence: 99%
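The two-step method the quote describes (prompt with chain-of-thought exemplars, then sample diverse outputs and aggregate) can be sketched as below. Here `sample_chain` is a stand-in for a temperature-sampled model call, and the canned reasoning paths are assumptions for illustration, not the authors' implementation:

```python
import random
from collections import Counter

def sample_chain(prompt, rng):
    # Stand-in for one sampled generation; a real implementation would
    # decode from a language model with temperature > 0 so that repeated
    # calls yield diverse reasoning paths.
    paths = [
        ("3 + 5 = 8, so the answer is 8.", "8"),
        ("3 * 5 = 15, so the answer is 15.", "15"),
        ("5 + 3 = 8, so the answer is 8.", "8"),
    ]
    return rng.choice(paths)

def self_consistency(prompt, n_samples=10, seed=0):
    """Sample n reasoning paths, then majority-vote over final answers."""
    rng = random.Random(seed)
    answers = [sample_chain(prompt, rng)[1] for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("Q: What is 3 + 5?"))
```

Only the parsed final answer enters the vote; the sampled chains of thought serve to produce diversity, exactly as the quoted passage describes.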
“…The design of prompts can have a huge impact on PROMPTING, as pointed out by many previous works (Mishra et al, 2021a; Wei et al, 2022). In this section, we investigate how prompt design instructs text generation and affects ZEROGEN's performance.…”
Section: Prompt Engineering In ZeroGen
confidence: 96%
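The prompt-design variation this citation studies can be illustrated with a few template variants. The templates below are hypothetical examples in the spirit of zero-shot dataset generation, not ZEROGEN's actual prompts:

```python
# Hypothetical prompt templates for zero-shot dataset generation;
# the wording of the instruction is the design choice under study.
TEMPLATES = [
    "The movie review in positive sentiment is:",
    "Write a positive movie review:",
    "Here is a movie review that expresses a positive opinion:",
]

def build_prompt(template_id, label="positive"):
    # A real pipeline would feed this prompt to a frozen language model
    # and collect the generated text as synthetic training data.
    return TEMPLATES[template_id].replace("positive", label)

print(build_prompt(1, label="negative"))  # → Write a negative movie review:
```

Because the downstream dataset is generated entirely from such prompts, even small wording changes can shift the distribution of the synthetic data, which is why prompt design matters so much in this setting.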