2022
DOI: 10.48550/arxiv.2205.11916
Preprint

Large Language Models are Zero-Shot Reasoners

Abstract: Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and are generally known as excellent few-shot learners with task-specific exemplars. Notably, chain-of-thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved state-of-the-art performance in arithmetic and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these succes…
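The zero-shot CoT technique the abstract describes is a two-stage prompting scheme: first append a reasoning trigger to elicit step-by-step reasoning, then feed that reasoning back with an answer-extraction prefix. A minimal sketch, assuming the trigger and extraction phrasing reported in the paper; the helper names and the example question are illustrative, not from the paper's code:

```python
def build_reasoning_prompt(question: str) -> str:
    """Stage 1: append the zero-shot CoT trigger so the model
    generates its reasoning before any answer."""
    return f"Q: {question}\nA: Let's think step by step."


def build_extraction_prompt(question: str, reasoning: str) -> str:
    """Stage 2: feed the generated reasoning back and prompt the
    model to state just the final numeric answer."""
    return (
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )


# Illustrative usage with a hypothetical arithmetic question.
question = "A juggler has 16 balls. Half are golf balls. How many golf balls are there?"
prompt = build_reasoning_prompt(question)
print(prompt)
```

The model's completion of the first prompt would be passed as `reasoning` to the second; only the second prompt's completion is parsed for the answer.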

Cited by 163 publications (235 citation statements)
References 17 publications
“…In summary, our results support earlier findings that large language models are zero-shot Kojima et al (2022) and few-shot learners Brown et al (2020), meaning that they perform well in tasks even when not given any, or given just a few, task-related examples as input. Our work suggests that modern machine learning models such as OpenAI Codex provide many opportunities for programming course designers, although potential challenges outlined in prior work Finnie-Ansley et al (2022) should not be ignored.…”
Section: Discussion (supporting)
confidence: 91%
“…When experimenting with many other kinds of priming statements for generating the explanations, we found that Codex very rarely provided high-level descriptions. This supports the findings of Kojima et al who found that large language models seem to perform better in reasoning tasks when priming them to "think step by step" Kojima et al (2022). Even a very explicit prompt, such as "A high-level description of the above program:", would still usually result in a line-by-line explanation being produced.…”
Section: Code Explanations (supporting)
confidence: 87%
“…We are not the first to probe large-scale machine learning models' abilities. Indeed, recently there has been a push towards creating large benchmarks to assess the capability of foundation models [48][49][50] . Large language models have also been studied using other methods from cognitive psychology, such as property induction 51 , thinking-out-loud protocols 52 , or learning causal over-hypotheses 53 , where researchers have come to similar conclusions.…”
Section: Discussion (mentioning)
confidence: 99%
“…This sparked interest in evaluating the large language models on various reasoning tasks including common-sense reasoning [26,22,7], logical reasoning [24], and even ethical reasoning [12]. The macro-tenor of the drumbeat of these works has been suggesting that LLMs are indeed capable of doing many kinds of reasoning [14,30,4]. In this paper, we take a look at the ability of large language models to do reasoning about actions and change which involve common-sense planning tasks. We develop a suite of benchmarks 1, based on the kinds of domains employed in the International Planning Competition, to test these capabilities.…”
Section: Introduction (mentioning)
confidence: 99%