2022
DOI: 10.48550/arxiv.2204.02311
Preprint

PaLM: Scaling Language Modeling with Pathways

Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pat…

Cited by 574 publications (779 citation statements)
References 89 publications
“…To achieve this, Flamingo takes inspiration from recent work in large-scale generative language models (LMs) which are good few-shot learners (Brown et al., 2020; Chowdhery et al., 2022; Hoffmann et al., 2022; Rae et al., 2021). A single large LM can indeed achieve strong performance on many tasks using only its text interface: a few examples of a task are provided to the model as a prompt, along with a query input, and the model generates a continuation to produce a predicted output for the task on that query.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
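The few-shot text interface described in this excerpt amounts to concatenating a handful of input-output demonstrations with a query and reading the model's continuation as the prediction. The sketch below illustrates that prompt construction; the sentiment-classification task, the "Q:/A:" format, and the example texts are assumptions chosen purely for illustration, not taken from PaLM or the citing papers.

```python
# Minimal sketch of few-shot prompting: demonstrations plus a query are joined
# into one prompt string, and the LM's continuation after the final "A:" is
# read off as the predicted output. Task and format are illustrative only.

def build_few_shot_prompt(examples, query, instruction=""):
    """Concatenate (input, output) demonstrations followed by the query input."""
    parts = [instruction] if instruction else []
    for inp, out in examples:
        parts.append(f"Q: {inp}\nA: {out}")
    parts.append(f"Q: {query}\nA:")  # the model is expected to continue after "A:"
    return "\n\n".join(parts)

if __name__ == "__main__":
    demos = [
        ("The movie was a delight from start to finish.", "positive"),
        ("I want those two hours of my life back.", "negative"),
    ]
    prompt = build_few_shot_prompt(
        demos,
        "A surprisingly warm and funny film.",
        instruction="Classify the sentiment of each review.",
    )
    print(prompt)
    # This string would then be sent to a large LM through its text interface,
    # and the generated continuation ("positive" / "negative") is the prediction.
```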
“…Large language models trained on vast repositories of code have demonstrated remarkable progress in neural program synthesis and related tasks [18,10,68,46,21]. However, such models generate code left-to-right, which makes them less directly applicable to many ubiquitous code editing tasks, such as fixing bugs, adding comments, or re-naming variables.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…We compare to results obtained from their API on an infilling task in Table 10 in the Appendix. While this setting is not directly comparable to the three-shot setting where the models of Austin et al. [10] and Chowdhery et al. [21] performed best, we found that our model did not benefit from additional examples in the prompt, which we attribute to the much smaller size of our model (6.7B, versus 137B or 540B parameters) and the sensitivity of in-context learning to model scale.…”
Citation type: mentioning
confidence: 99%
“…To this end, we use examples from IMPLICITRELATIONS in a few-shot in-context learning setting, where given several input-output examples and a test input, the LM is expected to generate the required output. We focus on this setup following the recent progress in in-context learning, specifically for tasks that involve general commonsense reasoning (Da et al., 2021; Chowdhery et al., 2022).…”
Section: Experimental Setting
Citation type: mentioning
confidence: 99%
“…Recent work (Smith et al., 2022; Chowdhery et al., 2022) has shown that the reasoning abilities of LMs improve with model size. We evaluate this effect on four models from the GPT-3 family: ada, babbage, curie, and davinci, which are assumed to have been trained using the same procedure, and are estimated to have 350M, 1.3B, 6.7B, and 175B parameters, respectively (Gao, 2021; Black et al., 2022).…”
Section: Effect Of Model Size
Citation type: mentioning
confidence: 99%