Language Models are Few-Shot Learners
Preprint, 2020
DOI: 10.48550/arxiv.2005.14165

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions, something which current NLP systems still largely struggle to do. Here we sho…
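The few-shot setting described in the abstract can be made concrete with a small sketch (purely illustrative, not taken from the paper; the task, the demonstrations, and the helper name build_few_shot_prompt are assumptions): a short instruction and a handful of solved examples are packed into a single prompt, and the pre-trained model is asked to complete it with no gradient updates.

```python
# Illustrative sketch of few-shot ("in-context") prompting: the task is
# specified only through an instruction and a few demonstrations inside the
# prompt; the model's weights are never fine-tuned. The task, examples, and
# formatting below are assumptions chosen for illustration.

def build_few_shot_prompt(instruction, demonstrations, query):
    """Concatenate an instruction, K solved examples, and a new query."""
    lines = [instruction, ""]
    for source, target in demonstrations:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")  # the model is expected to continue from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    instruction="Translate English to French.",
    demonstrations=[
        ("sea otter", "loutre de mer"),
        ("cheese", "fromage"),
    ],
    query="peppermint",
)
print(prompt)
```

The resulting string would be fed to a pre-trained language model, which completes the final line; no parameters are updated at any point.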

Cited by 1,295 publications (1,586 citation statements)
References 29 publications
“…With the application of large-scale pre-trained language models such as BERT [Devlin et al., 2018] and GPT [Brown et al., 2020], many downstream applications have achieved significant improvements by fine-tuning on top of the pre-trained models. However, due to the large scale of model parameters, fine-tuning brings a large computational and storage burden.…”
Section: Prompt Tuning Methods in NLP (mentioning)
Confidence: 99%
“…Currently, prompt tuning methods can be roughly divided into two categories: manually crafted and automatically learned. While manually crafting prompts [Brown et al., 2020; Radford et al., 2021] is intuitive, creating and experimenting with these prompts takes time and experience, and even experienced prompt designers may fail to manually discover optimal prompts. To automate prompt engineering, [Lester et al., 2021; Zhou et al., 2021] parameterized the prompts by treating them as virtual tokens and performed prompting directly in the embedding space.…”
Section: Prompt Tuning Methods in NLP (mentioning)
Confidence: 99%
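The "virtual token" idea mentioned in the statement above can be sketched in a few lines of PyTorch (a minimal sketch, not the implementation of any of the cited papers; the class name, dimensions, and the assumption that the backbone accepts input embeddings directly are all illustrative): a small matrix of learnable prompt embeddings is prepended to the frozen model's token embeddings, so that only the prompt parameters are trained.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Minimal sketch of prompt tuning with learnable "virtual tokens".

    `backbone` is assumed to be a frozen Transformer that accepts a tensor of
    input embeddings of shape (batch, seq_len, hidden_dim); `embed_layer` is
    its token-embedding layer. Names and shapes are illustrative assumptions.
    """

    def __init__(self, backbone, embed_layer, num_prompt_tokens=20, hidden_dim=768):
        super().__init__()
        self.backbone = backbone
        self.embed_layer = embed_layer
        # The only trainable parameters: one embedding per virtual token.
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden_dim) * 0.02)
        # Freeze the large pre-trained model; only the prompt is tuned.
        for p in self.backbone.parameters():
            p.requires_grad = False
        for p in self.embed_layer.parameters():
            p.requires_grad = False

    def forward(self, input_ids):
        token_embeds = self.embed_layer(input_ids)                     # (B, L, H)
        batch_size = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)   # (B, P, H)
        # Prompting directly in the embedding space: prepend virtual tokens.
        return self.backbone(torch.cat([prompt, token_embeds], dim=1))
```

During training, an optimizer would be built over the single `prompt` parameter only, which is what keeps the computational and storage cost per downstream task small compared with full fine-tuning.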
“…4(b). The GELU function is used extensively in Transformer networks for natural language processing, which are regularly amongst the largest deep learning models [29]. Thus, our all-optical PPLN nanophotonic waveguide implementation gains greater real-world applicability by being compatible with a wide range of existing deep learning models, especially the largest models where energy efficiency is paramount.…”
Section: Femtojoule ReLU Function (mentioning)
Confidence: 99%
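For reference, GELU(x) = x·Φ(x), where Φ is the standard normal CDF. The short sketch below (illustrative only; it uses NumPy and SciPy rather than any particular deep learning framework) computes the exact form alongside the widely used tanh approximation from Hendrycks and Gimpel (2016).

```python
import numpy as np
from scipy.special import erf

def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    """Common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3, 3, 7)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # approximation error is small
```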
“…Text knowledge for vision tasks. Recent studies [35], [36] explore using large-scale language models [37] for vision tasks and observe significant improvements with text knowledge. We expect CCD provides a better way of utilizing the text knowledge, thus further improving the vision task performance.…”
Section: Related Work (mentioning)
Confidence: 99%