Language Models are Few-Shot Learners

Brown, T. B.; Mann, Benjamin F.; Ryder, N. C.; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, A.; Ziegler, Daniel M.; Wu, Jeffrey C.S.; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric J.; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack A.; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario

doi:10.48550/arxiv.2005.14165

Cited by 1,295 publications

(1,586 citation statements)

References 29 publications

Mentioning: 1,251

“…With the application of large-scale pre-trained language models such as BERT [Devlin et al, 2018] and GPT [Brown et al, 2020], many downstream applications have achieved significant improvements by finetuning on top of the pre-trained models. However, due to the large scale of model parameters, finetuning brings a large computational and storage burden.…”

Section: Prompt Tuning Methods in NLP (mentioning)

confidence: 99%

“…Currently, prompt tuning methods can be roughly divided into two categories: manually crafted and automatically learned. While manually crafting prompts [Brown et al, 2020] [Radford et al, 2021] is intuitive, creating and experimenting with these prompts takes time and experience, and even experienced prompt designers may fail to discover optimal prompts manually. To automate prompt engineering, […] [Lester et al, 2021] [Zhou et al, 2021] parameterized the prompts by treating them as virtual tokens and performing prompting directly in the embedding space.…”

Section: Prompt Tuning Methods in NLP (mentioning)

confidence: 99%


Learning to Compose Diversified Prompts for Image Emotion Classification

Deng, Wu, Shi, et al. 2022

Preprint


Contrastive Language-Image Pre-training (CLIP) represents the latest incarnation of pre-trained vision-language models. Although CLIP has recently shown its superior power on a wide range of downstream vision-language tasks like Visual Question Answering, it is still underexplored for Image Emotion Classification (IEC). Adapting CLIP to the IEC task poses three significant challenges: a tremendous gap between the pretraining and IEC training objectives, suboptimal prompts, and invariant prompts shared by all instances. In this paper, we propose a general framework that shows how CLIP can be effectively applied to IEC. We first introduce a prompt tuning method that mimics the pretraining objective of CLIP and thus can leverage the rich image and text semantics entailed in CLIP. Then we automatically compose instance-specific prompts by conditioning them on the categories and image contents of instances, diversifying prompts and avoiding suboptimal prompts. Evaluations on six widely used affective datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin (up to a 9.29% accuracy gain on the EmotionROI dataset) on IEC tasks, while training only a few parameters. Our code will be publicly available for research purposes.
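The citing snippet describes learned prompts as "virtual tokens" that live in the embedding space. A minimal sketch of that idea, assuming a toy plain-Python setup: a frozen embedding table, a few trainable virtual-token vectors concatenated in front of the real token embeddings, and an update rule that touches only those vectors. The helper names (`init_soft_prompt`, `prepend_soft_prompt`, `sgd_step`) and the 0.02 initialisation scale are hypothetical illustrations, not the exact method of Lester et al., Zhou et al., or the CLIP-based paper above.

```python
import random
from typing import List

Vector = List[float]

def init_soft_prompt(num_virtual_tokens: int, dim: int, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: randomly initialise trainable virtual-token embeddings."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_virtual_tokens)]

def embed(token_ids: List[int], table: List[Vector]) -> List[Vector]:
    """Look up the (frozen) pre-trained embeddings of the real input tokens."""
    return [list(table[t]) for t in token_ids]

def prepend_soft_prompt(soft_prompt: List[Vector], token_embeddings: List[Vector]) -> List[Vector]:
    """Prompting in embedding space: virtual tokens precede the real tokens,
    and the combined sequence is what a frozen model would consume."""
    return [list(v) for v in soft_prompt] + token_embeddings

def sgd_step(soft_prompt: List[Vector], grads: List[Vector], lr: float) -> List[Vector]:
    """Update only the soft-prompt vectors; the embedding table and model stay frozen."""
    return [[p - lr * g for p, g in zip(row, grad_row)]
            for row, grad_row in zip(soft_prompt, grads)]
```

The design point is that the trainable state is just `num_virtual_tokens × dim` numbers, which is why prompt tuning trains "only a few parameters" compared with fine-tuning the whole model.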


“…4(b). The GELU function is used extensively in Transformer networks for natural language processing, which are regularly amongst the largest deep learning models [29]. Thus, our all-optical PPLN nanophotonic waveguide implementation gains greater real-world applicability by being compatible with a wide range of existing deep learning models, especially the largest models, where energy efficiency is paramount.…”

Section: Femtojoule ReLU Function (mentioning)

confidence: 99%

All-optical ultrafast ReLU function for energy-efficient nanophotonic deep learning

Li, Sekine, Nehra, et al. 2022

Preprint


In recent years, the computational demands of deep learning applications have necessitated the introduction of energy-efficient hardware accelerators. Optical neural networks are a promising option; however, thus far they have been largely limited by the lack of energy-efficient nonlinear optical functions. Here, we experimentally demonstrate an all-optical Rectified Linear Unit (ReLU), which is the most widely used nonlinear activation function for deep learning, using a periodically-poled thin-film lithium niobate nanophotonic waveguide and achieve ultra-low energies in the regime of femtojoules per activation with near-instantaneous operation. Our results provide a clear and practical path towards truly all-optical, energy-efficient nanophotonic deep learning.
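For reference, the two activations discussed in this entry (GELU in the citing snippet, ReLU in the abstract) can be written as scalar functions. This is a plain software sketch for comparison only, not a model of the optical device. GELU is defined as x·Φ(x), with Φ the standard normal CDF; the tanh-based approximation is a widely used variant included here as an extra, and the function names are illustrative.

```python
import math

def relu(x: float) -> float:
    """Rectified Linear Unit: passes positive inputs, clamps negatives to zero."""
    return max(0.0, x)

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Unlike ReLU, GELU is smooth and lets small negative values through (for example, `gelu(-3.0)` is a small negative number), which is one reason the two nonlinearities are treated as distinct targets for hardware implementations.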


“…Text knowledge for vision tasks. Recent studies [35], [36] explore using large-scale language models [37] for vision tasks and observe significant improvements with text knowledge. We expect CCD to provide a better way of utilizing text knowledge, thus further improving vision-task performance.…”

Section: Related Work (mentioning)

confidence: 99%

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

Yang, Liu, Huang, et al. 2022

Preprint


In this study, we aim to predict plausible future action steps given an observation of the past, and study the task of instructional activity anticipation. Unlike previous anticipation tasks that aim at action label prediction, our work targets generating natural language outputs that provide interpretable and accurate descriptions of future action steps. This is a challenging task due to the lack of semantic information extracted from the instructional videos. To overcome this challenge, we propose a novel knowledge distillation framework that exploits related external textual knowledge to assist the visual anticipation task. However, previous knowledge distillation techniques generally transfer information within the same modality. To bridge the gap between the visual and text modalities during distillation, we devise a novel cross-modal contrastive distillation (CCD) scheme, which facilitates knowledge distillation between a teacher and a student in heterogeneous modalities via the proposed cross-modal distillation loss. We evaluate our method on the Tasty Videos dataset. CCD improves the anticipation performance of the visual-only student model by a relative 40.2% in BLEU-4. Our approach also outperforms state-of-the-art approaches by a large margin.
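The abstract does not spell out the form of the cross-modal distillation loss. As a hedged illustration only, a common contrastive choice is an InfoNCE-style objective that pulls each student (visual) embedding toward its paired teacher (text) embedding and away from the other teacher embeddings in the batch. The function names and the default temperature of 0.1 are assumptions; this is a generic contrastive loss, not CCD's actual formulation.

```python
import math
from typing import List

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cross_modal_infonce(student: List[Vector], teacher: List[Vector],
                        temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: student[i] (visual) is paired with teacher[i] (text),
    and every other teacher[j] in the batch acts as a negative."""
    if len(student) != len(teacher) or not student:
        raise ValueError("student and teacher batches must be non-empty and equal in size")
    total = 0.0
    for i, s in enumerate(student):
        logits = [cosine(s, t) / temperature for t in teacher]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(student)
```

When each student embedding already points at its own teacher embedding the loss is near zero; when the pairs are swapped it grows large, which is the signal that drives the visual student toward the text teacher's representation.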
