Few-Shot Text Generation with Natural Language Instructions

Schick, Timo; Schütze, Hinrich

doi:10.18653/v1/2021.emnlp-main.32

Cited by 62 publications

(28 citation statements)

References 36 publications


“…Prompt engineering. Constructing effective discrete prompts for language models to perform NLP tasks is an active area of research (Schick and Schütze, 2021; Reynolds and McDonell, 2021; Liu et al., 2021). Such prompts are often extremely short and may not include a complete definition of complex tasks.…”

Section: Related Work (mentioning)

confidence: 99%

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

Mishra, Khashabi, Baral et al. 2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

104


Humans (e.g., crowdworkers) have a remarkable ability to solve different tasks simply by reading textual instructions that define them and looking at a few examples. Despite the success of conventional supervised learning on individual datasets, such models often struggle with generalization across tasks (e.g., a question-answering system cannot solve classification tasks). A long-standing challenge in AI is to build a model that learns a new task by understanding the human-readable instructions that define it. To study this, we introduce NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances (input-output pairs). The instructions are obtained from crowdsourcing instructions used to create existing NLP datasets and mapped to a unified schema. Using this meta-dataset, we measure cross-task generalization by training models on seen tasks and measuring generalization to the remaining unseen ones. We adopt generative pre-trained language models to encode task-specific instructions along with input and generate task output. Our results indicate that models benefit from instructions when evaluated in terms of generalization to unseen tasks (19% better for models utilizing instructions). These models, however, are far behind an estimated performance upper bound, indicating significant room for more progress in this direction.


“…Previous work has shown that mixing multiple prompting templates can improve few-shot performance for both classification (Schick and Schütze, 2021a,c; Gao et al., 2021) and generation (Schick and Schütze, 2021b). We argue that such ensembling could produce more regularized path scores by alleviating prompt sensitivity (Zhao et al., 2021).…”

Section: Instruction Ensembling (mentioning)

confidence: 66%

Few-shot Reranking for Multi-hop QA via Language Model Prompting

Khalifa, Logeswaran, Lee et al. 2022

Preprint


We study unsupervised multi-hop reranking for multi-hop QA (MQA) with open-domain questions. Since MQA requires piecing together information from multiple documents, the main challenge resides in retrieving and reranking chains of passages that support the reasoning process. Our approach relies on LargE models with Prompt-Utilizing reranking Strategy (LEPUS): we construct an instruction-like prompt based on a candidate document path and compute a relevance score of the path as the probability of generating a given question, according to a pre-trained language model. Though unsupervised, LEPUS yields competitive reranking performance against state-of-the-art methods that are trained on thousands of examples. Adding a small number of samples (e.g., 2), we demonstrate further performance gain using in-context learning. Finally, we show that when integrated with a reader module, LEPUS can obtain competitive multi-hop QA performance, e.g., outperforming fully-supervised QA systems.


“…Few-shot learning can also be accomplished by combining textual templates ("prompts") and various forms of model finetuning, either fully updating a model's parameters, e.g. for classification (Schick & Schütze, 2021a; Schick & Schütze, 2021; Gao et al., 2021; Tam et al., 2021) or generation (Schick & Schütze, 2021b). Prompts themselves can be optimized, for example by search (Jiang et al., 2020; Shin et al., 2020) or by only updating parts of the model (Logan et al., 2021), or learning "soft-prompts" (Lester et al., 2021; Li & Liang, 2021).…”

Section: Few-shot Learning (mentioning)

confidence: 99%

Atlas: Few-shot Learning with Retrieval Augmented Language Models

Izacard, Lewis, Lomelí et al. 2022

Preprint

Self Cite


Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval-augmented models are known to excel at knowledge-intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval-augmented language model able to learn knowledge-intensive tasks with very few training examples. We perform evaluations on a wide range of tasks, including MMLU, KILT and Natural Questions, and study the impact of the content of the document index, showing that it can easily be updated. Notably, Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B-parameter model by 3% despite having 50x fewer parameters.
