2023
DOI: 10.48550/arxiv.2301.13688
Preprint

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

Abstract: We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022. Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, …)
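The mixed prompt settings highlighted in the abstract can be illustrated with a short sketch. This is not the authors' pipeline; the example format, template wording, and function names below are assumptions for illustration only.

```python
import random

# Hypothetical instruction-tuning example: a dict with instruction/input/target.
def zero_shot(example):
    # Zero-shot format: instruction and input only, no solved exemplars.
    return f"{example['instruction']}\n{example['input']}", example["target"]

def few_shot(example, exemplars, k=2):
    # Few-shot format: prepend k solved exemplars (requires len(exemplars) >= k).
    shots = "\n\n".join(
        f"{ex['instruction']}\n{ex['input']}\n{ex['target']}"
        for ex in random.sample(exemplars, k)
    )
    prompt, target = zero_shot(example)
    return f"{shots}\n\n{prompt}", target

def mixed_prompt_batch(examples, exemplars, few_shot_rate=0.5):
    # Randomly render each example zero-shot or few-shot, so a single model
    # is trained on both prompt settings at once.
    return [
        few_shot(ex, exemplars) if random.random() < few_shot_rate else zero_shot(ex)
        for ex in examples
    ]
```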

Cited by 33 publications (42 citation statements); references 33 publications. Selected citation statements, ordered by relevance:

“…In this approach, (instruction, output) pairs are collected from existing annotated natural language datasets by using templates to transform text-label pairs to (instruction, output) pairs. Datasets such as Flan (Longpre et al, 2023) and P3 (Sanh et al, 2021) are constructed based on the data integration strategy.…”
Section: Instruction Dataset Construction (mentioning, confidence: 99%)
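The data-integration strategy this statement describes can be sketched in a few lines: a template turns an existing (text, label) pair into an (instruction, output) pair. The template text and label mapping below are hypothetical, not taken from Flan or P3.

```python
# Illustrative template; wording and label mapping are assumptions.
TEMPLATE = (
    "Classify the sentiment of the following review as positive or negative.\n\n"
    "Review: {text}\nSentiment:"
)

def to_instruction_pair(text, label):
    # Transform a text-label pair into an (instruction, output) pair.
    instruction = TEMPLATE.format(text=text)
    output = "positive" if label == 1 else "negative"
    return instruction, output

pair = to_instruction_pair("A tense, rewarding thriller.", 1)
# -> ("Classify the sentiment ... Sentiment:", "positive")
```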
“…Flan 2021 (Longpre et al, 2023) is an English instruction dataset constructed by transforming 62 widely-used NLP benchmarks (e.g., SST-2, SNLI, AG News, MultiRC) into language input-output pairs. Each instance in Flan 2021 has "input" and "target" components.…”
Section: Flan 2021 (mentioning, confidence: 99%)
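As a hedged illustration of the "input"/"target" layout this statement describes, the sketch below tokenizes one such instance for sequence-to-sequence fine-tuning with Hugging Face transformers. The instance content is made up, and this is not Flan's actual preprocessing.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Made-up instance following the "input"/"target" layout described above.
instance = {
    "input": "Is the following review positive or negative? The film never drags.",
    "target": "positive",
}

enc = tokenizer(instance["input"], return_tensors="pt")
labels = tokenizer(instance["target"], return_tensors="pt").input_ids
# enc.input_ids and labels can be passed to a seq2seq model such as
# T5ForConditionalGeneration as (input_ids=..., labels=...) during fine-tuning.
```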
“…In recent years, NLP communities have witnessed the rise of very large language models such as GPT-3 (175B parameters) [8], PaLM (540B parameters) [49], Bloom (176B parameters) [50], OPT (up to 175B parameters) [51], and the FLAN series (FLAN has 137B parameters) [52]. At their core, these large language models are transformer models inspired by BERT and GPT, albeit at a much larger scale.…”
Section: Very Large Language Models (mentioning, confidence: 99%)