Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.468

Muppet: Massive Multi-task Representations with Pre-Finetuning

Abstract: We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pre-trained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, …)
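The abstract describes pre-finetuning only at a high level. As a rough illustration of how a massively multi-task stage like this can be wired up, the sketch below uses a shared pretrained encoder, one lightweight classification head per task, and a training loop that samples a task per step and divides each task's loss by the log of its label-space size so tasks with differently sized output spaces contribute on a comparable scale. Every name here (the `encoder` interface, `MultiTaskModel`, `pre_finetune`, the sampling and scaling choices) is an illustrative assumption, not the paper's implementation, which additionally relies on details such as heterogeneous batches that are not shown.

import math
import random

import torch
import torch.nn as nn


class MultiTaskModel(nn.Module):
    """Shared encoder with one lightweight classification head per task.

    `encoder` is assumed to map a batch of token ids to a pooled hidden
    vector of size `hidden_size` (a hypothetical interface, not the
    paper's code).
    """

    def __init__(self, encoder, hidden_size, num_labels_per_task):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in num_labels_per_task.items()}
        )

    def forward(self, task, input_ids):
        pooled = self.encoder(input_ids)   # [batch, hidden_size]
        return self.heads[task](pooled)    # task-specific logits


def pre_finetune(model, task_loaders, steps, lr=1e-5):
    """Run `steps` multi-task updates over a dict of {task_name: DataLoader}.

    Each step samples one task, computes its cross-entropy loss, and
    divides by log(num_labels) so tasks with different label-space sizes
    contribute on a comparable scale; this is a rough stand-in for the
    paper's loss scaling, not its exact recipe.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    iterators = {task: iter(loader) for task, loader in task_loaders.items()}

    for _ in range(steps):
        task = random.choice(list(task_loaders))
        try:
            input_ids, labels = next(iterators[task])
        except StopIteration:
            # Restart a task's loader once it is exhausted.
            iterators[task] = iter(task_loaders[task])
            input_ids, labels = next(iterators[task])

        logits = model(task, input_ids)
        loss = loss_fn(logits, labels) / math.log(logits.size(-1))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()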

Cited by 138 publications (129 citation statements). References 47 publications.

“…QQP) is of interest. This finding, along with prior work demonstrating the effectiveness of multitask training (Wang et al., 2019; Liu et al., 2019; Aghajanyan et al., 2021), motivates our next question: if we only care about a single task, should we still use a multitask objective?…”
Section: Multitask Training as an Auxiliary Pruning Objective (mentioning)
confidence: 88%
“…In addition to the benefits described above, we aim for the best of both worlds: a substantially pruned model that also performs well on multiple tasks. We also ask whether individual task performances can be improved by leveraging data from the other tasks, which is a strategy employed by general-purpose language modeling (Liu et al., 2019; Aghajanyan et al., 2021).…”
Section: Introduction (mentioning)
confidence: 99%
“…We propose EXMIX (§2): a collection of 107 supervised NLP tasks for Extreme Multi-task Scaling, formatted for encoder-decoder training. EXMIX has approximately twice as many tasks as the largest prior study to date (Aghajanyan et al., 2021), totaling 18M labeled examples across diverse task families.…”
Section: ExT5 (mentioning)
confidence: 99%
“…For the first time, we explore and propose Extreme Multi-task Scaling, a new paradigm for multi-task pre-training. Compared to the largest prior work (Aghajanyan et al., 2021), our study doubles the number of tasks and focuses on multi-task pre-training rather than fine-tuning, which enables a direct comparison to standard pre-training. Our proposal is based on the insight that despite negative transfer being common during fine-tuning, a massive and diverse collection of pre-training tasks is generally preferable to an expensive search for the best combination of pre-training tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Large models trained on unsupervised objectives on massive text corpora can be fine-tuned on new tasks and typically perform better than models trained from scratch (Devlin et al., 2018; Radford et al., 2018; Conneau et al., 2019). Moreover, learning from multiple tasks has been shown to be helpful for generalization at all stages of training: during pre-training (Aribandi et al., 2021), as a special training stage before fine-tuning (Aghajanyan et al., 2021), and even during fine-tuning (Tang et al., 2020). In all cases, one attains stronger positive transfer by training on a larger number of tasks, intensifying the need for a flexible way to handle an arbitrary number of tasks.…”
Section: Introduction (mentioning)
confidence: 99%