2021
DOI: 10.48550/arxiv.2111.10952
Preprint

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

Abstract: Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this paper introduces EXMIX (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families. Using EXMIX, we study the effect of multi-task pre-training at the largest scale to date, and analyze co-training transfer amongst common famili…

Cited by 20 publications (27 citation statements) | References 36 publications
“…Finally, this work is also closely related to the theme of model unification, where unified multi-task models have recently become popular due to their immense potential (Raffel et al., 2019; Khashabi et al., 2020; Aribandi et al., 2021). Hence, the proposed DSI presents an opportunity to integrate discrete and disjoint search operations into end-to-end unified models, a unique capability that was not possible before.…”
Section: Related Work
confidence: 96%
“…How to aggregate performances? The multi-task setting has been investigated in recent works that provide benchmarks of state-of-the-art models across a great variety of tasks (Rajpurkar et al., 2016; McCann et al., 2018; Conneau et al., 2018a; Zheng et al., 2021; Tay et al., 2020b), sometimes with more than fifty (Siddhant et al., 2020; Aribandi et al., 2021; Wei et al., 2021; Sanh et al., 2021). These papers provide tables of scores across the considered tasks, but the only non-qualitative way to compare systems is to average the performances across tasks and then rank systems by their mean score values.…”
Section: Work In Progress
confidence: 99%
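
The aggregation the quoted passage describes (one score per task per system, systems ranked by their unweighted mean across tasks) is simple enough to sketch. The snippet below is a minimal illustration only; the system names, task names, and scores are hypothetical and not taken from any of the cited benchmarks.

# Illustrative sketch of mean-score aggregation: each system has one score
# per task; systems are ranked by their unweighted mean across tasks.
# System names, task names, and scores below are hypothetical.
scores = {
    "system_a": {"qa": 81.2, "nli": 74.5, "summarization": 90.1},
    "system_b": {"qa": 79.8, "nli": 77.0, "summarization": 88.4},
}

mean_scores = {
    name: sum(task_scores.values()) / len(task_scores)
    for name, task_scores in scores.items()
}

# Rank systems by descending mean score.
for rank, name in enumerate(sorted(mean_scores, key=mean_scores.get, reverse=True), start=1):
    print(f"{rank}. {name}: mean score {mean_scores[name]:.2f}")

As the citing authors note, this mean-then-rank comparison is the only non-qualitative option the published score tables offer, and it hides per-task variance between systems.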
“…In addition, in the wake of the recent surge of interest in massively multitask few-shot NLP models (Min et al., 2021; Wei et al., 2021; Aribandi et al., 2021; Sanh et al., 2022; Karimi Mahabadi et al., 2021, inter alia), we also evaluate our latent-skill model on CrossFit (Ye et al., 2021). This benchmark recasts 160 NLP tasks (including QA, conditional text generation, classification, and other types such as regression) as text-to-text generation problems.…”
Section: Fine-grained Skill Selection
confidence: 99%
“…Multitask NLP Multitask learning for NLP has been an effective strategy for improving model performance in low-resource tasks and for quickly adapting to new, unseen tasks (Ruder et al., 2019; Liu et al., 2019; Min et al., 2021; Wei et al., 2021; Aribandi et al., 2021; Sanh et al., 2022; Karimi Mahabadi et al., 2021; Rusu et al., 2019), languages (Ponti et al., 2019), and modalities (Bugliarello et al., 2022). Liu et al. (2019) adopt a multitask training strategy with a shared model and achieve impressive performance on GLUE.…”
Section: Related Work
confidence: 99%