CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

Ye, Qinyuan; Lin, Bill Yuchen; Ren, Xiang

doi:10.18653/v1/2021.emnlp-main.572

Cited by 52 publications

(66 citation statements)

References 118 publications


“…In our paper, we efficiently achieve few-shot task adaptation by inferring the task-skill allocation matrix for new tasks and fine-tuning skill parameters, which were previously learned via multitask learning. In fact, Ye et al (2021) found that this pre-training routine is superior to meta-learning in CrossFit. A similar attempt to recompose modular knowledge learnt on previous tasks has been recently explored by Ostapenko et al (2021).…”

Section: Related Work (mentioning)

confidence: 98%

“…In addition, in the wake of the recent surge of interest in massively multitask few-shot NLP models (Min et al, 2021; Wei et al, 2021; Aribandi et al, 2021; Sanh et al, 2022; Karimi Mahabadi et al, 2021, inter alia), we also evaluate our latent-skill model on CrossFit (Ye et al, 2021). This benchmark recasts 160 NLP tasks (including QA, conditional text generation, classification, and other types such as regression) as text-to-text generation problems.…”

Section: Fine-grained Skill Selection (mentioning)

confidence: 99%

“…In order to measure the benefits of a modular design for systematic generalisation to new tasks, we run a second set of experiments on CrossFit (Ye et al, 2021), a benchmark including 160 diverse natural language processing tasks sourced from Huggingface Datasets (Lhoest et al, 2021). The tasks in CrossFit are all converted into a unified text-to-text format inspired by Raffel et al (2020).…”

Section: Dataset (mentioning)

confidence: 99%

“…Hyper-parameters are tuned on the held-out set T_dev. We adopt the partition 1 (called RANDOM) from Ye et al (2021), where |T_train| = 120, |T_dev| = |T_eval| = 20, and tasks are split randomly. This is the most comprehensive partition and most suited for general-purpose models, as it includes all types of tasks.…”

Section: Dataset (mentioning)

confidence: 99%
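The 120 / 20 / 20 split quoted above can be illustrated with a minimal sketch. It only shows what a random partition of 160 tasks with those sizes looks like; it does not reproduce the task lists actually used in the RANDOM partition of Ye et al (2021). The function name, the seed, and the placeholder task names are assumptions made for illustration.

```python
import random


def random_task_partition(task_names, n_train=120, n_dev=20, n_eval=20, seed=0):
    """Shuffle the tasks and cut them into train / dev / eval groups.

    Hypothetical helper: the sizes come from the quoted statement,
    everything else (name, seed, shuffling scheme) is an assumption.
    """
    if n_train + n_dev + n_eval != len(task_names):
        raise ValueError("split sizes must add up to the number of tasks")
    rng = random.Random(seed)      # local generator, so the split is repeatable
    shuffled = list(task_names)    # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    evaluation = shuffled[n_train + n_dev:]
    return train, dev, evaluation


# Placeholder task names; the real benchmark uses dataset names instead.
tasks = [f"task_{i:03d}" for i in range(160)]
train, dev, evaluation = random_task_partition(tasks)
print(len(train), len(dev), len(evaluation))
```

A fixed seed keeps the three groups stable across runs, which matters when hyper-parameters are tuned on one group and results are reported on another.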

“…Note that in Ye et al (2021) the pre-trained model is BART Small and all parameters are fine-tuned. Hence, these results are not directly comparable.…”

mentioning

confidence: 99%


Combining Modular Skills in Multitask Learning

Ponti, Sordoni, Bengio et al. 2022

Preprint


A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume that each task is associated with a subset of latent discrete skills from a (potentially small) inventory. In turn, skills correspond to parameter-efficient (sparse / low-rank) model parameterisations. By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills. To favour non-trivial soft partitions of skills across tasks, we experiment with a series of inductive biases, such as an Indian Buffet Process prior and a two-speed learning rate. We evaluate our latent-skill model on two main settings: 1) multitask reinforcement learning for grounded instruction following on 8 levels of the BabyAI platform; and 2) few-shot adaptation of pre-trained text-to-text generative models on CrossFit, a benchmark comprising 160 NLP tasks. We find that the modular design of a network significantly increases sample-efficiency in reinforcement learning and few-shot generalisation in supervised learning, compared to baselines with fully shared, task-specific, or conditionally generated parameters where knowledge is entangled across tasks. In addition, we show how discrete skills help interpretability, as they yield an explicit hierarchy of tasks.
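The step in which each task's network is "instantiated as the average of the parameters of active skills" can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the allocation matrix is fixed and binary here (the abstract says it is learned jointly with the skills), each skill is a short list of floats standing in for a sparse or low-rank parameterisation, and the names `task_parameters`, `skills`, and `allocation` are hypothetical.

```python
def task_parameters(allocation_row, skill_params):
    """Average the parameter vectors of the skills active for one task.

    allocation_row: list of 0/1 flags, one per skill (assumed binary and fixed).
    skill_params:   list of equally sized parameter vectors, one per skill.
    """
    active = [p for flag, p in zip(allocation_row, skill_params) if flag == 1]
    if not active:
        raise ValueError("a task needs at least one active skill")
    dim = len(active[0])
    # Element-wise mean over the active skills only.
    return [sum(p[d] for p in active) / len(active) for d in range(dim)]


# Toy inventory of three skills, each with two parameters (illustrative values).
skills = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]

# Toy task-skill allocation matrix: one row per task, one column per skill.
allocation = [
    [1, 1, 0],  # task 0 combines skills 0 and 1
    [0, 0, 1],  # task 1 uses skill 2 only
    [1, 1, 1],  # task 2 combines all three skills
]

per_task = [task_parameters(row, skills) for row in allocation]
print(per_task[0])  # [0.5, 1.0]
print(per_task[1])  # [4.0, 4.0]
```

Tasks that share active skills end up with overlapping parameters, which is one way to read the abstract's claim that knowledge is recombined across tasks rather than entangled in fully shared weights.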
