2022
DOI: 10.48550/arxiv.2205.05131
Preprint

UL2: Unifying Language Learning Paradigms

Abstract: Existing pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes with pre-training objectives, two concepts that are commonly conflated. Next, we present a generalized and unified perspective for self-super…

Cited by 32 publications (35 citation statements). References 48 publications.
“…UL2: UL2 (Unifying Language Learning, Tay et al. 2022a) is an encoder-decoder model trained on a mixture of denoising tasks in a unified framework. In this paper, since we mainly focus on the in-context learning ability of language models, we use UL2-20B in the S-Denoiser mode (i.e., pre-trained with prefix language modeling).…”
Section: Pre-trained Language Models (mentioning)
confidence: 99%
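
The statement above describes running UL2-20B in its S-Denoiser (prefix-LM) mode for in-context learning. A minimal sketch of that setup follows, assuming the publicly released "google/ul2" checkpoint on Hugging Face and its documented "[S2S]" mode token; the checkpoint name, mode token, and loading options are assumptions drawn from the public release, not from the citing paper, so verify them against the model card.

# A minimal sketch (not from the citing paper): prompting UL2-20B in S-Denoiser
# (prefix-LM) mode for in-context learning. The checkpoint name "google/ul2" and
# the "[S2S]" mode token are assumptions based on the public release; check the
# model card before relying on them.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/ul2")
model = T5ForConditionalGeneration.from_pretrained(
    "google/ul2",
    torch_dtype=torch.bfloat16,  # 20B parameters: expect to need multiple GPUs or offloading
    device_map="auto",
)

# The mode token at the start of the prompt selects the S-Denoiser (prefix-LM) behaviour.
prompt = (
    "[S2S] Review: the film was a delight. Sentiment: positive\n"
    "Review: a tedious, joyless slog. Sentiment: negative\n"
    "Review: I would happily watch it again. Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))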
“…Model weights: The model weights of the two LLMs used in our experiments, i.e., UL2-20B (Tay et al., 2022a) and OPT-30B, are publicly released through a GCP bucket (gs://scenic-bucket/ul2) and GitHub (https://github.com/facebookresearch/metaseq), respectively.…”
Section: Reproducibility Statement (mentioning)
confidence: 99%
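
As a reproducibility aid, a rough sketch of how those released artifacts could be fetched is given below; it assumes gsutil and git are installed locally and that the quoted bucket and repository locations are still live, and it is not a script from the citing paper.

# A rough sketch (not from the citing paper) of fetching the released artifacts
# named above. Assumes gsutil and git are installed and the quoted locations are
# still reachable.
import subprocess

# UL2-20B checkpoints from the public GCP bucket.
subprocess.run(
    ["gsutil", "-m", "cp", "-r", "gs://scenic-bucket/ul2", "./ul2-checkpoints"],
    check=True,
)

# OPT-30B is distributed through the metaseq repository; cloning it provides the
# download instructions rather than the raw weights themselves.
subprocess.run(
    ["git", "clone", "https://github.com/facebookresearch/metaseq"],
    check=True,
)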
“…A promise of FMs is their adaptability, enabling the use of a single FM (order 100M-100B parameters) for multiple tasks versus a unique model per personal task in FL. Our work shows there is work to be done to achieve this promise, though we see the development of more versatile FMs over time [77,69,37]. For small FL models (100K parameters), the multiplicative factor is insignificant, yet model quality improves using more parameters, or training for longer over more data [32].…”
Section: Systems Feasibility (mentioning)
confidence: 99%
“…The third is the extension to other Transformer models. We focused on Transformer encoder-decoder models, because the recent large encoder-decoder model UL2 (Tay et al., 2022) with 20B parameters has shown better zero-shot performance than GPT-3 175B, suggesting the effectiveness of encoder-decoder models. In the future, we will extend our approach to large Transformer encoder-decoders such as T5 and UL2.…”
Section: Limitations (mentioning)
confidence: 99%