Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.55

LongT5: Efficient Text-To-Text Transformer for Long Sequences

Abstract: Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present LongT5, a new model that explores the effects of scaling both the input length and model size at the same time. Specifically, we integrate attention ideas from long-input transformers (ETC), and adopt pretraining strategies from summarization pretraining (PEGASUS) into the scalable T5 architecture. The result is a new attention …
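The abstract describes LongT5 only at a high level. As a concrete illustration, the sketch below shows how a pretrained LongT5 checkpoint can be loaded and run on a long input. It assumes the Hugging Face transformers library and its public google/long-t5-tglobal-base checkpoint, neither of which is part of the paper itself; the 4096-token cap and generation settings are illustrative choices, not values from the work.

```python
# Minimal usage sketch (assumes the Hugging Face `transformers` library and the
# public google/long-t5-tglobal-base checkpoint; not code from the paper).
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

long_document = "..."  # placeholder for a long input document

# LongT5's sparse (transient-global) attention lets the encoder accept inputs
# far beyond the usual 512-token T5 budget; 4096 tokens is an illustrative cap.
inputs = tokenizer(long_document, return_tensors="pt",
                   truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same pattern applies to the other published LongT5 sizes; only the checkpoint name and the input-length cap would change.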

Cited by 84 publications (88 citation statements) | References 17 publications

“…cess long sequences (Rae et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Roy et al., 2021). Sparse attention, relative position encoding (Shaw et al., 2018; Raffel et al., 2020; Guo et al., 2021), recurrence mechanisms and memory (Dai et al., 2019; Weston et al., 2015; Hutchins et al., 2022), and other tricks (Shen et al., 2020; Katharopoulos et al., 2020; Gupta and Berant, 2020; Stock et al., 2021; Yogatama et al., 2021; Borgeaud et al., 2021; Hawthorne et al., 2022) are commonly adopted by recent Transformer variants to make operations on long sequences more time- and memory-efficient.…”
Section: Related Work
confidence: 99%
“…Few-shot prompt tuning systems include PaLM 540B, with and without AF refinement, and with ASQA prompt exemplars during dynamic prompt selection. Finally, fully finetuned systems include T5-XL closed book and open book, LongT5-XL (Guo et al., 2022) open book, which allows longer contexts, and PaLM 540B prompt tuning. For the open book systems, we filled in the context with as many passages as possible within the limits of their input lengths.…”
Section: Results
confidence: 99%
“…The main challenge of using a vanilla Transformer architecture is the quadratic cost in time and memory with regard to the input sequence length due to the self-attention operation. There has been a surge of recent works addressing this problem [6,38,1,13,40,8]. They are primarily dedicated to improving either the efficiency of the self-attention mechanism or the general efficiency of the Transformer architecture through sparse models.…”
Section: Related Work
confidence: 99%
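To make the quadratic-cost point quoted above concrete, the sketch below compares the score matrix produced by full self-attention with a fixed-window (local) variant. It is a model-agnostic illustration in plain PyTorch; the sequence length, head dimension, and window size are arbitrary choices and do not correspond to any of the cited systems.

```python
import torch

def full_attention_scores(q, k):
    # Full self-attention: every token attends to every token, so the score
    # matrix is (n, n) -- time and memory grow quadratically with length n.
    return (q @ k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)

def local_attention_scores(q, k, window=128):
    # Sliding-window attention: each token only attends to a fixed-size
    # neighbourhood, so the stored scores grow linearly with n.
    n, d = q.shape
    scores = torch.full((n, 2 * window + 1), float("-inf"))
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores[i, : hi - lo] = (q[i] @ k[lo:hi].T) / (d ** 0.5)
    return scores

n, d = 4096, 64
q = torch.randn(n, d)
k = torch.randn(n, d)
print(full_attention_scores(q, k).shape)   # (4096, 4096): ~16.8M scores
print(local_attention_scores(q, k).shape)  # (4096, 257):  ~1.05M scores
```

At 4096 tokens the full score matrix is already roughly sixteen times larger than the windowed one, and the gap widens linearly as the input grows, which is what motivates the sparse-attention variants surveyed in these related-work sections.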