“…cess long sequences (Rae et al., 2020; Beltagy et al., 2020; Zaheer et al., 2020; Roy et al., 2021). Sparse attention, relative position encoding (Shaw et al., 2018; Raffel et al., 2020; Guo et al., 2021), recurrence mechanisms and memory (Dai et al., 2019; Weston et al., 2015; Hutchins et al., 2022), and other tricks (Shen et al., 2020; Katharopoulos et al., 2020; Gupta and Berant, 2020; Stock et al., 2021; Yogatama et al., 2021; Borgeaud et al., 2021; Hawthorne et al., 2022) are commonly adopted by recent Transformer variants to make operations on long sequences more time- and memory-efficient.…”
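To make the sparse-attention idea mentioned above concrete, the sketch below illustrates one common flavor, sliding-window (local) attention, in which each query only attends to keys inside a fixed-size window so cost scales with the window width rather than quadratically in sequence length. This is an illustrative assumption-laden example, not the implementation from any of the cited papers; the function name `local_attention`, the window size, and the toy shapes are all made up for demonstration.

```python
# Minimal sketch of sliding-window sparse attention (single head).
# Each position i attends only to positions within [i - window, i + window],
# so the effective cost is O(n * window) instead of O(n^2).
# For clarity this toy version still materializes the dense (n, n) score matrix;
# real sparse-attention implementations avoid that.
import numpy as np

def local_attention(q, k, v, window):
    """Attention where position i attends only to a local window of keys."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                          # (n, n) scaled dot-product scores
    idx = np.arange(n)
    banned = np.abs(idx[:, None] - idx[None, :]) > window  # True outside the local band
    scores = np.where(banned, -np.inf, scores)             # mask out long-range pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax over the band
    return weights @ v                                     # (n, d) context vectors

# Toy usage: 8 positions, head dimension 4, window of 2 on each side.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4))
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 4))
out = local_attention(q, k, v, window=2)
print(out.shape)  # (8, 4)
```

Methods such as Longformer (Beltagy et al., 2020) and BigBird (Zaheer et al., 2020) combine local windows of this kind with a few global or random attention positions; the sketch omits those additions for brevity.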