Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.19
ETC: Encoding Long and Structured Inputs in Transformers

Abstract: Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. To scale attention to longer inputs, we introduce a novel global-local attention mechanism between global tokens and regular input tokens. We also show that combining global-local attention …
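The global-local attention described in the abstract can be pictured as a block-structured sparsity pattern: global tokens attend everywhere, while long-input tokens attend to the global tokens and to a fixed-radius local window of neighbours. The NumPy sketch below only illustrates that mask pattern; the function and parameter names (`etc_attention_mask`, `n_global`, `n_long`, `local_radius`) are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def etc_attention_mask(n_global: int, n_long: int, local_radius: int) -> np.ndarray:
    """Boolean mask sketching an ETC-style global-local attention pattern.

    Rows are queries, columns are keys, ordered [global tokens, long tokens].
    True means "this query may attend to this key".
    """
    n = n_global + n_long
    mask = np.zeros((n, n), dtype=bool)

    # Global tokens attend to every token (global-to-global and global-to-long).
    mask[:n_global, :] = True

    # Long tokens attend to all global tokens (long-to-global).
    mask[n_global:, :n_global] = True

    # Long tokens attend to other long tokens within a fixed local radius
    # (long-to-long), so this part grows linearly with the input length.
    for i in range(n_long):
        lo = max(0, i - local_radius)
        hi = min(n_long, i + local_radius + 1)
        mask[n_global + i, n_global + lo:n_global + hi] = True

    return mask

# Toy example: 4 global tokens, 16 long tokens, local radius 2.
m = etc_attention_mask(n_global=4, n_long=16, local_radius=2)
print(m.sum(), "attended pairs vs", m.size, "for full attention")
```

In this toy setting the mask keeps 218 of the 400 query-key pairs; the gap widens as the long input grows, because only the small global block retains full attention while the long-to-long part stays linear in sequence length.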

Cited by 191 publications (162 citation statements)
References 43 publications
“…, $e^i_{|s_i|}\}$ where $e^i_j = e(w^i_j) + p^{\text{token}}_j$; $e(w^i_j)$ and $p^{\text{token}}_j$ are the token and positional embeddings of token $w^i_j$, respectively. … (Devlin et al., 2019), HiBERT and ETC (Ainslie et al., 2020).…”
Section: Stepwise HiBERT
confidence: 99%
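The quoted formula simply sums a token embedding and a positional embedding at each position. Below is a minimal numeric sketch of that sum; the array names and sizes (`token_embedding`, `position_embedding`, `d_model`) are made up for illustration and do not come from the cited models.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, max_len, d_model = 1000, 128, 64
token_embedding = rng.normal(size=(vocab_size, d_model))     # e(.)
position_embedding = rng.normal(size=(max_len, d_model))     # p^token

def embed_sentence(word_ids: list[int]) -> np.ndarray:
    """e^i_j = e(w^i_j) + p^token_j: token embedding plus positional embedding."""
    return np.stack([
        token_embedding[w] + position_embedding[j]
        for j, w in enumerate(word_ids)
    ])

sentence = [5, 42, 7, 99]        # toy word ids w^i_1 .. w^i_4
e = embed_sentence(sentence)     # shape (4, d_model)
print(e.shape)
```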
“…However, the main disadvantage of this approach is that token-level attention across sentences is prohibited, and long-range attention only happens indirectly in the second-stage encoder (see the middle diagram in Figure 1). Recently, the Extended Transformer Construction (ETC; Ainslie et al., 2020) was proposed as an alternative. It alleviates the quadratic memory growth by introducing sparsity into the attention pattern via its novel global-local attention mechanism (see the rightmost diagram in Figure 1).…”
Section: Stepwise ETCSum
confidence: 99%
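To make the "quadratic memory growth" point concrete, the back-of-the-envelope sketch below counts attention score pairs for dense self-attention versus an ETC-style global-local pattern. The sequence length, global-token count, and local radius are illustrative values chosen here, not the paper's configuration.

```python
def full_attention_pairs(n: int) -> int:
    """Dense self-attention: the score matrix grows quadratically with length."""
    return n * n

def global_local_pairs(n_long: int, n_global: int, local_radius: int) -> int:
    """Approximate score count under ETC-style global-local sparsity:
    global-to-all, long-to-global, and long-to-local-window terms."""
    g2all = n_global * (n_global + n_long)
    l2g = n_long * n_global
    l2l = n_long * (2 * local_radius + 1)
    return g2all + l2g + l2l

n_long, n_global, radius = 4096, 128, 84
print(full_attention_pairs(n_long + n_global))       # ~17.8M score pairs
print(global_local_pairs(n_long, n_global, radius))  # ~1.8M score pairs
```

With these illustrative numbers the dense pattern needs roughly ten times as many score entries, and the sparse variant's cost grows only linearly in the long-input length.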