Proceedings of the Workshop on New Frontiers in Summarization 2017
DOI: 10.18653/v1/w17-4505

Coarse-to-Fine Attention Models for Document Summarization

Abstract: Sequence-to-sequence models with attention have been successful for a variety of NLP problems, but their speed does not scale well for tasks with long source sequences such as document summarization. We propose a novel coarse-to-fine attention model that hierarchically reads a document, using coarse attention to select top-level chunks of text and fine attention to read the words of the chosen chunks. While the computation for training standard attention models scales linearly with source sequence length, our …
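To make the mechanism in the abstract concrete, here is a minimal sketch of one coarse-to-fine attention step in PyTorch-style Python. The class name, tensor shapes, and the use of sampling for the hard chunk selection are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class CoarseToFineAttention(nn.Module):
    """Illustrative sketch: coarse attention picks one chunk of the document,
    fine attention then reads only the words of that chunk.
    Names and tensor shapes here are assumptions, not the paper's code."""

    def __init__(self, dim):
        super().__init__()
        self.coarse_proj = nn.Linear(dim, dim, bias=False)  # scores query vs. chunk summaries
        self.fine_proj = nn.Linear(dim, dim, bias=False)    # scores query vs. word states

    def forward(self, query, chunk_reps, word_reps):
        # query:      (batch, dim)                      decoder state at this step
        # chunk_reps: (batch, n_chunks, dim)             one summary vector per chunk
        # word_reps:  (batch, n_chunks, chunk_len, dim)  word encodings per chunk
        b = query.size(0)

        # Coarse attention over chunks.
        coarse_logits = torch.einsum('bd,bnd->bn', self.coarse_proj(query), chunk_reps)
        coarse_probs = F.softmax(coarse_logits, dim=-1)

        # Hard selection of a single chunk. Sampling is non-differentiable,
        # which is why such models are typically trained with REINFORCE.
        idx = torch.multinomial(coarse_probs, 1).squeeze(-1)      # (batch,)
        selected = word_reps[torch.arange(b), idx]                # (batch, chunk_len, dim)

        # Fine attention only over the selected chunk's words, so the per-step
        # cost depends on chunk_len rather than the full document length.
        fine_logits = torch.einsum('bd,bld->bl', self.fine_proj(query), selected)
        fine_probs = F.softmax(fine_logits, dim=-1)               # (batch, chunk_len)
        context = torch.einsum('bl,bld->bd', fine_probs, selected)
        return context, coarse_probs, idx
```

Because the fine attention touches only the words of the selected chunk, the per-step cost depends on the chunk length rather than the full document length, which is the scaling argument the abstract makes.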

Cited by 35 publications (37 citation statements) | References 11 publications
“…HRED: the HieRarchical Encoder-Decoder (HRED) with a hierarchical attention mechanism. This architecture has been proven effective in several NLP tasks, including summarization (Ling and Rush 2017), headline generation (Tan, Wan, and Xiao 2017), and text generation (Li, Luong, and Jurafsky 2015). Here we keep the LSTM size at 500 for fairness and set the number of word-encoder and sentence-encoder layers to 1 and the number of decoder layers to 2.…”
Section: Baselines and Ablations (mentioning)
confidence: 99%
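The configuration quoted above (hidden size 500, one word-encoder layer, one sentence-encoder layer, two decoder layers) can be pictured with a short PyTorch sketch; the class and wiring below are assumptions for illustration, not the cited authors' code.

```python
import torch
from torch import nn

HIDDEN = 500  # LSTM size quoted in the baseline description

class HierarchicalEncoderDecoder(nn.Module):
    """Sketch of an HRED-style summarizer: a word-level LSTM encodes each
    sentence, a sentence-level LSTM encodes the sequence of sentence vectors,
    and a 2-layer LSTM decoder would attend over the encoder outputs."""

    def __init__(self, vocab_size, emb_dim=500):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_enc = nn.LSTM(emb_dim, HIDDEN, num_layers=1, batch_first=True)
        self.sent_enc = nn.LSTM(HIDDEN, HIDDEN, num_layers=1, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, HIDDEN, num_layers=2, batch_first=True)

    def encode(self, doc):
        # doc: (batch, n_sents, sent_len) token ids
        b, n_sents, sent_len = doc.shape
        words = self.embed(doc.view(b * n_sents, sent_len))
        word_states, (h, _) = self.word_enc(words)           # per-word states
        sent_vecs = h[-1].view(b, n_sents, HIDDEN)            # last hidden state = sentence vector
        sent_states, _ = self.sent_enc(sent_vecs)             # sentence-level states
        word_states = word_states.view(b, n_sents, sent_len, HIDDEN)
        return word_states, sent_states                       # inputs to hierarchical attention
```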
“…We design our framework to explicitly tackle the above-mentioned challenges by using a data-driven approach that learns to meet these requirements automatically. More specifically, we employ the hierarchical encoder-decoder network, which has already shown potential for handling long sequential input (Tan, Wan, and Xiao 2017; Ling and Rush 2017), as the base model for building our framework. On top of the hierarchical encoding structure, we propose a dynamic attention mechanism that combines sentence-level and word-level attentions, varying at each recurrent time step, to generate a more readable sequence.…”
Section: Introduction (mentioning)
confidence: 99%
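The sentence-and-word attention combination described in this excerpt could look like the following sketch at a single decoder time step. The renormalization scheme shown is one common way to combine the two levels and is assumed here; the cited paper's exact formula may differ.

```python
import torch
import torch.nn.functional as F

def dynamic_attention(word_scores, sent_scores):
    """Combine word- and sentence-level attention at one decoder time step.
    word_scores: (batch, n_sents, sent_len) unnormalized word attention scores
    sent_scores: (batch, n_sents)           unnormalized sentence attention scores
    Returns word weights rescaled by their sentence's weight (an assumed
    combination rule; the cited model may use a learned gate instead)."""
    sent_probs = F.softmax(sent_scores, dim=-1)                   # (batch, n_sents)
    word_probs = F.softmax(word_scores, dim=-1)                   # per-sentence word distributions
    combined = word_probs * sent_probs.unsqueeze(-1)              # weight each word by its sentence
    combined = combined / combined.sum(dim=(1, 2), keepdim=True)  # renormalize over all words
    return combined
```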
“…Reinforce-select (RS) (Ling and Rush, 2017; Chen and Bansal, 2018) utilizes reinforcement learning to approximate the marginal likelihood. Specifically, it is trained to maximize a lower bound of the likelihood obtained by applying Jensen's inequality:…”
Section: Reinforce-select (mentioning)
confidence: 99%
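The equation following the colon is truncated in this excerpt. A generic form of the Jensen lower bound it refers to, with x the input, y the output, and z the latent selection (notation assumed here), is:

```latex
\log p(y \mid x)
  = \log \mathbb{E}_{z \sim p(z \mid x)}\!\left[ p(y \mid z, x) \right]
  \;\ge\; \mathbb{E}_{z \sim p(z \mid x)}\!\left[ \log p(y \mid z, x) \right]
```

The right-hand side is the lower bound that, per the quote, is maximized with reinforcement learning.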
“…A short summary/headline (sequence of words) [9], [10], [14], [52], [12], [53]–[58]; Question Generation: a single-word answer from a document, or the start and end index of the answer in the document…”
Section: Machine Translation (mentioning)
confidence: 99%