Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.509

Better Highlighting: Creating Sub-Sentence Summary Highlights

Abstract: Amongst the best means to summarize is highlighting. In this paper, we aim to generate summary highlights to be overlaid on the original documents to make it easier for readers to sift through a large amount of text. The method allows summaries to be understood in context to prevent a summarizer from distorting the original meaning, of which abstractive summarizers usually fall short. In particular, we present a new method to produce self-contained highlights that are understandable on their own to avoid confusion…

Cited by 11 publications (17 citation statements)
References 37 publications
“…An example of such a potential summary, illustrated by an oracle-system summary derived from our supervised aligner predictions on CNN/DailyMail, is shown in Table 9. Alternatively, our data can contribute to the recent highlighting task (Arumae et al., 2019; Cho et al., 2020), where salient information fragments are marked inside a document, thus circumventing the need to generate coherent text. Further, propositions may be fused together to generate a coherent abstractive summary.…”
Section: Conclusion and Discussion (mentioning, confidence: 99%)
“…We compare our method to several strong extractive baselines: SumBasic (Vanderwende et al., 2007) extracts phrases with words that appear frequently in the documents; KLSumm (Haghighi and Vanderwende, 2009) extracts sentences that optimize KL-divergence; LexRank (Erkan and Radev, 2004) is a graph-based approach where vertices represent sentences, edges stand for word overlap between sentences, and sentence importance is computed by eigenvector centrality; DPP-Caps-Comb (Cho et al., 2019) balances salient sentence extraction against redundancy avoidance by optimizing determinantal point processes (DPP); HL-XLNetSegs and HL-TreeSegs (Cho et al., 2020) are two versions of a DPP-based span-highlighting approach that heuristically extracts candidate spans by their probability to begin and end with an EOS token; RL-MMR (Mao et al., 2020) adapts a reinforcement-learning single-document summarization (SDS) approach (Chen and Bansal, 2018) to the multi-document setup and integrates Maximal Marginal Relevance (MMR) to avoid redundancy. We additionally compare to some abstractive baselines: Opinosis (Ganesan et al., 2010) generates abstracts from salient paths in a word co-occurrence graph; Extract+Rewrite selects sentences using LexRank and generates a title-like summary for each sentence; PG (See et al., 2017) runs a Pointer-Generator model that includes a sequence-to-sequence network with a copy mechanism; PG-MMR (Lebanoff et al., 2018) selects representative sentences with MMR and fuses them with a PG-based model; MDS-Joint-SDS (Jin and Wan, 2020) is a hierarchical encoder-decoder architecture that is trained with SDS and MDS datasets while preserving document boundaries.…”
Section: Compared Methods (mentioning, confidence: 99%)
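For readers unfamiliar with the MMR criterion used by RL-MMR and PG-MMR above, the following is a minimal sketch of Maximal Marginal Relevance selection. The sentence vectors, the function names, and the trade-off weight lam are illustrative assumptions, not details taken from the cited papers.

```python
# Minimal sketch of Maximal Marginal Relevance (MMR) selection.
# Assumes precomputed dense vectors for the query and each sentence;
# names and the default lambda are hypothetical choices.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_select(query: np.ndarray, sents: list[np.ndarray],
               k: int, lam: float = 0.7) -> list[int]:
    """Iteratively pick sentences relevant to the query but
    dissimilar to everything already selected."""
    selected: list[int] = []
    while len(selected) < min(k, len(sents)):
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-picked sentence.
            redundancy = max((cosine(sents[i], sents[j]) for j in selected),
                             default=0.0)
            return lam * cosine(sents[i], query) - (1 - lam) * redundancy
        best = max((i for i in range(len(sents)) if i not in selected),
                   key=score)
        selected.append(best)
    return selected
```

The weight lam trades relevance against redundancy: lam = 1 reduces to pure relevance ranking, while smaller values penalize overlap with already-selected sentences more heavily.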
“…Accordingly, Arumae et al. (2019) established the highlighting task, where salient sub-sentence units are marked within their document to provide context around the salient units. Recently, Cho et al. (2020) proposed self-contained sub-sentence units, obtained heuristically from a language model's score for adding an EOS token at the beginning and the end of the text unit.…”
Section: Background and Related Work (mentioning, confidence: 99%)
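The EOS-token heuristic described in that statement can be illustrated with a short sketch: score a candidate span by how plausibly a language model would end a sequence at the span's boundaries. This uses GPT-2 for simplicity rather than the XLNet model of Cho et al. (2020), and the scoring function is a loose illustration of the idea, not the authors' exact procedure.

```python
# Illustrative sketch only: approximates "self-containedness" of a span
# by the LM probability of an EOS token at its boundaries.
# GPT-2 stands in for the XLNet model used by Cho et al. (2020).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def eos_prob_after(text: str) -> float:
    """Probability the LM assigns to EOS as the next token after `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # next-token distribution
    return torch.softmax(logits, dim=-1)[tokenizer.eos_token_id].item()

def span_self_containedness(context: str, span: str) -> float:
    # High when the LM would plausibly end a sequence right before the
    # span begins (given the preceding context) and right after it ends.
    begin = eos_prob_after(context)
    end = eos_prob_after(context + " " + span)
    return begin * end
```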
“…Cho, Lebanoff, Foroosh, and Liu (2019a) propose an improved similarity function based on capsule networks for extractive summarization, while Cho, Li, Yu, Foroosh, and Liu (2019b) fine-tune a BERT model to learn representations of sentence similarity and importance and then use these to extract sentences with DPP. More recently, Cho, Song, Li, Yu, Foroosh, and Liu (2020) applied DPP to highlight a subset of important and non-redundant text segments in a multi-document input. These approaches use DPP inference to extract a concrete subset of elements, i.e., sentences or tweets.…”
Section: Related Work (mentioning, confidence: 99%)
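As a rough illustration of the DPP inference step these works share, here is a minimal greedy MAP-inference sketch over a quality/similarity kernel. The kernel construction and the greedy procedure are generic textbook choices, not the specific models of the cited papers.

```python
# Generic sketch of greedy MAP inference for a determinantal point
# process (DPP); kernel design and greedy selection are standard
# illustrations, not the cited authors' exact systems.
import numpy as np

def build_kernel(quality: np.ndarray, sim: np.ndarray) -> np.ndarray:
    """L[i, j] = q_i * sim_ij * q_j: submatrix determinants reward
    salient (high-quality) yet mutually dissimilar selections."""
    return quality[:, None] * sim * quality[None, :]

def greedy_dpp_map(L: np.ndarray, k: int) -> list[int]:
    """Greedily add the item that most increases log det L[S, S].
    Assumes positive qualities so submatrices stay well-conditioned."""
    n = L.shape[0]
    selected: list[int] = []
    for _ in range(min(k, n)):
        def gain(i: int) -> float:
            S = selected + [i]
            return np.linalg.slogdet(L[np.ix_(S, S)])[1]
        candidates = [i for i in range(n) if i not in selected]
        selected.append(max(candidates, key=gain))
    return selected
```

Because the determinant shrinks when selected items are similar to one another, maximizing it yields subsets that are both salient and diverse, which is why DPPs serve as a redundancy-avoidance mechanism in the extractive methods quoted above.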