Proceedings of the Workshop on New Frontiers in Summarization 2017
DOI: 10.18653/v1/w17-4511

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

Abstract: The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences based on their similarity to a centroid vector. In this paper, we apply this ranking to possible summaries instead of sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show possibilities to scale up to larger input document collections by selecting a small number of sentences from each document prior to constructing the summary. Experiments were done on the DUC20…
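
The idea in the abstract, scoring candidate summaries rather than single sentences against the centroid and growing the summary greedily, can be illustrated with a minimal sketch. This is not the paper's implementation: the raw term counts (instead of a weighted representation), the whitespace tokenizer, the sentence-count budget `max_sentences`, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_centroid_summary(sentences, max_sentences=2):
    """Greedily add the sentence that most improves summary-to-centroid similarity.

    Assumption: the centroid is the sum of raw term counts over all input
    sentences, and the budget is a number of sentences.
    """
    centroid = Counter()
    for sentence in sentences:
        centroid.update(bow(sentence))

    chosen = []             # indices of selected sentences, in selection order
    summary_vec = Counter()
    best_score = 0.0
    while len(chosen) < max_sentences:
        best_i = None
        for i, sentence in enumerate(sentences):
            if i in chosen:
                continue
            # Score the whole candidate summary, not the sentence alone.
            score = cosine(summary_vec + bow(sentence), centroid)
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:  # no remaining sentence improves the summary
            break
        chosen.append(best_i)
        summary_vec.update(bow(sentences[best_i]))
    return [sentences[i] for i in chosen]


sentences = ["storm hits coast", "storm hits coast hard", "weather"]
print(greedy_centroid_summary(sentences, max_sentences=2))
```

Because each step scores the summary as a whole, a sentence is only added when it moves the combined vector closer to the centroid, and the loop stops early once no candidate helps.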

Cited by 16 publications (7 citation statements)
References 7 publications
“…If the cosine similarity is larger than the pre-defined threshold δ (see Section 5.4), the corresponding cluster is considered as a candidate for the date. Finally, we apply CENTROID-OPT (Ghalandari 2017) as a sentence ranking algorithm within a cluster and summarize each date individually by selecting one sentence per cluster with the highest ranking score.…”
Section: Timeline Summary Extractor (mentioning)
confidence: 99%
“…Unsupervised Opinion Summarization Extractive summarization consists in selecting a few sentences from the input documents to form the output summary. The centroid method (Rossiello et al, 2017; Gholipour Ghalandari, 2017) consists in ranking sentences according to their relevance to the whole input. Graph-based methods, such as LexRank (Erkan and Radev, 2004) or TextRank (Mihalcea and Tarau, 2004; Zheng and Lapata, 2019), use the PageRank algorithm to find the most central sentences in a graph of input sentences, where edge weights indicate word overlap.…”
Section: Related Work (mentioning)
confidence: 99%
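
The graph-based ranking described in the statement above can be sketched as a PageRank-style power iteration over a sentence graph. This is only an illustration under stated assumptions, not LexRank or TextRank as published: the Jaccard word overlap used for edge weights, the `damping` and `iterations` defaults, and the function names are all choices made for this example.

```python
def overlap(a, b):
    """Jaccard overlap between the word sets of two sentences (assumed edge weight)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def pagerank_sentences(sentences, damping=0.85, iterations=50):
    """Return one centrality score per sentence via weighted PageRank."""
    n = len(sentences)
    if n == 0:
        return []
    # Weighted adjacency matrix without self-loops.
    weights = [
        [overlap(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            rank = 0.0
            for i in range(n):
                if out_sum[i] > 0:
                    # Sentence i passes its score along edges in proportion to weight.
                    rank += scores[i] * weights[i][j] / out_sum[i]
                else:
                    # A sentence with no overlap spreads its score uniformly.
                    rank += scores[i] / n
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


sentences = ["storm hits coast", "storm hits coast hard", "markets close early"]
for sentence, score in zip(sentences, pagerank_sentences(sentences)):
    print(f"{score:.3f}  {sentence}")
```

Sentences that share vocabulary with many others accumulate score, while an isolated sentence keeps only the small share it receives from the uniform terms.
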
“…Unsupervised extractive summarization methods consists in selecting the most salient sentences from a text. Saliency can be quantified with the centroid method (Gholipour Ghalandari, 2017; Rossiello et al, 2017), which consists in computing vector representations for sentences and selecting which sentences are the closest to their centroid, and thus the most representative of the set. Other proposals make use of the PageRank algorithm (Mihalcea and Tarau, 2004; Erkan and Radev, 2004) to compute sentence saliency.…”
Section: Related Work (mentioning)
confidence: 99%