Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, 2017
DOI: 10.18653/v1/w17-1003

Centroid-based Text Summarization through Compositionality of Word Embeddings

Abstract: Textual similarity is a crucial aspect of many extractive text summarization methods. A bag-of-words representation cannot capture the semantic relationships between concepts when comparing strongly related sentences that have no words in common. To overcome this issue, in this paper we propose a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings. The evaluations on multi-document and multilingual datasets prove the effectiveness of the continuo…
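The following is a minimal illustrative sketch of the general idea described in the abstract, not the authors' implementation: a centroid vector is built by summing the embeddings of a set of topic words (in the paper these are words with high TF-IDF weight), each sentence is represented as the sum of its word embeddings, and the sentences closest to the centroid by cosine similarity are selected. The `vectors` lookup, `topic_words` argument, and function names are assumptions for the sake of the example; the paper's redundancy filtering is omitted.

```python
import numpy as np

def embed(words, vectors):
    """Sum the embeddings of the known words (compositional vector)."""
    vecs = [vectors[w] for w in words if w in vectors]
    return np.sum(vecs, axis=0) if vecs else None

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def centroid_summarize(sentences, vectors, topic_words, n_sentences=3):
    """Rank sentences by cosine similarity between their embedding sum
    and the centroid built from the topic words, then keep the top n."""
    centroid = embed(topic_words, vectors)
    if centroid is None:
        return []
    scored = []
    for idx, sent in enumerate(sentences):
        vec = embed(sent.lower().split(), vectors)
        if vec is not None:
            scored.append((cosine(centroid, vec), idx))
    top = sorted(scored, reverse=True)[:n_sentences]
    # Return the selected sentences in their original document order.
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```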

Cited by 114 publications (103 citation statements). References 25 publications.
“…We implement the selection of top-ranking features for both the original and modified models slightly differently from Rossiello et al. (2017): all words in the vocabulary are ranked by their value in the centroid vector. On a development dataset, a parameter is tuned that defines the proportion of the ranked vocabulary that is represented in the centroid vector; the rest is set to zero.…”
Section: Original Centroid-based Method
Citation type: mentioning (confidence: 99%)
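As a hedged sketch of the feature-selection variant quoted above: given a centroid vector over the vocabulary (for instance a TF-IDF centroid), rank the entries by value, keep a proportion of them, and zero the rest. The `keep_ratio` parameter below stands in for the proportion the authors tune on a development dataset; the function name is illustrative.

```python
import numpy as np

def prune_centroid(centroid, keep_ratio=0.1):
    """Keep only the top `keep_ratio` fraction of vocabulary entries
    (by value) in the centroid vector and set the rest to zero."""
    k = max(1, int(len(centroid) * keep_ratio))
    threshold = np.sort(centroid)[-k]            # value of the k-th largest entry
    return np.where(centroid >= threshold, centroid, 0.0)
```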
“…Rossiello et al. (2017) improved the centroid-based method by representing sentences as sums of word embeddings instead of TF-IDF vectors, so that semantic relationships between sentences that have no words in common can be captured. Mackie et al. (2016) also evaluated summaries from SumRepo and did experiments on improving baseline systems such as the centroid-based and the KL-divergence method with different anti-redundancy filters.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
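A toy illustration of the contrast drawn in this quote (the two-dimensional vectors below are invented for the example and are not taken from either paper): two related sentences with no words in common have zero cosine similarity under a bag-of-words or TF-IDF representation, but a high similarity when each sentence is represented as the sum of its word embeddings.

```python
import numpy as np

# Hypothetical 2-dimensional embeddings, for illustration only.
vectors = {
    "cat":    np.array([0.90, 0.10]),
    "feline": np.array([0.85, 0.20]),
    "sleeps": np.array([0.10, 0.90]),
    "naps":   np.array([0.15, 0.85]),
}

def sent_vec(sentence):
    return np.sum([vectors[w] for w in sentence.split()], axis=0)

a, b = sent_vec("cat sleeps"), sent_vec("feline naps")
sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Bag-of-words similarity is 0 (no shared tokens); embedding similarity is close to 1.
print(round(float(sim), 3))
```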
“…For the comparisons, two unsupervised baseline models are employed. TextRank, a graph-based unsupervised sentence extraction method (Mihalcea and Tarau, 2004), is used with sentence embeddings in place of bag-of-words representations, following Rossiello et al. (2017). As an unsupervised word-level extractive approach, we employ Opinosis (Ganesan et al., 2010), which detects salient phrases in terms of their redundancy.…”
Section: Baseline
Citation type: mentioning (confidence: 99%)
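A minimal sketch of the TextRank-with-sentence-embeddings baseline described above, assuming the sentence vectors have already been computed (for instance as sums of word embeddings): build a pairwise cosine-similarity graph over the sentences and score them with a simple power-iteration PageRank. The damping factor and iteration count are conventional defaults, not values taken from the cited work.

```python
import numpy as np

def textrank_scores(sent_vecs, damping=0.85, iters=50):
    """Score sentences with PageRank over a cosine-similarity graph."""
    X = np.array(sent_vecs, dtype=float)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    sim = X @ X.T                       # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)          # no self-loops
    trans = sim / (sim.sum(axis=1, keepdims=True) + 1e-12)  # row-stochastic
    n = len(sent_vecs)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * trans.T @ scores
    return scores                       # higher score = more central sentence
```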
“…We use the term concept to denote an arithmetic structure obtained by averaging a set of word vectors to encode a common concept. For example, a concept can represent the selectional preferences of a verb [BDK14], or phrases and sentences [RBS17]. The goal is to evaluate the similarity between a concept and a word (or another concept), e.g., to find nouns a verb prefers as objects.…”
Section: Compare Concepts
Citation type: mentioning (confidence: 99%)
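As an illustrative sketch of the notion of a concept in this quote: average a set of word vectors and compare the result with a word (or another concept) by cosine similarity. The example words in the comments are hypothetical and not drawn from the cited works.

```python
import numpy as np

def concept(words, vectors):
    """A 'concept' in the sense above: the average of a set of word vectors."""
    return np.mean([vectors[w] for w in words if w in vectors], axis=0)

def similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# e.g. compare the concept of a verb's preferred objects with candidate nouns:
# objects = concept(["bread", "cake", "pasta"], vectors)
# similarity(objects, vectors["pizza"])  vs.  similarity(objects, vectors["stone"])
```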