Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1116

The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization

Abstract: ROUGE is widely used to automatically evaluate summarization systems. However, ROUGE measures semantic overlap between a system summary and a human reference at the word-string level, much at odds with the contemporary treatment of semantic meaning. Here we present a suite of experiments on using distributed representations for evaluating summarizers, both in the reference-based and in the reference-free setting. Our experimental results show that the max value over each dimension of the summary ELMo word embeddings is a …
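The max-pooling idea from the abstract is straightforward to sketch. The snippet below is a minimal illustration, not the authors' implementation: `embed` is a hypothetical placeholder for a real contextual encoder such as ELMo, and a summary is represented by the max value over each embedding dimension before being compared to another text by cosine similarity.

```python
import numpy as np

def embed(tokens, dim=1024):
    """Placeholder embedder: one dim-dimensional vector per token.
    A real implementation would use ELMo word embeddings instead."""
    rng = np.random.default_rng(abs(hash(" ".join(tokens))) % (2**32))
    return rng.normal(size=(len(tokens), dim))

def max_pool(word_vectors):
    """Summary representation: the max value over each embedding dimension."""
    return word_vectors.max(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def embedding_score(summary_tokens, comparison_tokens):
    """Score a summary against a comparison text (reference or source)."""
    return cosine(max_pool(embed(summary_tokens)),
                  max_pool(embed(comparison_tokens)))

print(embedding_score("the cat sat".split(), "a cat was sitting".split()))
```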


Citations: cited by 28 publications (22 citation statements).
References: 24 publications.
“…Different from the textual information in other NLP tasks, such as document summarization [26,37,38], the textual information in visual dialogue has obviously structured characteristics between each Q-A pair [49]. In the meantime, distinct from other vision-language tasks, like VQA, the relationship between each visual entity is widely asked [15].…”
Section: Knowledge Encoding
confidence: 99%
“…Some work discussed how to evaluate the quality of generated text in the reference-free setting (Louis and Nenkova, 2013; Peyrard et al., 2017; Peyrard and Gurevych, 2018; Shimanaka et al., 2018; Xenouleas et al., 2019; Sun and Nenkova, 2019; Böhm et al., 2019; Chen et al., 2018; Gao et al., 2020). Louis and Nenkova (2013), Peyrard et al. (2017) and Peyrard and Gurevych (2018) leveraged regression models to fit human judgement.…”
Section: Reference-free Metrics
confidence: 99%
“…In contrast, our method is unsupervised and requires no human ratings for training. Sun and Nenkova (2019) discussed both reference-based and reference-free settings for summarization evaluation. Their method converts both the generated text and the text for comparison (denoted as T) into hidden representations using encoders like ELMo (Peters et al., 2018) and calculates the cosine similarity between them; T stands for the human-authored reference text in the reference-based setting and for the source document text in the reference-free setting.…”
Section: Reference-free Metrics
confidence: 99%
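To make the reference-based versus reference-free distinction in this citation statement concrete, here is a hedged sketch under the same placeholder-encoder assumption (`encode` is hypothetical, standing in for an ELMo-style encoder); the only thing that changes between the two settings is the comparison text T.

```python
import numpy as np

def encode(text, dim=256):
    """Hypothetical fixed-size text encoder; a real system would use
    hidden representations from an encoder such as ELMo."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

def similarity_score(summary, comparison_text):
    """Cosine similarity between the summary and the comparison text T."""
    u, v = encode(summary), encode(comparison_text)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Reference-based setting: T is a human-authored reference summary.
ref_based = similarity_score("system summary text", "human reference text")

# Reference-free setting: T is the source document itself.
ref_free = similarity_score("system summary text", "source document text")
```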
“…measuring how much salient information from the source documents is covered by the summaries. There exist a few unsupervised evaluation methods (Louis and Nenkova, 2013; Sun and Nenkova, 2019), but they have low correlation with human relevance ratings at the summary level: given multiple summaries for the same source documents, these methods can hardly distinguish summaries with high relevance from those with low relevance (see §3).…”
Section: Introduction
confidence: 99%