2020
DOI: 10.48550/arxiv.2011.04096
Preprint

Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics

Cited by 1 publication (2 citation statements)
References 9 publications
“…Further, ROUGE does not work when reference summaries are not available. During the last two years, there has been a spurt in research related to metrics for summary quality (Peyrard, 2019b; Bhandari et al, 2020a; Huang et al, 2020; Vasilyev & Bohannon, 2020; Fabbri et al, 2020; Bhandari et al, 2020b). Most of these works have argued against the ROUGE metric because it fails to robustly match paraphrases resulting in misleading scores, which do not correlate well with human judgements (Zhang et al, 2019; Huang et al, 2020).…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
“…Further, recent debate and consequent surge in study of evaluation metrics for automatic summaries is a clear and strong testimony to the considerable complexity of the task (Peyrard, 2019a,b; Ermakova et al, 2019; Bhandari et al, 2020a; Vasilyev & Bohannon, 2020; Fabbri et al, 2020; Huang et al, 2020; Bhandari et al, 2020b).…”
Section: Introduction (mentioning)
confidence: 99%