Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.160

How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation

Abstract: Manual evaluation is essential to judge progress on automatic text summarization. However, our survey of recent summarization system papers reveals little agreement on how to perform such evaluation studies. We conduct two evaluation experiments on two aspects of summaries' linguistic quality (coherence and repetitiveness) to compare Likert-type and ranking annotations, and show that the best choice of evaluation method can vary from one aspect to another. In our survey, we also find that study parame…

Cited by 7 publications (9 citation statements)
References: 43 publications
“…We follow Steen and Markert (2021) and report Krippendorff's alpha and Split-Half Reliability as measures of the reliability of crowdsourced annotations. Krippendorff's alpha (α) is a reliability coefficient developed to measure the agreement among multiple annotators (Krippendorff, 2011).…”
Section: Reliability (mentioning)
confidence: 99%
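
The citation statement above names Krippendorff's alpha as an agreement coefficient for multiple annotators. As a rough illustration only, the sketch below computes alpha as 1 minus the ratio of observed to expected disagreement, assuming interval-scaled ratings (for example Likert scores) and a squared-difference distance. The function name `krippendorff_alpha`, the `units` input layout, and the `distance` parameter are assumptions made for this example; they are not taken from the cited papers, which may use a different distance metric or implementation.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha(units, distance=lambda a, b: (a - b) ** 2):
    """Krippendorff's alpha for a list of rated items.

    `units` holds one list of ratings per item; a missing rating is simply
    left out of that item's list. `distance` defaults to the squared
    difference used for interval data (an assumption of this sketch).
    """
    # Items with fewer than two ratings carry no agreement information.
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: average distance between ratings of the same
    # item, with each item's ordered pairs weighted by 1 / (m - 1).
    d_o = 0.0
    for u in units:
        pair_sum = sum(distance(a, b) for a, b in permutations(u, 2))
        d_o += pair_sum / (len(u) - 1)
    d_o /= n

    # Expected disagreement: average distance between any two ratings
    # pooled over all items, regardless of which item they belong to.
    counts = Counter(v for u in units for v in u)
    d_e = sum(
        counts[c] * counts[k] * distance(c, k) for c in counts for k in counts
    ) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all ratings are identical; alpha is undefined")

    return 1.0 - d_o / d_e


perfect = krippendorff_alpha([[1, 1], [2, 2], [3, 3]])
opposed = krippendorff_alpha([[1, 2], [1, 2]])
```

Here `perfect` describes annotators who always agree, and `opposed` describes two annotators who disagree on every item. Passing `distance=lambda a, b: float(a != b)` would give the nominal variant instead.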
“…We follow a similar block-design described in Steen and Markert (2021). We note that we include the input document as the context of the summaries as opposed to the coherence and repetition dimensions studied in that work, which do not require reading the input article.…”
Section: Reliability (mentioning)
confidence: 99%