2010
DOI: 10.17562/pb-42-2

Summary Evaluation with and without References

Abstract: We study a new content-based method for the evaluation of text summarization systems without human models, which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called FRESA to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measures in text summarization, such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations in various …

Cited by 43 publications (20 citation statements)
References 20 publications
“…Pyramid (Nenkova and Passonneau, 2004) and Responsiveness are the most popular such methods. The second group of methods is itself divided into two subsets: (1) methods that need human intervention, like ROUGE (Lin, 2004a) and SERA (Cohan and Goharian, 2016), and (2) methods that do not need any human reference, like SummTriver (Cabrera-Diego and Torres-Moreno, 2018) and FRESA (Torres-Moreno et al., 2010). The most popular automatic metric used by the community is ROUGE (Lin, 2004a).…”
Section: Related Work
confidence: 99%
“…Evaluation is usually done by humans, but manual evaluation is subjective, costly, and time-consuming (Lin and Hovy, 2002). Automatic evaluation methods (Lin, 2004a; Torres-Moreno et al., 2010; Zhao et al., 2019; Zhang et al., 2020) are an alternative that saves time for users who extract the most relevant content from the web using Automatic Text Summarization (ATS) systems. There exist two types of evaluation approaches: (1) manual evaluation methods like Pyramid (Nenkova and Passonneau, 2004) and Responsiveness, where human intervention is mandatory, and (2) automatic evaluation methods, where human intervention may be needed as a ground-truth reference (Lin, 2004a; Cohan and Goharian, 2016) or not (Torres-Moreno et al., 2010; Cabrera-Diego and Torres-Moreno, 2018).…”
Section: Introduction
confidence: 99%
“…where P is the probability distribution of words w in the text T and Q is the probability distribution of words w in the summary S; N is the combined number of words in the text and the summary, N = N_T + N_S; B = 1.5|V|, where |V| is the size of the vocabulary of the documents; C_w^T is the number of occurrences of word w in the text and C_w^S is the number of occurrences of word w in the summary. For smoothing the summary's probabilities, we have used δ = 0.005 [53].…”
Section: Jensen-Shannon Divergence (JS)
confidence: 99%
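As a minimal sketch of the smoothed Jensen-Shannon divergence the statement describes (function names and whitespace tokenization are illustrative assumptions; for simplicity both distributions are smoothed here, whereas the quoted passage smooths only the summary's probabilities):

```python
import math
from collections import Counter

def smoothed_prob(counts, vocab, delta=0.005):
    """Smoothed probability over the shared vocabulary V:
    p(w) = (C_w + delta) / (N + delta * B), with B = 1.5 * |V|,
    using delta = 0.005 as in the quoted setup."""
    n = sum(counts.values())   # N: number of tokens in this document
    b = 1.5 * len(vocab)       # B = 1.5 |V|
    return {w: (counts.get(w, 0) + delta) / (n + delta * b) for w in vocab}

def js_divergence(text_tokens, summary_tokens, delta=0.005):
    """Jensen-Shannon divergence between the text distribution P and the
    summary distribution Q, both smoothed over the joint vocabulary."""
    c_t, c_s = Counter(text_tokens), Counter(summary_tokens)
    vocab = set(c_t) | set(c_s)
    p = smoothed_prob(c_t, vocab, delta)   # P: distribution over the text
    q = smoothed_prob(c_s, vocab, delta)   # Q: distribution over the summary
    js = 0.0
    for w in vocab:
        m = 0.5 * (p[w] + q[w])            # mixture distribution M
        js += 0.5 * p[w] * math.log2(p[w] / m) \
            + 0.5 * q[w] * math.log2(q[w] / m)
    return js

# Usage: js_divergence(text.split(), summary.split())
```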
“…FRESA (Torres-Moreno et al., 2010) is a framework for the evaluation of summaries that relies on the Jensen-Shannon divergence between n-gram probabilities. It scores summaries directly against the source text, without reference summaries.…”
Section: Related Work
confidence: 99%
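The reference-free idea this statement describes can be sketched by comparing n-gram distributions of the summary against the source text. The function name fresa_style_score and the choice of n-gram orders below are hypothetical illustrations, not FRESA's actual implementation, and the sketch reuses js_divergence from the example above:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fresa_style_score(text_tokens, summary_tokens, orders=(1, 2)):
    """Reference-free scoring in the spirit of FRESA: average the JS
    divergence (js_divergence from the sketch above) between source-text
    and summary n-gram distributions; lower means closer to the source."""
    return sum(js_divergence(ngrams(text_tokens, n), ngrams(summary_tokens, n))
               for n in orders) / len(orders)
```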