Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.124

SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization

Abstract: We study unsupervised multi-document summarization evaluation metrics, which require neither human-written reference summaries nor human annotations (e.g. preferences, ratings, etc.). We propose SUPERT, which rates the quality of a summary by measuring its semantic similarity with a pseudo reference summary, i.e. selected salient sentences from the source documents, using contextualized embeddings and soft token alignment techniques. Compared to the state-of-the-art unsupervised evaluation metrics, SUPERT corre…

Cited by 75 publications (52 citation statements). References 32 publications (34 reference statements).
“…We also test prior predictions from a state-of-the-art summary scoring method, SUPERT (Gao et al., 2020), which uses a variant of BERT that has been fine-tuned on news articles to obtain 1024-dimensional contextualized embeddings of a summary. To score a summary, SUPERT extracts a pseudo-reference summary from the source documents, then compares its embedding with that of the test summary.…”
Section: Methods (mentioning)
confidence: 99%
“…Some work discussed how to evaluate the quality of generated text in the reference-free setting (Louis and Nenkova, 2013; Peyrard et al., 2017; Peyrard and Gurevych, 2018; Shimanaka et al., 2018; Xenouleas et al., 2019; Sun and Nenkova, 2019; Böhm et al., 2019; Chen et al., 2018; Gao et al., 2020). Louis and Nenkova (2013), Peyrard et al. (2017) and Peyrard and Gurevych (2018) leveraged regression models to fit human judgements.…”
Section: Reference-free Metrics (mentioning)
confidence: 99%
“…SUPERT generates pseudo references and evaluates the quality of the test summaries by calculating the word mover's distance between the pseudo-reference summaries and the test summaries (Gao et al., 2020). It is similar to MoverScore (Zhao et al., 2019), which uses human-authored references instead of pseudo references.…”
Section: Reference-free Metrics (mentioning)
confidence: 99%
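For concreteness, here is a hedged sketch of the word mover's distance step this statement refers to, under stated assumptions: uniform token weights (MoverScore instead weights tokens by IDF) and the POT library's exact optimal-transport solver. It is not the released SUPERT or MoverScore code.

```python
# Word Mover's Distance between two sets of token embeddings: treat the
# pseudo-reference and summary tokens as two uniform distributions and
# solve the optimal-transport problem between them.
import numpy as np
import ot  # POT: pip install pot

def word_movers_distance(ref_emb: np.ndarray, sum_emb: np.ndarray) -> float:
    """ref_emb: (n_ref, dim), sum_emb: (n_sum, dim) token embeddings."""
    # Cost matrix: cosine distance between every reference/summary token pair.
    ref_n = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sum_n = sum_emb / np.linalg.norm(sum_emb, axis=1, keepdims=True)
    cost = 1.0 - ref_n @ sum_n.T
    # Uniform token weights; an IDF weighting would replace these vectors.
    a = np.full(ref_emb.shape[0], 1.0 / ref_emb.shape[0])
    b = np.full(sum_emb.shape[0], 1.0 / sum_emb.shape[0])
    return ot.emd2(a, b, cost)  # minimal transport cost = WMD

# Lower distance to the pseudo reference means a better summary, so a
# quality score can be reported as, e.g., 1 - WMD.
```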
“…The second approach extends the (single) reference into multiple ones by automatically generating paraphrases of the reference (a.k.a. pseudo-references) (Albrecht and Hwa, 2008; Yoshimura et al., 2019; Kauchak and Barzilay, 2006; Edunov et al., 2018; Gao et al., 2020). Our method (§3.3) follows this paradigm.…”
Section: Generation Evaluation (mentioning)
confidence: 99%
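One common recipe for generating such paraphrased pseudo-references is round-trip ("pivot") translation. The sketch below illustrates that recipe only; the pivot language and model names are illustrative assumptions, not taken from any of the cited papers.

```python
# Hedged sketch: paraphrase a reference by translating it out to a pivot
# language and back, yielding a pseudo-reference with the same meaning
# but different surface form.
from transformers import pipeline

en_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def pseudo_reference(reference: str) -> str:
    """Paraphrase one reference via an English -> German -> English round trip."""
    pivot = en_de(reference)[0]["translation_text"]
    return de_en(pivot)[0]["translation_text"]

# Using several pivot languages (or sampled decoding) yields multiple
# diverse pseudo-references from a single human-written one.
```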