Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM) 2022
DOI: 10.18653/v1/2022.gem-1.7
Revisiting text decomposition methods for NLI-based factuality scoring of summaries

Cited by 3 publications (3 citation statements). References 0 publications.
“…We denote those scores as factual-consistent, non-verified, and factual-inconsistent, respectively. This strategy seeks to address the shortcomings of traditional factuality metrics (Wang et al., 2020; Honovich et al., 2021; Glover et al., 2022a; Lee et al., 2023) that mainly depend on consistency with human-annotated references. These metrics often fail in emerging knowledge generation scenarios (Table 10), as they struggle with model-generated content beyond the scope of reference knowledge and face difficulties when references are unavailable in real-world applications.…”
Section: Intrinsic Evaluation
confidence: 99%
“…We chose the factCC version as our baseline. NLI-decompose-claim (Glover et al., 2022b) found that, in general, sentence-level decomposition is preferable for the hypothesis side of the NLI input. So we also decompose the generated knowledge into sentences and then aggregate the sentence-level scores to produce a document-level score.…”
Section: Dataset
confidence: 99%
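The decompose-then-aggregate scheme quoted above can be sketched in a few lines: split the generated text into sentences, score each sentence against the source as an NLI hypothesis, and combine the per-sentence scores into one document-level score. The function and variable names below (`sentence_split`, `document_score`, `overlap_score`) are illustrative, not from the paper, and the token-overlap scorer is only a stand-in for a real NLI model's entailment probability.

```python
import re

def sentence_split(text):
    # Naive sentence splitter; a real pipeline would use spaCy or nltk.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def document_score(source, generated, nli_score, aggregate=min):
    """Decompose `generated` into sentences, score each sentence against
    the full source with `nli_score(premise, hypothesis)`, and aggregate
    the sentence-level scores into a single document-level score."""
    sentences = sentence_split(generated)
    if not sentences:
        return 0.0
    return aggregate(nli_score(source, s) for s in sentences)

# Stand-in scorer for demonstration only: fraction of hypothesis tokens
# that appear in the premise. A factuality metric would instead call an
# NLI model and use its entailment probability here.
def overlap_score(premise, hypothesis):
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

src = "The cat sat on the mat. The dog barked."
gen = "The cat sat on the mat. A bird sang."
print(document_score(src, gen, overlap_score))  # min over sentences: 0.0
```

Using `min` as the aggregator makes the document score as strict as its worst-supported sentence; averaging is a common softer alternative.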