Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019
DOI: 10.1145/3292500.3330955
Assessing The Factual Accuracy of Generated Text

Abstract: We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-to-end models are shown to be able to extract complete sets of facts from datasets with full pages of text. We then …
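As a concrete illustration of the metric the abstract describes: facts are extracted as (subject, relation, object) triples from both the source document and the generated text, and factual accuracy amounts to the fraction of generated facts that are supported by the source facts. The sketch below shows only that comparison step, not the paper's released models; the Fact alias, the factual_accuracy function, and the toy triples are illustrative assumptions.

    from typing import Set, Tuple

    # A fact is modelled as a (subject, relation, object) triple.
    Fact = Tuple[str, str, str]

    def factual_accuracy(source_facts: Set[Fact], generated_facts: Set[Fact]) -> float:
        # Precision-style score: fraction of facts asserted by the generated
        # text that also appear among the facts extracted from the source.
        if not generated_facts:
            return 1.0  # an empty generation asserts nothing that could be wrong
        supported = generated_facts & source_facts
        return len(supported) / len(generated_facts)

    # Toy triples standing in for the output of a relation classifier or an
    # end-to-end fact extraction model.
    source = {("Barack Obama", "born_in", "Hawaii"),
              ("Barack Obama", "occupation", "politician")}
    generated = {("Barack Obama", "born_in", "Kenya"),          # unsupported fact
                 ("Barack Obama", "occupation", "politician")}  # supported fact

    print(factual_accuracy(source, generated))  # -> 0.5

In the paper itself the triples would come from the trained relation classifiers or the end-to-end extraction models rather than being written by hand.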

Cited by 137 publications (122 citation statements) | References 27 publications
“…Recent work in addressing faithfulness of text generations can be broadly divided into three groups: structured information based, multi-task formulations, and post-processing methods. The first group leverages structured knowledge, like Open IE triples (Cao et al., 2018; Goodrich et al., 2019), dependency trees (Song et al., 2018), or generated semantic roles (Fan et al., 2018), as additional input for generation. However, incorporation of these as additional embeddings in model architectures does not explain how these influence model generations.…”
Section: Related Work
confidence: 99%
“…Then they train a fact-checking model to classify the label of the claim and extract spans in both the source document and the generated summary explaining the model's decision. Goodrich et al. (2019) introduced a model-based…”
Section: Related Work
confidence: 99%
“…Additionally, Goodrich et al. (2019) compare several models such as relation extraction, binary classification, and end-to-end (E2E) models for estimating factual accuracy on a Wikipedia text summarization task. They show that their E2E model for factual correctness has the highest correlation with human judgements and suggest that the E2E models could benefit from a better labeling scheme.…”
Section: Factual Errors In Summaries
confidence: 99%
“…We find that previous research has not established a detailed typology of summarization errors. Most work instead relies on a binary distinction between correct and erroneous (Cao et al., 2018; Falke et al., 2019; Lebanoff et al., 2019) or faithfulness measured on a Likert scale (Goodrich et al., 2019). However, not all errors are created equal.…”
Section: Factual Errors In Summaries
confidence: 99%