2021
DOI: 10.48550/arxiv.2110.08222
Preprint

DialFact: A Benchmark for Fact-Checking in Dialogue

Abstract: Fact-checking is an essential tool to mitigate the spread of misinformation and disinformation; however, it has most often been explored for verifying formal single-sentence claims rather than casual conversational claims. To study the problem, we introduce the task of fact-checking in dialogue. We construct DIALFACT, a testing benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia. There are three sub-tasks in DIALFACT: 1) Verifiable claim detection task distinguishes…

Cited by 8 publications (10 citation statements)
References 29 publications
“…Honovich et al [72] present a trainable metric for the KGD task, which also applies NLI. It is also noteworthy that Gupta et al [66] propose datasets that can benefit fact-checking systems specialized for dialogue systems. The Conv-FEVER corpus [154] is a factual consistency detection dataset, created by adapting the Wizard-of-Wikipedia dataset [31].…”
Section: Hallucination Metrics For Generation-based Dialogue Systems ... (mentioning)
confidence: 99%
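The NLI-style scoring mentioned in the passage above can be illustrated with a minimal sketch. The checkpoint name, label strings, and pipeline usage below are assumptions chosen for illustration; they are not the specific metric of Honovich et al [72].

```python
# Minimal sketch: score a dialogue response for factual consistency with an
# off-the-shelf NLI model. Checkpoint and labels are illustrative assumptions.
from transformers import pipeline

# MNLI classifier; its labels are CONTRADICTION / NEUTRAL / ENTAILMENT.
nli = pipeline("text-classification", model="roberta-large-mnli")

def entailment_score(evidence: str, response: str) -> float:
    """Probability that the evidence entails the response."""
    scores = nli({"text": evidence, "text_pair": response}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

evidence = "The Eiffel Tower is 330 metres tall."
response = "I think the Eiffel Tower is about 300 metres high, right?"
print(f"entailment score: {entailment_score(evidence, response):.3f}")
```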
“…Fact-checking in dialogue systems. In addition to factual consistency of responses from knowledge-grounded dialogue systems, fact-checking in dialogue systems is a future direction for dealing with the hallucination problem in dialogue systems [66]. Dialogue fact-checking involves verifiable claim detection, an important step in distinguishing hallucination-prone dialogue, as well as evidence retrieval from an external source.…”
Section: Future Directions In Dialogue Generation (mentioning)
confidence: 99%
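A schematic sketch of the decomposition described in the passage above (verifiable claim detection, evidence retrieval, verification) might look as follows. The function names, signatures, and stub bodies are assumptions for exposition, not DialFact's reference implementation.

```python
# Schematic three-stage dialogue fact-checking pipeline, following the
# decomposition in the passage above. All names and stubs are illustrative.
from dataclasses import dataclass, field

@dataclass
class Verdict:
    verifiable: bool
    evidence: list = field(default_factory=list)
    label: str = "NOT ENOUGH INFO"  # or "SUPPORTED" / "REFUTED"

def is_verifiable(response: str) -> bool:
    """Stage 1: does the response carry checkable factual content,
    or is it pure opinion / chit-chat?"""
    raise NotImplementedError  # e.g. a binary classifier over the response

def retrieve_evidence(context: list, response: str, k: int = 5) -> list:
    """Stage 2: fetch top-k candidate evidence sentences from an external
    source such as Wikipedia, conditioned on the dialogue context."""
    raise NotImplementedError  # e.g. BM25 or dense retrieval

def verify(response: str, evidence: list) -> str:
    """Stage 3: classify the response against the retrieved evidence."""
    raise NotImplementedError  # e.g. an NLI-style classifier

def fact_check(context: list, response: str) -> Verdict:
    if not is_verifiable(response):
        return Verdict(verifiable=False)
    evidence = retrieve_evidence(context, response)
    return Verdict(True, evidence, verify(response, evidence))
```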
“…Hallucination Evaluation. Recently introduced benchmarks can serve as testbeds for knowledge grounding in dialogue systems, such as BEGIN (Dziri et al, 2021b), DialFact (Gupta et al, 2021), and Attributable to Identified Sources (AIS) framework (Rashkin et al, 2021a). Meanwhile, a recent study has reopened the question of the most reliable metric for automatic evaluation of hallucination-free models, with the Q² metric (Honovich et al, 2021) showing performance comparable to human annotation.…”
Section: Related Work (mentioning)
confidence: 99%
“…This enables a more well-defined task, since determining the truthfulness of a fact w.r.t a general "real world" is subjective and depends on the knowledge, values and beliefs of the subject (Heidegger, 2001). This definition follows similar strictness in Textual Entailment, Question Answering, Summarization and other tasks where comprehension is based on a given grounding text, irrespective of contradiction with other world knowledge.…”

Table embedded in the citing passage:

Task                Dataset                            # Examples   Open   Test Cons.
Summarization       FRANK (Pagnoni et al, 2021)        671          +      33.2%
Summarization       SummEval (Fabbri et al, 2021a)     1,600        -      81.6%
Summarization       MNBM (Maynez et al, 2020)          2,500        -      10.2%
Summarization       QAGS-CNNDM                         235          -      48.1%
Summarization       QAGS-XSum                          239          -      48.5%
Dialogue            BEGIN (Dziri et al, 2021)          836          +      33.7%
Dialogue            Q² (Honovich et al, 2021)          1,088        -      57.7%
Dialogue            DialFact (Gupta et al, 2021)       8,689        +      38.5%
Fact Verification   FEVER (Thorne et al, 2018)         18,209       -      35.1%
Fact Verification   VitaminC (Schuster et al, 2021)    63,054       +      49.9%
Paraphrasing        PAWS (Zhang et al, 2019)           8,000        +      44.2%
Section: Definitions and Terminology (mentioning)
confidence: 99%
“…DialFact. Gupta et al (2021) introduced the task of fact-verification in dialogue and constructed a dataset of conversational claims paired with pieces of evidence from Wikipedia. They define three tasks: (1) detecting whether a response contains verifiable content, (2) retrieving relevant evidence, and (3) predicting whether a response is supported by the evidence, refuted by the evidence, or whether there is not enough information to determine this.…”
Section: Dialogue Generation (mentioning)
confidence: 99%
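As a purely hypothetical illustration of how a single annotated instance could cover the three sub-tasks listed in the passage above (the field names and values below are assumptions for exposition, not the dataset's actual schema):

```python
# Illustrative DialFact-style instance covering the three sub-tasks described
# above. Field names and content are assumptions, not the real schema.
example = {
    "context": [
        "Have you ever been to Paris?",
        "Yes! I loved walking along the Seine.",
    ],
    "response": "Me too, and the Eiffel Tower there is over 500 metres tall.",
    "verifiable": True,   # sub-task 1: the response carries a checkable claim
    "evidence": [         # sub-task 2: retrieved Wikipedia sentence(s)
        "The Eiffel Tower is 330 metres (1,083 ft) tall.",
    ],
    "label": "REFUTED",   # sub-task 3: SUPPORTED / REFUTED / NOT ENOUGH INFO
}
```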