2022
DOI: 10.48550/arxiv.2204.10757
Preprint

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Abstract: The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. Dziri et al. (2022)'s investigation of hallucinations has revealed that existing knowledge-grounded benchmarks are contaminated with hallucinated responses at an alarming level (>60% of the responses) and models trained on this data amplify hallucinations even further…

Cited by 4 publications (9 citation statements)
References 30 publications
“…Previous work reported the percentage of novel n-grams or the notion of Coverage (Grusky et al, 2018) as a proxy for abstractiveness. These metrics have been adopted in other areas such as dialog (Dziri et al, 2022) to inspect the qualities and characteristics of datasets. Despite being convenient, these measures do not enable fine-grained analyses of multi-sentence aggregation.…”
Section: Measuring Aggregation (mentioning)
confidence: 99%
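The two abstractiveness proxies named in the statement above can be made concrete. Below is a minimal sketch, not the method of any cited paper: it assumes pre-tokenized input (lowercased whitespace tokens in the example), computes Coverage with the greedy extractive-fragment matching described by Grusky et al. (2018), and computes the novel n-gram ratio over sets of distinct n-grams (some work counts repeated n-grams instead). The function names `extractive_fragments`, `coverage`, and `novel_ngram_ratio` are hypothetical. In a dialogue setting, `source` would be the knowledge snippet and `target` the response.

```python
from typing import List, Sequence, Set, Tuple


def extractive_fragments(source: Sequence[str], target: Sequence[str]) -> List[List[str]]:
    """Greedy fragment matching in the style of Grusky et al. (2018).

    Starting at each target position, find the longest contiguous token span
    shared with `source`, record it, and skip past it.
    """
    fragments: List[List[str]] = []
    i = 0
    while i < len(target):
        best: List[str] = []
        j = 0
        while j < len(source):
            if target[i] == source[j]:
                # Extend the match while both sequences keep agreeing.
                i_end, j_end = i, j
                while (i_end < len(target) and j_end < len(source)
                       and target[i_end] == source[j_end]):
                    i_end += 1
                    j_end += 1
                if i_end - i > len(best):
                    best = list(target[i:i_end])
                j = j_end
            else:
                j += 1
        if best:
            fragments.append(best)
        # Skip the matched fragment, or advance one token if nothing matched.
        i += max(len(best), 1)
    return fragments


def coverage(source: Sequence[str], target: Sequence[str]) -> float:
    """Fraction of target tokens that lie inside a copied fragment."""
    if not target:
        return 0.0
    return sum(len(f) for f in extractive_fragments(source, target)) / len(target)


def novel_ngram_ratio(source: Sequence[str], target: Sequence[str], n: int = 2) -> float:
    """Share of distinct target n-grams that never occur in the source."""
    def ngrams(tokens: Sequence[str]) -> Set[Tuple[str, ...]]:
        return {tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)}

    target_ngrams = ngrams(target)
    if not target_ngrams:
        return 0.0
    return len(target_ngrams - ngrams(source)) / len(target_ngrams)


if __name__ == "__main__":
    knowledge = "the eiffel tower is in paris".split()
    response = "i think the eiffel tower is in france".split()
    print(coverage(knowledge, response))           # 5 of 8 tokens copied
    print(novel_ngram_ratio(knowledge, response))  # 3 of 7 bigrams are novel
```

Here the response copies the five-token span "the eiffel tower is in", so Coverage is 5/8 = 0.625, while three of its seven bigrams ("i think", "think the", "in france") are absent from the knowledge. Both numbers summarize a whole response at once, which is the limitation the statement raises for fine-grained, multi-sentence analyses.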
“…Advancements in adding a moral dimension to KGs, and extending them with intuition of morality (such as crime is bad), can enable generation of morally correct knowledge paths. Furthermore, imbuing conversational systems with empathy (Ma et al, 2020), moral discretion (Ziems et al, 2022) and factual correctness (Gupta et al, 2021b; Dziri et al, 2022) will improve users' experience and trust in the system.…”
Section: Ethics and Broader Impact (mentioning)
confidence: 99%
“…experimented with an Encoder-Decoder model on multiple datasets but ignored that Dual-Encoders are more compatible with the task. Dziri et al (2022a) experimented on sentence-level knowledge datasets, neglecting the effect of knowledge size and granularity. Nevertheless, comparisons from multiple perspectives are needed to support the superiority and contribution of the model theoretically.…”
Section: Introduction (mentioning)
confidence: 99%