“…Baseline. We compare our evaluation metrics with eleven popular automatic dialogue evaluation metrics: three lexical word-overlap metrics, BLEU, ROUGE, and METEOR (Banerjee and Lavie 2005); five metrics that consider semantic representations, BERTScore, ADEM (Lowe et al. 2017), BERT-RUBER, BLEURT, and QuantiDCE (Ye et al. 2021); two metrics that take additional dialogue information into account, DynaEval and GRADE; and ChatGPT.

Evaluation. The common practice for demonstrating the effectiveness of a dialogue evaluation metric is to compute the correlation between the model-predicted scores and the human-rated scores (Zhang et al. 2021; Huang et al. 2020).…”
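The meta-evaluation step described above is typically a straightforward correlation computation. The following is a minimal sketch, not taken from the paper, of how metric-predicted scores might be compared against human ratings using Pearson and Spearman correlations; the score arrays and the helper function `correlate` are hypothetical placeholders for illustration only.

```python
# Minimal sketch of the standard meta-evaluation step: correlating
# metric-predicted scores with human ratings for the same responses.
# The data below are hypothetical, not results from the paper.
from scipy.stats import pearsonr, spearmanr


def correlate(metric_scores, human_scores):
    """Return (Pearson r, Spearman rho) between metric and human scores."""
    pearson_r, _ = pearsonr(metric_scores, human_scores)
    spearman_rho, _ = spearmanr(metric_scores, human_scores)
    return pearson_r, spearman_rho


# Hypothetical scores for five dialogue responses.
metric_scores = [0.62, 0.41, 0.88, 0.30, 0.75]  # predicted by an automatic metric
human_scores = [3.5, 2.0, 4.5, 1.5, 4.0]        # averaged human ratings

r, rho = correlate(metric_scores, human_scores)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson correlation captures linear agreement with the human scores, while Spearman captures agreement in ranking; papers in this area commonly report both.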