Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.742
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

Abstract: Automatically evaluating dialogue coherence is a challenging but high-demand ability for developing high-quality open-domain dialogue systems. However, current evaluation metrics consider only surface features or utterance-level semantics, without explicitly considering the fine-grained topic transition dynamics of dialogue flows. Here, we first consider that the graph structure constituted with topics in a dialogue can accurately depict the underlying communication logic, which is a more natural way to produce…

Cited by 38 publications (42 citation statements)
References 21 publications
“…Moreover, learnable metrics encoding semantic information have been attracting interest recently; they are trained either in a supervised manner with large-scale human-annotated data, such as ADEM (Lowe et al., 2017), or in an unsupervised manner with automatically constructed data, such as RUBER (Tao et al., 2018) and BERT-RUBER (Ghazarian et al., 2019). Furthermore, the recently proposed coherence metric GRADE (Huang et al., 2020) introduces the graph information of dialogue topic transitions and achieves the current state-of-the-art results. Note that these learnable metrics are trained with a two-level objective that separates coherent dialogues from incoherent ones, whereas our QuantiDCE models the task in a multi-level setting that is closer to actual human rating.…”
Section: Related Work (mentioning, confidence: 90%)
“…Therefore, in this work, we set the number of coherence levels L = 3, where the pairs containing the random responses, the adversarial responses, and the reference responses belong to levels 1 to 3, respectively. As for the fine-tuning data, we use the DailyDialog human-judgement dataset, denoted as DailyDialogEVAL, a subset of the adopted evaluation benchmark (Huang et al., 2020) with 300 human-rated examples in total, and randomly split the data into training (90%) and validation (10%) sets. Implementation Details.…”
Section: Methods (mentioning, confidence: 99%)
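The fine-tuning setup described in the excerpt above (three coherence levels for random, adversarial, and reference responses; 300 human-rated DailyDialogEVAL examples; a random 90%/10% train/validation split) can be sketched as follows. This is an illustrative sketch only: the field names and the random level assignment are hypothetical, not taken from the cited work.

```python
import random

# Three coherence levels per the excerpt: 1 = random response,
# 2 = adversarial response, 3 = reference response.
LEVELS = {"random": 1, "adversarial": 2, "reference": 3}

random.seed(0)
# 300 human-rated items, matching the size of the DailyDialogEVAL subset.
# The response types here are sampled at random purely for illustration.
data = [{"id": i, "level": LEVELS[t]}
        for i, t in enumerate(random.choices(list(LEVELS), k=300))]

# Random 90% / 10% split into training and validation sets.
random.shuffle(data)
cut = int(0.9 * len(data))
train, val = data[:cut], data[cut:]
```

With 300 items, this yields 270 training and 30 validation examples, as implied by the 90%/10% split in the excerpt.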
“…Context-Aware NMT. In a sense, chat MT can be viewed as a special case of context-aware MT, which has many related studies (Gong et al., 2011; Jean et al., 2017; Wang et al., 2017b; Zheng et al., 2020; Yang et al., 2019; Kang et al., 2020; Ma et al., 2020). Typically, they extend conventional NMT models to exploit the context.…”
Section: Related Work (mentioning, confidence: 99%)
“…We follow previous work (Wang et al., 2017; Xu et al., 2019; Huang et al., 2020) to optimize the utterance-pair coherence scoring model (described in Section 3.2) with a marginal ranking loss. Formally, the coherence scoring model CS receives two utterances (u1, u2) as input and returns the coherence score c = CS(u1, u2), which reflects the topical relevance of this pair of utterances.…”
Section: Training Data for Coherence Scoring (mentioning, confidence: 99%)
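The excerpt above trains a coherence scorer CS(u1, u2) with a marginal ranking loss. A minimal sketch of that loss follows; the `coherence_score` function below is a hypothetical cosine-similarity stand-in for the learned scorer in the cited work, used only so the loss has something to rank.

```python
import math

def coherence_score(u1, u2):
    """Hypothetical stand-in for CS(u1, u2): cosine similarity of
    utterance embeddings. The cited work learns this scorer instead."""
    dot = sum(a * b for a, b in zip(u1, u2))
    norm = (math.sqrt(sum(a * a for a in u1))
            * math.sqrt(sum(b * b for b in u2)))
    return dot / norm if norm else 0.0

def margin_ranking_loss(score_pos, score_neg, margin=0.5):
    """Marginal (margin) ranking loss: zero once the coherent pair
    outscores the incoherent pair by at least `margin`."""
    return max(0.0, margin - (score_pos - score_neg))

# A topically related pair should outscore an unrelated one, so
# training pushes this loss toward zero.
context = [1.0, 0.0]
loss = margin_ranking_loss(
    coherence_score(context, [0.9, 0.1]),   # coherent (positive) pair
    coherence_score(context, [0.0, 1.0]),   # incoherent (negative) pair
)
```

The margin value of 0.5 is an arbitrary illustration; the hinge form `max(0, margin - (pos - neg))` is the standard pairwise ranking loss the excerpt refers to.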