Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.441

DynaEval: Unifying Turn and Dialogue Level Evaluation

Abstract: A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation metrics should reflect the dynamics of such interaction, yet existing automatic metrics focus largely on turn-level quality while ignoring those dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework that is not only capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. In DynaEval, the graph convolutional network (GCN)…
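The abstract describes modelling a dialogue with a graph convolutional network. As a rough illustration only (the actual DynaEval architecture is not given here), a single graph-convolution step over utterance nodes can be sketched as follows; all names, shapes, and the chain-graph structure are assumptions for the sketch, not the paper's implementation:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN step: symmetrically normalise adjacency A (with self-loops),
    aggregate neighbour features, project with W, apply ReLU.
    H: (n_utts, d) utterance embeddings; A: (n, n); W: (d, d_out)."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt
    return np.maximum(A_hat @ H @ W, 0.0)

# Toy dialogue: 3 utterances, 4-dim embeddings, chain graph + self-loops.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
W = rng.normal(size=(4, 4))
out = gcn_layer(H, A, W)
print(out.shape)  # (3, 4)
```

Each row of `out` is an utterance representation that now mixes in its neighbours, which is what lets a graph-based evaluator capture cross-turn dynamics rather than scoring each turn in isolation.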

Cited by 23 publications (40 citation statements). References 47 publications.
“…For system development, we rely on ADMs for model design, hyperparameter tuning and system benchmarking (Yeh, Eskenazi, and Mehri 2021). The current trend of open-domain ADMs is shifting from the reference-based approach towards the reference-free, model-based approach (Mehri and Eskenazi 2020a; Zhang et al. 2021a). In many ADM solutions, we predict the relatedness between a dialogue context and the generated responses by training a discriminative network to distinguish the original response from negative samples in a self-supervised fashion.…”
Section: Related Work, Dialogue Evaluation Metrics
confidence: 99%
“…Discourse coherence is a broad area of research, and in dialog evaluation we try to assess coherence at different granularities. One is the coherence of the entire dialog flow (Zhang et al., 2021a); the other is local coherence at the turn level, i.e., context-response coherence (Cervone and Riccardi, 2020). In our study, we focus on local coherence assessment and adopt a simple metric (CoSim) to evaluate the coherence between the context, p, and the response, h:…”
Section: Context-Response Coherence
confidence: 99%
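The CoSim formula itself is truncated in the quote above, but a CoSim-style local-coherence score is plausibly a cosine similarity between a context embedding p and a response embedding h. A minimal sketch under that assumption (the embedding method is left abstract):

```python
import numpy as np

def cosim(p, h, eps=1e-8):
    """Cosine similarity between context vector p and response vector h;
    eps guards against division by zero for degenerate embeddings."""
    return float(p @ h / (np.linalg.norm(p) * np.linalg.norm(h) + eps))

p = np.array([1.0, 2.0, 0.0])   # context embedding
h = np.array([2.0, 4.0, 0.0])   # parallel response -> maximally coherent
print(round(cosim(p, h), 4))    # 1.0
```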
“…A solution for better automatic evaluation methods is to train reference-free evaluators that learn to assess generated responses given dialog contexts along different aspects, such as relevancy (Tao et al., 2018; Ghazarian et al., 2019; Lan et al., 2020), engagement (Ghazarian et al., 2020), fluency (Zhang et al., 2021b; Pang et al., 2020), and contradiction (Pang et al., 2020; Nie et al., 2021), amongst others. It is also important to obtain a holistic evaluation at the dialog level in order to assess the dialog as a whole (Zhang et al., 2021a; Mehri and Eskenazi, 2020; Finch et al., 2021).…”
Section: Introduction
confidence: 99%