2020
DOI: 10.1609/aaai.v34i05.6283

Predictive Engagement: An Efficient Metric for Automatic Evaluation of Open-Domain Dialogue Systems

Abstract: User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement, using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, predictive engagement, for automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human…

Cited by 34 publications (50 citation statements). References 18 publications.

Citation statements (ordered by relevance):
“…Previous research also confirms that incorporating user engagement as real-time feedback benefits dialog policy learning (Yu et al., 2016). One of the most costly bottlenecks of learning to detect user disengagement is annotating many turn-level user engagement labels (Ghazarian et al., 2020). In addition, the data annotation process becomes more expensive and challenging for privacy-sensitive dialog corpora, due to the privacy concerns in crowdsourcing (Xia and McKernan, 2020).…”
Section: Introduction (mentioning)
confidence: 61%
“…Example proxy metrics include conversation length, such as the number of dialog turns, and conversational breadth, such as topical diversity. Sporadic attempts have been made at detecting user disengagement in dialogs (Yu et al., 2004; Ghazarian et al., 2020; Choi et al., 2019). A major bottleneck of these methods is that they require hand-labeling many dialog samples for individual datasets.…”
Section: User Engagement in Dialogs (mentioning)
confidence: 99%
“…We use the contextualized Ruber metric for this purpose (Ghazarian et al., 2019). Finally, since open-domain dialogue systems must produce responses that are both relevant and interesting for the user to feel satisfied (Ghazarian et al., 2020), we further validate systems based on the engagingness of responses. We compute engagingness as the probability score of the engaging class predicted by Ghazarian et al. (2020)'s proposed engagement classifier.…”
Section: Automatic Evaluations (mentioning)
confidence: 99%
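The engagingness computation quoted above reduces to reading off the classifier's probability for the "engaging" class given a query–response pair. The sketch below illustrates that scoring step, assuming a fine-tuned binary sequence classifier loaded through the Hugging Face transformers API; the checkpoint path, label ordering, and helper name are assumptions for illustration, not the authors' released model.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical path to a classifier fine-tuned to label (query, response)
# pairs as engaging vs. not engaging; Ghazarian et al. (2020) train their own
# model, which is not distributed under this name.
MODEL_NAME = "path/to/engagement-classifier"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)  # num_labels = 2
model.eval()

def engagingness(query: str, response: str) -> float:
    """Return P(engaging) for a single (query, response) pair."""
    inputs = tokenizer(query, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits      # shape: (1, 2)
    probs = torch.softmax(logits, dim=-1)    # normalize logits to probabilities
    return probs[0, 1].item()                # assume index 1 = "engaging" class

# Example: score one exchange; a system-level score could then be obtained by
# averaging over a system's responses, in line with the quoted evaluation setup.
print(engagingness("What did you do today?", "I went hiking and saw a bald eagle!"))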
“…They are able to assess language fluency and context coherence of dialogue responses to a certain extent. However, they are limited when it comes to systematically assessing other aspects such as logical consistency, semantic appropriateness [11], and user engagement [12]. For machine evaluation to approach human performance, we identify two research problems: a) to define evaluation metrics that describe different dialogue aspects, and b) to establish a holistic solution that considers the inter-dependence of the different aspects.…”
Section: Introduction (mentioning)
confidence: 99%