Findings of the Association for Computational Linguistics: EMNLP 2023
DOI: 10.18653/v1/2023.findings-emnlp.137

MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings

Amila Ferron,
Amber Shore,
Ekata Mitra
et al.

Abstract: As dialogue systems become more popular, evaluation of their response quality gains importance. Engagingness highly correlates with overall quality and creates a sense of connection that gives human participants a more fulfilling experience. Although qualities like coherence and fluency are readily measured with well-worn automatic metrics, evaluating engagingness often relies on human assessment, which is a costly and time-consuming process. Existing automatic engagingness metrics evaluate the response withou…

Cited by 1 publication (2 citation statements)
References 30 publications
“…Despite its simplicity, LLM-Eval reportedly outperforms most baselines and state-of-the-art evaluation methods, including GPTScore. MEEP (Ferron et al., 2023) is another dialogue-specific evaluation method which directly uses the generated scores. Focusing on the engagingness of a conversation (which shows a high correlation with the majority of other commonly desired conversational attributes), they provide the LLM with a detailed and multi-faceted description of response engagingness as the "variety of response according to the context, likelihood of encouraging the other participant to respond, likelihood of encouraging a quality response from the other participant, interestingness, specificity, and likelihood of creating a sense of belonging for the other participant."…”
Section: LLMs for Dialogue Evaluation (mentioning)
confidence: 99%
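The quoted description amounts to a direct-scoring prompt: define engagingness in detail, present the dialogue, and ask the LLM for a number that is used directly as the score. A minimal sketch in Python follows; the build_prompt, query_llm, and score_engagingness helpers and the 0-100 score range are illustrative assumptions rather than the authors' released code, and query_llm is a hypothetical stand-in for whatever instruction-following LLM client is available.

import re

# Engagingness dimensions as quoted in the citation statement above.
ENGAGINGNESS_DEFINITION = (
    "variety of response according to the context, likelihood of encouraging "
    "the other participant to respond, likelihood of encouraging a quality "
    "response from the other participant, interestingness, specificity, and "
    "likelihood of creating a sense of belonging for the other participant"
)

def build_prompt(context: str, response: str, lo: int = 0, hi: int = 100) -> str:
    """Compose a MEEP-style prompt: define engagingness before asking for a score."""
    return (
        f"Engagingness is defined as the {ENGAGINGNESS_DEFINITION}.\n\n"
        f"Dialogue context:\n{context}\n\n"
        f"Response to evaluate:\n{response}\n\n"
        f"On a scale from {lo} to {hi}, how engaging is the response? "
        "Answer with a single number."
    )

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in: plug in any instruction-following LLM client here."""
    raise NotImplementedError

def score_engagingness(context: str, response: str) -> float:
    """Use the generated score directly: parse the first number in the model's reply."""
    reply = query_llm(build_prompt(context, response))
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in reply: {reply!r}")
    return float(match.group())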
“…Like any other prompt-based method, these approaches can be sensitive to the structure and content of the provided instruction, including the descriptions, examples, and even the score range (Ferron et al., 2023; Lin and Chen, 2023). Nonetheless, the fact that human-aligned LLMs can follow instructions and provide competitive assessments of arbitrary dialogue features is a significant achievement for the field.…”
Section: LLMs for Dialogue Evaluation (mentioning)
confidence: 99%