MEEP: Is this Engaging? Prompting Large Language Models for Dialogue Evaluation in Multilingual Settings

Ferron, Amila; Shore, Amber; Mitra, Ekata; Agrawal, Ameeta

doi:10.18653/v1/2023.findings-emnlp.137

Cited by 1 publication

(2 citation statements)

References 30 publications

“…Despite its simplicity, LLM-Eval reportedly outperforms most baselines and state-of-the-art evaluation methods, including GPTScore. MEEP (Ferron et al., 2023) is another dialogue-specific evaluation method which directly uses the generated scores. Focusing on the Engagingness of a conversation (which shows a high correlation with the majority of other commonly desired conversational attributes), they provide the LLM with a detailed and multi-faceted description of response engagingness as the "variety of response according to the context, likelihood of encouraging the other participant to respond, likelihood of encouraging a quality response from the other participant, interestingness, specificity, and likelihood of creating a sense of belonging for the other participant.".…”

Section: LLMs for Dialogue Evaluation (mentioning)

confidence: 99%
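
The multi-faceted description quoted in the statement above can be illustrated with a small prompt-building sketch. This is not the MEEP authors' prompt: only the six facet strings are taken from the quoted statement, while the function names (`build_prompt`, `parse_score`), the template wording, and the 0 to 100 score range are hypothetical choices made for illustration.

```python
import re

# The six facets are copied from the citation statement above.
# Everything else here (names, template wording, score range) is an assumption.
ENGAGINGNESS_FACETS = [
    "variety of response according to the context",
    "likelihood of encouraging the other participant to respond",
    "likelihood of encouraging a quality response from the other participant",
    "interestingness",
    "specificity",
    "likelihood of creating a sense of belonging for the other participant",
]


def build_prompt(context, response, low=0, high=100):
    """Assemble an evaluation prompt that spells out each engagingness facet."""
    facets = "\n".join(f"- {facet}" for facet in ENGAGINGNESS_FACETS)
    return (
        f"Rate the engagingness of the response on a scale from {low} to {high}.\n"
        f"Engagingness covers:\n{facets}\n\n"
        f"Context: {context}\n"
        f"Response: {response}\n"
        "Score:"
    )


def parse_score(reply, low=0, high=100):
    """Return the first number in the model reply, or None if absent or out of range."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    value = float(match.group())
    return value if low <= value <= high else None
```

The prompt string would be sent to whichever LLM is being used as the evaluator, and `parse_score` applied to its reply; rejecting out-of-range values is one simple way to guard against malformed outputs.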

“…Like any other prompt-based method, these approaches can be sensitive to the structure and content of the provided instruction, including the descriptions, examples and even the score range (Ferron et al., 2023; Lin and Chen, 2023). Nonetheless, the fact that human-aligned LLMs can follow instructions and provide competitive assessments of arbitrary dialogue features is a significant achievement for the field.…”

Section: LLMs for Dialogue Evaluation (mentioning)

confidence: 99%
