“…Despite its simplicity, LLM-Eval reportedly outperforms most baselines and state-of-the-art evaluation methods, including GPTScore. MEEP (Ferron et al., 2023) is another dialogue-specific evaluation method that directly uses the scores generated by the LLM. It focuses on the Engagingness of a conversation, an attribute that correlates highly with most other commonly desired conversational attributes. The authors give the LLM a detailed, multi-faceted description of response engagingness as the "variety of response according to the context, likelihood of encouraging the other participant to respond, likelihood of encouraging a quality response from the other participant, interestingness, specificity, and likelihood of creating a sense of belonging for the other participant."…”
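To make the direct-scoring idea concrete, here is a minimal Python sketch. It assumes a generic text-in/text-out `llm` callable. The prompt wording, the 0 to 10 scale, and the helper names `build_prompt`, `parse_score`, and `score_engagingness` are hypothetical illustrations, not MEEP's actual prompt or code. Only the engagingness description is taken from the text. The point is that the number the model writes in its answer is used as the score, rather than being derived from token probabilities.

```python
import re
from typing import Callable, Optional, Sequence

# Engagingness description quoted from the text; everything else here is hypothetical.
ENGAGINGNESS_DEFINITION = (
    "variety of response according to the context, likelihood of encouraging "
    "the other participant to respond, likelihood of encouraging a quality "
    "response from the other participant, interestingness, specificity, and "
    "likelihood of creating a sense of belonging for the other participant"
)


def build_prompt(context: Sequence[str], response: str, scale_max: int = 10) -> str:
    """Assemble a direct-scoring prompt (wording and scale are assumptions)."""
    history = "\n".join(context)
    return (
        f"Engagingness is defined as the {ENGAGINGNESS_DEFINITION}.\n\n"
        f"Conversation so far:\n{history}\n\n"
        f"Response:\n{response}\n\n"
        f"Rate the engagingness of the response on a scale from 0 to {scale_max}. "
        "Answer with a single number."
    )


def parse_score(text: str, scale_max: int = 10) -> Optional[float]:
    """Return the first unsigned number in the LLM output, or None if absent or out of range."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None
    score = float(match.group())
    return score if 0 <= score <= scale_max else None


def score_engagingness(
    llm: Callable[[str], str],
    context: Sequence[str],
    response: str,
    scale_max: int = 10,
) -> Optional[float]:
    """Prompt the (hypothetical) LLM once and use its generated number directly as the score."""
    return parse_score(llm(build_prompt(context, response, scale_max)), scale_max)


if __name__ == "__main__":
    # Stub model standing in for a real LLM call.
    fake_llm = lambda prompt: "Score: 7"
    print(score_engagingness(fake_llm, ["Hi! How was your weekend?"], "Great, I went hiking!"))
```

Parsing the first number and rejecting out-of-range values is one simple way to guard against free-form model output. A real setup might instead constrain the output format or re-prompt on a parse failure.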