Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2023.emnlp-main.621
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models

Xiaolei Wang, Xinyu Tang, Xin Zhao, et al.

Abstract: The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs), which rely on natural language conversations to satisfy user needs. In this paper, we embark on an investigation into the utilization of ChatGPT for CRSs, revealing the inadequacy of the existing evaluation protocol. It might overemphasize the matching with ground-truth items annotated by humans while neglecting the interactive nature of CRSs. To overcome the limitation…

Cited by 16 publications (1 citation statement)
References 23 publications
“…However, the study also found that GPT-4 is less proficient in tasks that require complex reasoning or specific domain knowledge, highlighting the limitations of these models [24]. Recent research has addressed various limitations of large language models, including the hand-crafting of task-specific demonstrations [25], the evaluation of code synthesis [26], the cost barrier associated with large models [27], the evaluation protocol for conversational recommendation systems [28], and the context window restriction for off-the-shelf LLMs [29].…”
Section: Foundation Models and Artificial General Intelligence (AGI)
Confidence: 99%