2022
DOI: 10.1002/aaai.12051
Offline recommender system evaluation: Challenges and new directions

Abstract: Offline evaluation is an essential complement to online experiments in the selection, improvement, tuning, and deployment of recommender systems. Offline methodologies for recommender system evaluation evolved from experimental practice in Machine Learning (ML) and Information Retrieval (IR). However, evaluating recommendations involves particularities that pose challenges to the assumptions upon which the ML and IR methodologies were developed. We recap and reflect on the development and current status of rec…
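As a minimal illustration of the offline methodology the abstract refers to, the sketch below evaluates ranked recommendations against held-out user interactions with precision@k. All function names, users, and items are illustrative assumptions, not the paper's own protocol.

```python
# Hedged sketch of offline recommender evaluation: score per-user ranked
# recommendations against a held-out interaction set using precision@k.
# The data and names below are invented for illustration.

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the held-out set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical ranked recommendations and held-out test interactions.
recommendations = {
    "u1": ["i3", "i7", "i1", "i9"],
    "u2": ["i2", "i5", "i8", "i4"],
}
held_out = {
    "u1": {"i7", "i9"},
    "u2": {"i1"},
}

scores = [precision_at_k(recommendations[u], held_out[u], k=2)
          for u in recommendations]
mean_p_at_2 = sum(scores) / len(scores)
print(mean_p_at_2)  # u1: 1/2 hits, u2: 0/2 hits -> 0.25
```

In practice, the split (random, per-user, or temporal) and the candidate set strongly affect the measured numbers, which is one of the methodological particularities the article discusses.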

Cited by 38 publications (14 citation statements)
References 75 publications
“…1 Supplemental material, including source code, scripts, datasets, and results are permanently placed in: https://github.com/recsyspolimi/recsys-2022-evaluation-of-recsyswith-impressions 2 Further studies may select different evaluation methodologies depending on their context, see Castells and Moffat [6].…”
Section: Experimental Methodology
confidence: 99%
“…However, this is rapidly changing as more impression datasets have been published and competitions encouraged their use. For instance: (i) interest in impressions research and use has risen [6,22], (ii) several RS challenges included impressions in their datasets, (iii) industries have presented case studies highlighting the effects of impressions data in their recommendations [4,19], and (iv) three open datasets were recently published: ContentWise Impressions [26], MIND [32], and FINN.no Slates [10,11]. Due to the limited use of impressions in RS, there exist several open research questions, e.g., whether evaluating recommendation models with impressions data requires developing a specific methodology as is the case in other scenarios [12,14], characterization of signals and biases in impressions, and challenges regarding the use of impressions.…”
Section: Introduction
confidence: 99%
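One way to read the evaluation-with-impressions question raised in the quote above is that candidate items can be restricted to those actually shown to the user. The sketch below is a hedged illustration of that idea; the data, scores, and names are invented assumptions, not the cited papers' exact protocols.

```python
# Hedged sketch: offline evaluation restricted to impressed items, i.e. the
# model only ranks items the user was actually shown. Data is illustrative.

impressions = {"u1": ["i1", "i3", "i7"]}                      # items shown
model_scores = {"u1": {"i1": 0.2, "i2": 0.9, "i3": 0.4, "i7": 0.8}}
clicked = {"u1": {"i7"}}                                      # observed feedback

def rank_within_impressions(user):
    """Rank only the impressed candidates by model score, highest first."""
    candidates = impressions[user]
    return sorted(candidates,
                  key=lambda item: model_scores[user].get(item, 0.0),
                  reverse=True)

ranking = rank_within_impressions("u1")
print(ranking)      # ['i7', 'i3', 'i1'] -- i2 is excluded: never impressed
hit_at_1 = ranking[0] in clicked["u1"]
print(hit_at_1)     # True
```

Note how the highest-scored item overall (i2) never enters the ranking because it was not impressed; whether this restriction yields a fairer or more biased estimate is exactly the kind of open methodological question the quote points to.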
“…Evaluating recommendation systems is a crucial stage in validating their usage. Consequently, several studies have investigated the impact of time on online evaluations [31] [32] [33] because this temporal factor aids in interpreting the acceptance of the obtained recommendation results [34]. Online evaluation is the best method to evaluate RSs and CARS.…”
Section: Literature Review
confidence: 99%
“…However, it can be challenging to control external factors and account for user biases. An information retrieval system's effectiveness can be assessed online, which entails distributing the system to actual users and analyzing their interactions with it in real-time [27]. There is not much information on how deploying a hybrid recommender system relates to the evaluation.…”
Section: Offline Evaluation
confidence: 99%