Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2018
DOI: 10.1145/3219819.3220028

Offline Evaluation of Ranking Policies with Click Models

Abstract: Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can g…

Cited by 44 publications (48 citation statements)
References 33 publications
“…In other cases, simplifying assumptions can be deployed to reduce the variance of IPS estimators in slate recommendation. Li et al assume that the reward for each sub-action is independent of other sub-actions in the slate [15]; this is a typical assumption to greatly reduce the size of the action space that is used in practical applications [5]. To simplify slate reward estimation, Swaminathan et al take an additive approach to the slate reward and assume that the sub-action rewards are unobserved and independent [26].…”
Section: Traditional Methods
mentioning
confidence: 99%
“…Independent IPS (IIPS): Proposed by Li et al [15], the estimator treats the reward at each position independently. The average reward of the target policy is estimated using Eq.…”
mentioning
confidence: 99%
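The position-independence idea in the statement above can be illustrated with a small sketch. This is not the estimator from Li et al or from the citing paper (the equation it refers to is elided here); it is a generic position-wise inverse propensity scoring (IPS) estimate under the stated assumption that the click at each position depends only on the item shown there and the position itself. The function name `positionwise_ips`, the `(items, clicks)` log format, the context-free marginal probability callables, and the optional `clip` parameter are all hypothetical choices made for illustration.

```python
from typing import Callable, Hashable, Optional, Sequence, Tuple

# Hypothetical types for illustration; none of these names come from the source.
# A marginal maps (item, position) to the probability that a policy places
# that item at that position.
Marginal = Callable[[Hashable, int], float]
LogEntry = Tuple[Sequence[Hashable], Sequence[int]]  # (ranked items, 0/1 clicks)


def positionwise_ips(
    logs: Sequence[LogEntry],
    target_marginal: Marginal,
    logging_marginal: Marginal,
    clip: Optional[float] = None,
) -> float:
    """Estimate the expected number of clicks per list under the target policy.

    Assumption: each position's click is reweighted independently by the ratio
    of target to logging item-position probabilities.
    """
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for items, clicks in logs:
        for position, (item, click) in enumerate(zip(items, clicks)):
            if not click:
                continue  # unclicked positions contribute zero reward
            p_log = logging_marginal(item, position)
            if p_log <= 0.0:
                raise ValueError("logging policy must have support on logged pairs")
            weight = target_marginal(item, position) / p_log
            if clip is not None:
                weight = min(weight, clip)  # clipping trades bias for variance
            total += weight * click
    return total / len(logs)


# Toy data: the logging policy places either item at either position with
# probability 0.5; the target policy always shows ["a", "b"].
toy_logs = [(["a", "b"], [1, 0]), (["b", "a"], [0, 1])]


def uniform_logging(item: Hashable, position: int) -> float:
    return 0.5


def fixed_target(item: Hashable, position: int) -> float:
    return 1.0 if (item, position) in {("a", 0), ("b", 1)} else 0.0


estimate = positionwise_ips(toy_logs, fixed_target, uniform_logging)
```

Because the weights are ratios of per-position marginals rather than ratios of whole-list probabilities, the number of distinct weights grows with the number of item-position pairs instead of the number of possible lists, which is the kind of variance reduction the citing statements describe.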
“…Offline unbiased estimator. Since the historical logged data is generated by a logging production policy, we propose an unbiased offline estimator, User Browsing Inverse Propensity Scoring (UBM-IPS) estimator, to evaluate the performance inspired by Li et al [15]. The idea is to estimate users' clicks on various positions based on UBM model and the detail of reduction is at Appendix.…”
Section: Metrics and Evaluation Methods
mentioning
confidence: 99%
“…Komiyama et al [19] and subsequently Lagrée et al [20] use such propensities to find the optimal ranking for a single query by casting the ranking problem as a multiple-play bandit. Li et al [21] use similar propensities to counterfactually evaluate ranking policies where they estimate the number of clicks a ranking policy will receive. Our policy-aware approach contrasts with these existing methods by providing an unbiased estimate of LTR-metric-based losses, and thus it can be used to optimize LTR models similar to supervised LTR.…”
Section: Related Work
mentioning
confidence: 99%