Proceedings of the ACM Web Conference 2024 2024
DOI: 10.1145/3589334.3645343
|View full text |Cite
|
Sign up to set email alerts
|

Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

Haruka Kiyohara,
Masahiro Nomura,
Yuta Saito

Abstract: We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a policy selects multi-dimensional actions known as slates. This problem is widespread in recommender systems, search engines, marketing, to medical applications, however, the typical Inverse Propensity Scoring (IPS) estimator suffers from substantial variance due to large action spaces, making effective OPE a significant challenge. The PseudoInverse (PI) estimator has been introduced to mitigate the variance issue by assumin… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2024
2024
2024
2024

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
references
References 33 publications
0
0
0
Order By: Relevance