Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining 2019
DOI: 10.1145/3289600.3290999

Top-K Off-Policy Correction for a REINFORCE Recommender System

Abstract: Industrial recommender systems deal with extremely large action spaces (many millions of items to recommend). Moreover, they need to serve billions of users, who are unique at any point in time, making a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from the logged feedback is, however, subject to biases caused by only observing feedback on recommendations selected by the previous versions of the recommender. In …

Citations: cited by 341 publications (309 citation statements)
References: 38 publications
“…Building real-world recommenders face a variety of challenges. Two that relate to the challenges in fairness are the temporal dynamics [33,48,26,9] and biased training data [29,15,3]. These issues do not just make training difficult but also evaluation of recommender performance [42].…”
Section: Related Work (mentioning)
confidence: 99%
“…We consider a production recommender system that is recommending a personalized list of K items to users. We consider a cascading recommender [47,24,16], with a set of retrieval systems [15] followed by a ranking system [16,36]. We assume that the retrieval systems return a set R of M′ relevant items from the total corpus J of M items, where M ≫ M′ ≥ K. The ranking model must then score and rank M′ items in R to get a final ranking of K items.…”
Section: Recommendation Environment (mentioning)
confidence: 99%
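
The cascade described in the statement above (retrieval narrows the corpus to a candidate set R, then a ranking model returns K items) can be illustrated with a minimal Python sketch. This is not the cited authors' implementation: the function names, the dictionary of coarse scores, the single retrieval stage (the statement mentions a set of retrieval systems), and the candidate-set size `m_prime` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def retrieve(corpus: List[str], coarse_score: Dict[str, float], m_prime: int) -> List[str]:
    """Hypothetical retrieval stage: keep the m_prime items with the best cheap score."""
    ordered = sorted(corpus, key=lambda item: coarse_score.get(item, 0.0), reverse=True)
    return ordered[:m_prime]


def rank(candidates: List[str], fine_score: Callable[[str], float], k: int) -> List[str]:
    """Hypothetical ranking stage: score only the retrieved candidates, return the top k."""
    return sorted(candidates, key=fine_score, reverse=True)[:k]


def recommend(
    corpus: List[str],
    coarse_score: Dict[str, float],
    fine_score: Callable[[str], float],
    m_prime: int,
    k: int,
) -> List[str]:
    """Two-stage cascade: corpus -> candidate set R -> final list of k items."""
    # Assumed size constraint: corpus size >= candidate-set size >= k.
    if not (len(corpus) >= m_prime >= k):
        raise ValueError("expected len(corpus) >= m_prime >= k")
    return rank(retrieve(corpus, coarse_score, m_prime), fine_score, k)


# Toy data (invented for the example).
corpus = ["a", "b", "c", "d", "e"]
coarse = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1, "e": 0.0}
fine = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0, "e": 1.0}
result = recommend(corpus, coarse, fine.__getitem__, m_prime=3, k=2)
```

In this toy run, items "d" and "e" never reach the ranking stage even though the ranking model would score them highest, which shows how the final list depends on what the earlier stage selected.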
“…Other enhancements include incorporating contextual data [5]. Most recently, Chen et al [10] and Ie et al [23] showed success in applying reinforcement learning techniques in YouTube recommender systems. Our work does not deal with designing a recommender system, nor does it attempt to reverse engineer the YouTube recommender.…”
Section: Recommender Systems and Video Recommendation (mentioning)
confidence: 99%
“…Contrasting the extensive literature on evaluating the accuracy of recommendation [5,10,30,54], we focus on prior work that connects network structure with content consumption. Carmi et al [8] reported how the book sales on Amazon react to exogenous demand shocks: not only did the sales increase for the featured item, but the increase also propagated a few hops away by following the links created by the recommender systems.…”
Section: Measuring the Effects of Recommender Systems (mentioning)
confidence: 99%