The World Wide Web Conference 2019
DOI: 10.1145/3308558.3313616

Policy Gradients for Contextual Recommendations

Abstract: Decision making is a challenging task in online recommender systems. The decision maker often needs to choose a contextual item at each step from a set of candidates. Contextual bandit algorithms have been successfully deployed in such applications for their handling of the exploration-exploitation tradeoff and their state-of-the-art performance in minimizing online costs. However, the applicability of existing contextual bandit methods is limited by over-simplified assumptions about the problem, such as assuming a simp…
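The abstract describes the setting the paper targets: at each step, pick one item from a contextual candidate set, observe feedback, and learn a policy that balances exploration and exploitation. As a rough illustration of that setting (not the paper's own algorithm), the sketch below trains a linear softmax policy with a plain REINFORCE update; the feature dimension, candidate set size, and simulated click model are all assumed for the example.

```python
# Minimal sketch of policy-gradient contextual recommendation, assuming a
# linear softmax policy and a simulated click model. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d = 8                 # joint context-item feature dimension (assumed)
theta = np.zeros(d)   # parameters of the linear softmax policy
lr = 0.05             # learning rate

def policy(X, theta):
    """Softmax probability of recommending each of the k candidates in X (k x d)."""
    logits = X @ theta
    logits -= logits.max()        # numerical stability
    w = np.exp(logits)
    return w / w.sum()

true_w = rng.normal(size=d)       # hidden reward model, for simulation only

for step in range(5000):
    k = 10                                    # candidate set at this step
    X = rng.normal(size=(k, d))               # contextual features per item
    p = policy(X, theta)
    a = rng.choice(k, p=p)                    # stochastic choice = exploration
    # Simulated binary feedback (click); a live system observes this online.
    click_prob = 1.0 / (1.0 + np.exp(-X[a] @ true_w))
    reward = float(rng.random() < click_prob)
    # REINFORCE: grad log pi(a | X) for a linear softmax is x_a - E_p[x].
    grad_log_pi = X[a] - p @ X
    theta += lr * reward * grad_log_pi
```

Because the policy itself is stochastic, exploration comes for free from sampling, which is one reason policy-gradient methods are a natural fit for the bandit-style recommendation problem the abstract outlines.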



Cited by 30 publications (18 citation statements)
References 14 publications
“…There are two kinds of methods to solve the cold-start problems. The first type actively solves cold-start by designing a decision making strategy, such as using contextual-bandits [14,20].…”
Section: Cold-start Recommendation (mentioning)
Confidence: 99%
“…There are two kinds of methods to solve the cold-start problems. The first type actively solves cold-start by designing a decision making strategy, such as using contextual-bandits [13,19].…”
Section: Cold-start Recommendation (mentioning)
Confidence: 99%
“…The present cold-start problem is mainly studied in two directions. Some works tried to solve cold-start problems in the perspective of bandits [13]- [15]. A typical example is that Stephane et al embedded the preference of new users in a social network through utilizing the multi-armed bandits [13].…”
Section: Related Work (mentioning)
Confidence: 99%