Proceedings of the Web Conference 2020
DOI: 10.1145/3366423.3380130
Off-policy Learning in Two-stage Recommender Systems

Cited by 66 publications (47 citation statements)
References 22 publications
“…Ma et al. [64] extend the policy correction gradient estimator to a two-stage setting, in which the two stages are p(s_t, a_p) and q(s_t, a | a_p), and the policy can be written as Σ…”
Section: Model-free Deep Reinforcement Learning Based Methods
confidence: 99%
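For context, the two-stage decomposition this snippet summarises marginalises the second-stage ranking policy over the first-stage nominations, i.e. pi(a | s_t) = Σ_{a_p} p(a_p | s_t) · q(a | s_t, a_p), and the off-policy correction then importance-weights against the logged behaviour policy. The sketch below is a minimal illustration of that computation; the function names, array shapes, and behaviour-probability argument are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def two_stage_policy_prob(nominator_probs, ranker_probs, a):
    """nominator_probs: shape (K,), p(a_p | s_t) over K first-stage nominations.
    ranker_probs: shape (K, N), q(a | s_t, a_p) for each nomination over N items.
    Returns pi(a | s_t), the marginal probability of recommending item a."""
    return float(np.sum(nominator_probs * ranker_probs[:, a]))

def off_policy_corrected_weight(nominator_probs, ranker_probs, behavior_prob, a):
    """Importance weight pi(a | s_t) / beta(a | s_t) used in an
    importance-weighted (policy-correction) REINFORCE-style gradient."""
    pi_a = two_stage_policy_prob(nominator_probs, ranker_probs, a)
    return pi_a / behavior_prob

# Toy usage with made-up numbers (K nominations, N items).
K, N = 3, 5
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(K))            # p(a_p | s_t)
q = rng.dirichlet(np.ones(N), size=K)    # q(a | s_t, a_p), one row per nomination
print(two_stage_policy_prob(p, q, a=2))
print(off_policy_corrected_weight(p, q, behavior_prob=0.2, a=2))
```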
“…Practical applications, however, are still limited, especially for offline RL (such as [5,25,27,31,43,47]). We attribute this to the lack of application-specific simulation environments that provide useful insights for specific research questions.…”
Section: Towards Practical Research Of Offline RL In RecSys and RTB
confidence: 99%
“…We attribute this to the lack of application-specific simulation environments that provide useful insights for specific research questions. For example, RecSys/RTB are unique regarding their huge action space and highly stochastic and delayed rewards [4,25,55]. Therefore, we need to build a simulation platform imitating such specific characteristics to better understand the empirical performance of offline RL/OPE methods in these particular situations.…”
Section: Towards Practical Research Of Offline RL In RecSys and RTB
confidence: 99%
“…To solve this issue, we introduce gradient estimation based on Monte-Carlo sampling. Our approach is similar to that of Ma et al. [15]; however, we estimate gradients of the variance instead of the general performance.…”
Section: Monte-Carlo-based Derivatives
confidence: 99%
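The Monte-Carlo gradient estimation this snippet alludes to is, in its generic form, a score-function (REINFORCE) estimator: gradients of an expectation are approximated by sampling actions and weighting the score ∇ log π(a) by the quantity of interest. The sketch below shows that estimator for a softmax policy over a small action set; the policy parameterisation, reward vector, and sample count are illustrative assumptions, not taken from either cited paper.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def mc_policy_gradient(logits, rewards, num_samples=10_000, rng=None):
    """Estimate d/d(logits) of E_{a ~ pi}[r(a)] via (1/M) * sum_i r(a_i) * grad log pi(a_i).
    For a softmax policy, grad log pi(a) w.r.t. the logits is onehot(a) - pi."""
    rng = rng or np.random.default_rng(0)
    pi = softmax(logits)
    grad = np.zeros_like(logits)
    actions = rng.choice(len(logits), size=num_samples, p=pi)
    for a in actions:
        score = -pi.copy()
        score[a] += 1.0                  # grad of log pi(a) w.r.t. the logits
        grad += rewards[a] * score
    return grad / num_samples

logits = np.array([0.1, 0.5, -0.3])
rewards = np.array([1.0, 0.2, 0.7])
print(mc_policy_gradient(logits, rewards))
# The same sampling trick extends to gradients of the variance,
# Var[r] = E[r^2] - (E[r])^2, by estimating gradients of E[r^2] and E[r] separately.
```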