Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2019
DOI: 10.1145/3292500.3330933
Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation

Abstract: Reinforcement learning aims at searching for the best policy model for decision making, and has been shown to be powerful for sequential recommendation. Training such a policy, however, takes place within an environment, and in many real-world applications training in the real environment incurs an unbearable cost because of the exploration it requires. Reconstructing the environment from past data is thus an appealing way to release the power of reinforcement learning in…
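To make the abstract's idea concrete, here is a toy sketch of the two-stage recipe: first fit an environment model on logged interaction data, then let the policy explore that simulator rather than the costly real environment. The per-action reward average and the bandit-style policy below are illustrative stand-ins, not the paper's actual method.

```python
# Toy sketch of environment reconstruction for RL (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Step 0: logged interactions (state, action, reward) from past behavior.
logged = []
for _ in range(1000):
    s = rng.normal(size=4)
    a = int(rng.integers(3))
    r = 0.5 * a - 0.2 * a * a + rng.normal(scale=0.1)  # unknown true response
    logged.append((s, a, r))

# Step 1: "reconstruct" the environment; a per-action average reward stands
# in here for a learned user-response model.
reward_model = {a: np.mean([r for (_, la, r) in logged if la == a])
                for a in range(3)}

# Step 2: improve the policy inside the simulator, so exploration never
# touches real users.
q, counts = np.zeros(3), np.zeros(3)
for _ in range(5000):
    a = int(np.argmax(q + rng.normal(scale=0.1, size=3)))  # noisy-greedy explore
    r = reward_model[a] + rng.normal(scale=0.1)            # simulated feedback
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]                         # incremental mean

print("estimated action values:", q.round(3))  # action 1 should come out best
```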



Cited by 53 publications (42 citation statements: 0 supporting, 42 mentioning, 0 contrasting), published between 2020 and 2025.
References 12 publications.
“…However, the recommendation often fails when drivers make decisions in a complex environment. To address this issue, in [111] a new method is proposed to model hidden causal factors, called confounders, in a complex environment. Specifically, the framework in [50] is extended to include the confounders.…”
Section: Other Applications (mentioning)
confidence: 99%
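The confounder extension quoted above can be sketched as a third, hidden agent inside the reconstructed environment: its latent action perturbs both the state transition and the driver's observed response, so the simulator no longer pretends the two observed agents fully explain the logs. Every function and constant below is a hypothetical placeholder, not the actual model of [111].

```python
# Sketch of a hidden "confounder agent" in a reconstructed environment
# (hypothetical stand-in, not the method of [111]).
import numpy as np

rng = np.random.default_rng(1)

def confounder_policy(state):
    # Hypothetical latent factor, e.g. unobserved traffic or weather.
    return rng.normal(scale=0.5)

def driver_policy(state, recommendation, c):
    # The driver's response depends on the recommendation AND the confounder.
    return 1.0 if recommendation + c > state.mean() else 0.0

def simulate_step(state, recommendation):
    c = confounder_policy(state)                      # latent "third agent"
    accept = driver_policy(state, recommendation, c)  # observed response
    next_state = state + 0.2 * c + rng.normal(scale=0.1, size=state.shape)
    reward = accept - 0.1 * abs(recommendation)
    return next_state, reward

state = rng.normal(size=4)
for _ in range(3):
    state, reward = simulate_step(state, recommendation=0.3)
    print(round(reward, 3))
```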
“…This presents difficulty in evaluating the methods for different environments. For example, the confounders modeling hidden causal factors in [111] can also contribute to DRL modeling in E-commerce. This is because modeling customers' interests is always subject to changing environments, and a new environment may contain hidden causal factors.…”
Section: Difficulty In Comparison With Different Applications (mentioning)
confidence: 99%
“…In the off-policy setting, Chen et al. [5] and Zhao et al. [45] proposed the use of propensity scores to perform off-policy correction, but with training difficulties due to high variance. Model-based RL approaches [6,34,47] first build a model to simulate the environment, in order to avoid any issues with off-policy training. However, these two-stage approaches depend heavily on the accuracy of the simulator.…”
Section: Related Work (mentioning)
confidence: 99%
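The propensity-score correction this statement refers to is, at its core, importance sampling: logged rewards are reweighted by the ratio of target-policy to logging-policy action probabilities. A minimal sketch follows; the toy policies and reward vector are assumptions, not taken from [5] or [45].

```python
# Importance-sampling off-policy value estimate (toy setup, illustrative only).
import numpy as np

rng = np.random.default_rng(2)

true_reward   = np.array([0.1, 0.8, 0.3])   # unknown to the estimator
logging_probs = np.array([0.5, 0.3, 0.2])   # behavior policy that wrote the log
target_logits = np.array([0.0, 1.0, 0.0])   # policy we want to evaluate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

target_probs = softmax(target_logits)

# Logged data: actions drawn from the logging policy, with noisy rewards.
actions = rng.choice(3, size=10_000, p=logging_probs)
rewards = true_reward[actions] + rng.normal(scale=0.1, size=actions.size)

# Off-policy correction: w = pi_target(a) / pi_logging(a).
w = target_probs[actions] / logging_probs[actions]
print("IS estimate:", float(np.mean(w * rewards)))
print("true value :", float(true_reward @ target_probs))
```

The estimator is unbiased, but any action the logging policy rarely takes puts a small probability in the denominator and inflates the weights, which is exactly the high-variance training difficulty the quote mentions.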
“…However, the estimation of propensity scores has high variance and there are tricks like smoothing or clipping to train the model (we discuss this in section 4). Model-based RL approaches [2,33,47] attempt to eliminate the off-policy issue by building a model to simulate the environment. The policy can then be trained through interactions with the simulator.…”
Section: Related Work (mentioning)
confidence: 99%
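The smoothing and clipping tricks mentioned in the quote both shrink extreme propensity ratios, accepting a small bias in exchange for a large variance reduction. A minimal sketch, with an illustrative cap and exponent rather than values from the cited work:

```python
# Variance-reduction tricks on importance weights (illustrative values).
import numpy as np

def clipped_weights(w, cap=10.0):
    # Hard clipping: no single logged example can dominate the estimate.
    return np.minimum(w, cap)

def smoothed_weights(w, alpha=0.7):
    # Power smoothing: w**alpha with alpha < 1 shrinks large ratios.
    return w ** alpha

w = np.array([0.2, 1.0, 4.0, 50.0])    # raw propensity ratios, one outlier
print(clipped_weights(w))              # [ 0.2  1.   4.  10. ]
print(smoothed_weights(w).round(2))    # [ 0.32  1.    2.64 15.46]
```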