2019
DOI: 10.48550/arxiv.1905.13559
Preprint
Advantage Amplification in Slowly Evolving Latent-State Environments

Cited by 2 publications (2 citation statements) | References: 0 publications
“…Although we could not benefit from the simulator's efficiency, the maintaining-move action still facilitates training, and this is solely because maintaining a move decision increases the influence of a single decision, as we confirm in experiments. In this sense, the maintaining move can rather be viewed as 'amplifying advantage' in the sense of (Mladenov et al 2019).…”
Section: Maintaining Move Action
confidence: 99%
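The "maintaining move" idea above can be read as a form of action repetition: one decision is held for several environment steps so that its effect on the slowly evolving (latent) state is amplified. A minimal sketch of this interpretation follows, assuming a Gym-style env.step/env.reset API; the names rollout_with_action_repeat, env, policy, and the repeat factor k are illustrative assumptions, not code from either paper.

# Sketch: repeat (maintain) each chosen action for k steps so a single
# decision has a larger cumulative influence on the environment state.
def rollout_with_action_repeat(env, policy, k=4, horizon=100):
    obs = env.reset()
    total_return = 0.0
    t = 0
    while t < horizon:
        action = policy(obs)              # one decision...
        for _ in range(k):                # ...maintained for k consecutive steps
            obs, reward, done, _ = env.step(action)
            total_return += reward
            t += 1
            if done or t >= horizon:
                return total_return
    return total_return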
“…The underlying MDP. One could cast the recommendation problem as a POMDP (Lu & Yang, 2016; Mladenov et al., 2019) in which the state of the environment is hidden and contains the user's internal state, which evolves over time. Equivalently, one can consider the belief-MDP induced by the recommender POMDP (Kaelbling et al., 1998), and approximate a solution to such a belief-MDP via Deep-RL with a policy trained on observation histories as input (this is theoretically sufficient for the policy to recover a belief over the current hidden state and take the optimal action).…”
Section: F Computing Metrics
confidence: 99%
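The belief-MDP approximation described in this statement amounts to feeding the observation history, rather than a single observation, into the policy network, so the network can implicitly summarize a belief over the hidden user state. A minimal sketch of such a history-conditioned policy, assuming PyTorch, is given below; the class name HistoryPolicy, the GRU encoder, and all layer sizes are illustrative assumptions, not the cited authors' implementation.

# Sketch: a policy that maps the full observation history to action logits.
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    def __init__(self, obs_dim, num_actions, hidden=64):
        super().__init__()
        # The GRU summarizes the observation history into a fixed-size state,
        # playing the role of an (approximate) belief over the hidden user state.
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, obs_history):
        # obs_history: (batch, time, obs_dim) -- the observations seen so far
        _, h = self.rnn(obs_history)        # h: (1, batch, hidden), history summary
        return self.head(h.squeeze(0))      # action logits conditioned on the history

Under this reading, training such a policy with a standard Deep-RL algorithm on histories is what makes the approach theoretically sufficient to act optimally in the induced belief-MDP.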