Learning Action Representations for Reinforcement Learning
2019 · Preprint
DOI: 10.48550/arxiv.1902.00183

Cited by 10 publications (12 citation statements) · References 0 publications
“…Recent works have aimed to express the similarities between actions to learn policies more quickly, especially over large action spaces. For example, one approach is to learn action embeddings, which could then be used to learn a policy (Chandak et al., 2019; Chen et al., 2019). Another approach is to directly learn about irrelevant actions and then eliminate them from being selected (Zahavy et al., 2018).…”
Section: Action Reduction (mentioning)
confidence: 99%
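Of the two approaches this excerpt mentions, action elimination is the simpler to sketch: learn a per-action irrelevance signal and mask eliminated actions out before selection. The sketch below is only an illustration in the spirit of Zahavy et al. (2018), not their actual architecture; the EliminationNet module and the threshold value are hypothetical.

```python
import torch
import torch.nn as nn

class EliminationNet(nn.Module):
    """Illustrative module: predicts, per action, the probability that
    the action is irrelevant in the current state. (Hypothetical design;
    Zahavy et al. 2018 learn such a signal from environment feedback.)"""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return torch.sigmoid(self.net(state))  # P(action is irrelevant)

def masked_greedy_action(q_values, elim_probs, threshold=0.9):
    """Greedy selection restricted to actions not judged irrelevant."""
    mask = elim_probs >= threshold                 # eliminated actions
    q = q_values.masked_fill(mask, float('-inf'))  # never selected
    return int(q.argmax())
```

Masking before the argmax means the agent only ever explores and updates values for actions that survive elimination, which is what shrinks the effective action space.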
“…The method proposed in [22] offers a simple yet effective way to obtain a sparse DNN representation of the training data to assist the DRL agent in better understanding useful and pertinent dynamics in RL tasks. On the other hand, the works in [23,24] investigate action-space embeddings from theoretical and practical perspectives. Moreover, the Value Prediction Network (VPN) [25] avoids the challenging task of modeling the full environment by focusing only on predicting the value/reward of future states.…”
Section: Related Work (mentioning)
confidence: 99%
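The value-prediction idea in [25] can be summarized with a rough sketch: rather than predicting full observations, a learned core predicts only the reward, discount, and value of abstract future states, which is enough for short planning rollouts. All module names and dimensions below are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VPNCore(nn.Module):
    """Sketch of a VPN-style core: maps (abstract state, action) to a
    predicted reward, discount, and next abstract state; a separate head
    estimates the value of an abstract state. Sizes are illustrative."""
    def __init__(self, z_dim=64, n_actions=4, hidden=128):
        super().__init__()
        self.trans = nn.Sequential(
            nn.Linear(z_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim),
        )
        self.reward = nn.Linear(z_dim + n_actions, 1)
        self.gamma = nn.Linear(z_dim + n_actions, 1)
        self.value = nn.Linear(z_dim, 1)

    def step(self, z, a_onehot):
        za = torch.cat([z, a_onehot], dim=-1)
        z_next = self.trans(za)
        r = self.reward(za)
        g = torch.sigmoid(self.gamma(za))  # discount kept in (0, 1)
        return z_next, r, g

    def rollout_value(self, z, actions):
        """d-step lookahead: accumulate predicted rewards, then bootstrap
        with the predicted value of the final abstract state."""
        total, discount = 0.0, 1.0
        for a in actions:  # a: one-hot action tensors
            z, r, g = self.step(z, a)
            total = total + discount * r
            discount = discount * g
        return total + discount * self.value(z)
```

Because only scalar quantities (reward, discount, value) are supervised, the abstract state never has to reconstruct the observation, which is what lets the model sidestep full environment modeling.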
“…However, the vanilla policy gradient algorithm performs poorly with large discrete action spaces. To address this, we can instead decompose a policy into a component that acts in a latent space of action representations (embeddings) and a component that transforms these representations into actual actions, as shown in [5]. This allows generalization over actions, since similar actions have similar representations, and improves performance while speeding up learning.…”
Section: Computing Allocation Strategy (mentioning)
confidence: 99%
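The decomposition described in this excerpt can be made concrete with a short sketch. The Gaussian internal policy over the embedding space and the nearest-neighbor mapping to discrete actions follow the general scheme of [5] (Chandak et al., 2019), but the specific architecture and names below are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class EmbeddingPolicy(nn.Module):
    """Sketch of a factored policy: an internal policy outputs a point
    in a latent action-embedding space; a fixed mapping then picks the
    discrete action whose learned embedding is nearest."""
    def __init__(self, state_dim, n_actions, embed_dim=16, hidden=128):
        super().__init__()
        # one learned embedding per discrete action
        self.action_embed = nn.Embedding(n_actions, embed_dim)
        # internal policy: state -> mean of a Gaussian in embedding space
        self.mu = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(embed_dim))

    def forward(self, state):
        # sample a latent action e ~ N(mu(s), sigma)
        mu = self.mu(state)
        e = mu + torch.randn_like(mu) * self.log_std.exp()
        # map e to the discrete action with the nearest embedding
        dists = torch.cdist(e.unsqueeze(0), self.action_embed.weight)
        return int(dists.argmin()), e
```

The policy gradient is taken in the low-dimensional embedding space rather than over thousands of discrete actions, so an update to the latent policy generalizes to every action whose embedding lies near the sampled point.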