Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Preprint, 2019
DOI: 10.48550/arxiv.1906.06062

Cited by 1 publication (1 citation statement, published 2020); references 0 publications.
“…This includes sequence models, from which we can sample using Stochastic Beam Search (Kool et al., 2019c), as well as multivariate categorical distributions, which can also be treated as sequence models (see Section 4.2). In the presence of continuous variables or a stochastic function f, we may separate this stochasticity from the stochasticity over the discrete distribution, as in Lorberbom et al. (2019). The computation of the leave-one-out ratios adds some overhead, although they can be computed efficiently, even for large k (see Appendix B).…”
Section: The Unordered Set Policy Gradient Estimator
Confidence: 99%
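For context on the sampling step referenced in the citation statement, the sketch below illustrates Gumbel top-k sampling without replacement from a single categorical distribution, the building block that Stochastic Beam Search generalizes to sequence models. This is a minimal illustration under assumed names and setup, not code from the cited papers.

```python
# Minimal sketch (illustrative, not from the cited papers): Gumbel top-k
# sampling draws k distinct items without replacement from a categorical
# distribution by perturbing each log-probability with Gumbel(0, 1) noise
# and keeping the k largest perturbed scores. Stochastic Beam Search
# extends this idea to sequence models; only the single-categorical case
# is shown here.
import numpy as np

def gumbel_top_k(log_probs: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Return indices of k distinct categories sampled without replacement."""
    gumbels = rng.gumbel(size=log_probs.shape)   # G_i ~ Gumbel(0, 1)
    perturbed = log_probs + gumbels              # log p_i + G_i
    return np.argsort(-perturbed)[:k]            # indices of the k largest scores

# Usage: sample 3 of 5 categories from an assumed categorical distribution.
rng = np.random.default_rng(0)
probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
print(gumbel_top_k(np.log(probs), k=3, rng=rng))
```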