2017
DOI: 10.1007/s10994-017-5657-1
Generalized exploration in policy search

Abstract: To learn control policies in unknown environments, learning agents need to explore by trying actions deemed suboptimal. In prior work, such exploration is performed by either perturbing the actions at each time-step independently, or by perturbing policy parameters over an entire episode. Since both of these strategies have certain advantages, a more balanced trade-off could be beneficial. We introduce a unifying view on step-based and episode-based exploration that allows for such balanced trade-offs. This tr…
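The abstract contrasts two exploration extremes: injecting independent noise into each action at every time-step, versus perturbing the policy parameters once for a whole episode. A minimal sketch of the two extremes, using an illustrative linear policy (all names and noise scales here are assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)  # parameters of a toy linear policy (illustrative)

def features(s):
    """Simple polynomial features of a scalar state."""
    return np.array([1.0, s, s**2])

def act_step(s, sigma=0.1):
    """Step-based exploration: i.i.d. Gaussian noise on each action."""
    return features(s) @ w + sigma * rng.standard_normal()

def act_episode_factory(sigma=0.1):
    """Episode-based exploration: perturb the parameters once,
    then act deterministically for the rest of the episode."""
    w_ep = w + sigma * rng.standard_normal(w.shape)
    def act(s):
        return features(s) @ w_ep
    return act
```

Step-based noise makes repeated queries of the same state return different actions within an episode, while episode-based noise keeps the within-episode policy deterministic; the paper's contribution is a generalization that trades off between these two regimes.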

Cited by 8 publications (6 citation statements) · References 26 publications
“…This policy was optimized by a standard RL algorithm that did not account for the dependence of actions. In [11], a policy was analyzed whose parameters were incremented by the OU stochastic process. Essentially, this resulted in autocorrelated random components of actions.…”
Section: A Stochastic Dependence Between Actions (mentioning)
Confidence: 99%
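The statement above refers to incrementing policy parameters with an Ornstein–Uhlenbeck (OU) process, which yields temporally autocorrelated perturbations rather than i.i.d. noise. A minimal sketch of such an OU perturbation sequence (the parameter values `theta`, `sigma`, and `dt` are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def ou_parameter_noise(n_steps, dim, theta=0.15, sigma=0.2, dt=1.0, seed=0):
    """Generate autocorrelated perturbations via an Ornstein-Uhlenbeck process.

    x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, I),
    with mean mu = 0. Successive samples are correlated, unlike i.i.d. noise.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    noise = np.empty((n_steps, dim))
    for t in range(n_steps):
        x = x + theta * (0.0 - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(dim)
        noise[t] = x
    return noise

eps = ou_parameter_noise(1000, 2)
```

Adding `eps[t]` to the policy parameters at each step produces actions whose random components are correlated over time, which is the effect the citing authors describe.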
“…In this experiment, the evolutionary strategies were given more frames, but still ran faster due to better parallelizability. Van Hoof et al (2017) apply auto-correlated noise in parameter space. This noise is distributed similarly to that proposed by Morimoto and Doya (2001), but applied to the parameters rather than the actions.…”
Section: Parameter-Space Perturbing Strategies (mentioning)
Confidence: 99%
“…In [5] a policy was analyzed whose parameters were incremented by the autoregressive stochastic process. Essentially, this resulted in autocorrelated random components of actions.…”
Section: Stochastic Dependence Between Actions (mentioning)
Confidence: 99%
“…This section presents the hyperparameters used in the simulations reported in Sec. 5. All algorithms used a discount factor of 0.99.…”
Section: A Algorithms' Hyperparameters (mentioning)
Confidence: 99%