2018
DOI: 10.48550/arxiv.1803.07055
Preprint
Simple random search provides a competitive approach to reinforcement learning

Abstract: A common belief in model-free reinforcement learning is that methods based on random search in the parameter space of policies exhibit significantly worse sample complexity than those that explore the space of actions. We dispel such beliefs by introducing a random search method for training static, linear policies for continuous control problems, matching state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Our method also finds a nearly optimal controller for a challenging instance of …

Cited by 81 publications (139 citation statements)
References 24 publications
“…The ARS is a model-free reinforcement learning algorithm [39]. Based on randomized search in the parameter space of policies, the ARS uses the method of finite differences to adjust its weights and train how the policy performs its given tasks [47], [39]. Via this random search in the parameter space, the ARS conducts a derivative-free policy optimization with noise [47], [39].…”
Section: E. Augmented Random Search
confidence: 99%
“…Based on randomized search in the parameter space of policies, the ARS uses the method of finite differences to adjust its weights and train how the policy performs its given tasks [47], [39]. Via this random search in the parameter space, the ARS conducts a derivative-free policy optimization with noise [47], [39]. To update the training weights effectively, the ARS (i) uniformly selects update directions and (ii) updates the policies along the selected directions.…”
Section: E. Augmented Random Search
confidence: 99%
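The update just described, (i) uniformly sampled directions and (ii) a finite-difference move along the reward-weighted directions, is compact enough to sketch. Below is a minimal illustration in Python, not the authors' implementation: the `rollout` function standing in for environment interaction and all hyperparameter values are assumptions.

```python
import numpy as np

def ars_step(theta, rollout, step_size=0.02, noise=0.03, num_dirs=8):
    """One basic random-search update on a linear policy's weights.

    theta   : current weight matrix of the linear policy
    rollout : callable mapping a weight matrix to an episode return
              (a stand-in for interaction with the environment)
    """
    # (i) uniformly sample random search directions in parameter space
    deltas = [np.random.randn(*theta.shape) for _ in range(num_dirs)]

    # probe the return with symmetric perturbations (finite differences)
    r_plus = np.array([rollout(theta + noise * d) for d in deltas])
    r_minus = np.array([rollout(theta - noise * d) for d in deltas])

    # ARS additionally rescales the step by the std of collected returns
    sigma_r = np.concatenate([r_plus, r_minus]).std() + 1e-8

    # (ii) move the weights along the reward-weighted directions
    step = sum((rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
    return theta + step_size / (num_dirs * sigma_r) * step
```

Note that nothing here differentiates the policy; only episode returns are needed, which is what makes the optimization derivative-free.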
“…Conventional deep RL methods rely on the chain rule and back-propagation to update the parameters of the neural networks, which makes it rather difficult to incorporate non-differentiable modules like TAM. Recently, evolution strategies (ES) have been proven to be an effective, scalable alternative to conventional RL methods [10], [11]. ES methods are derivative-free and can be easily parallelized.…”
Section: A. Motivation
confidence: 99%
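To make concrete why derivative-free updates sidestep the differentiability problem, the sketch below trains a policy containing a hard, non-differentiable gate using a simple ES estimator. The policy structure, the threshold, `reward_fn`, and all constants are hypothetical illustrations, not the cited paper's model.

```python
import numpy as np

def gated_policy(theta, s):
    # A hard binary gate: back-propagation cannot pass through it,
    # but ES only ever evaluates the policy, so it does not care.
    mask = (np.abs(s) > 0.5).astype(float)
    return (theta @ s) * mask

def es_gradient(theta, s, reward_fn, sigma=0.1, n=64):
    """Perturbation-based ES estimate of the reward gradient w.r.t. theta."""
    grad = np.zeros_like(theta)
    for _ in range(n):
        eps = np.random.randn(*theta.shape)
        r = reward_fn(gated_policy(theta + sigma * eps, s))
        grad += r * eps / (n * sigma)
    return grad

# Toy usage: push the gated action toward an all-ones target.
s = np.random.randn(4)
theta = np.zeros((2, 4))
reward = lambda a: -np.sum((a - 1.0) ** 2)
for _ in range(100):
    theta += 0.05 * es_gradient(theta, s, reward)
```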
“…where a_t and cr_t are the action and the learned criterion based on the current state s_t and the current latent variable c_t. The TAM is generated by comparing the voltage magnitude at each bus i with the voltage criterion cr_t, as shown by (11). The action is filtered by conducting an element-wise multiplication with the TAM, as shown by (12).…”
Section: A. Embedding Physics Knowledge Through TAM
confidence: 99%
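As a concrete reading of the masking step, the following minimal sketch builds the TAM from the per-bus voltages and filters the action as in (11) and (12). The direction of the comparison and the array shapes are assumptions, since the excerpt does not state them.

```python
import numpy as np

def apply_tam(a_t, v_t, cr_t):
    """Filter a raw action with the TAM described above.

    a_t  : raw per-bus actions from the policy
    v_t  : voltage magnitude at each bus
    cr_t : learned voltage criterion (scalar or per-bus)
    """
    tam = (v_t > cr_t).astype(a_t.dtype)  # (11): compare voltages to criterion
    return a_t * tam                      # (12): element-wise filtering
```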