2018
DOI: 10.48550/arxiv.1803.07055
Preprint
Simple random search provides a competitive approach to reinforcement learning

Abstract: A common belief in model-free reinforcement learning is that methods based on random search in the parameter space of policies exhibit significantly worse sample complexity than those that explore the space of actions. We dispel such beliefs by introducing a random search method for training static, linear policies for continuous control problems, matching state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Our method also finds a nearly optimal controller for a challenging instance of …

Cited by 81 publications (139 citation statements)
References 24 publications
“…The ARS is a model-free reinforcement learning algorithm [39]. Based on randomized search in the parameter space of policies, the ARS uses the method of finite differences to adjust its weights and train how the policy performs its given tasks [47], [39]. Via this random search in the parameter space, the ARS conducts a derivative-free policy optimization with noise [47], [39].…”
Section: E. Augmented Random Search
confidence: 99%
“…Based on randomized search in the parameter space of policies, the ARS uses the method of finite differences to adjust its weights and train how the policy performs its given tasks [47], [39]. Via this random search in the parameter space, the ARS conducts a derivative-free policy optimization with noise [47], [39]. To update the training weights effectively, the ARS (i) uniformly selects update directions and (ii) updates the policies along the selected directions.…”
Section: E. Augmented Random Search
confidence: 99%
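The update just described, (i) uniformly sampled directions and (ii) a finite-difference move along the reward-weighted directions, is compact enough to sketch. Below is a minimal illustration in Python, not the authors' implementation: the `rollout` function standing in for environment interaction and all hyperparameter values are assumptions.

```python
import numpy as np

def ars_step(theta, rollout, step_size=0.02, noise=0.03, num_dirs=8):
    """One basic random-search update on a linear policy's weights.

    theta   : current weight matrix of the linear policy
    rollout : callable mapping a weight matrix to an episode return
              (a stand-in for interaction with the environment)
    """
    # (i) uniformly sample random search directions in parameter space
    deltas = [np.random.randn(*theta.shape) for _ in range(num_dirs)]

    # probe the return with symmetric perturbations (finite differences)
    r_plus = np.array([rollout(theta + noise * d) for d in deltas])
    r_minus = np.array([rollout(theta - noise * d) for d in deltas])

    # ARS additionally rescales the step by the std of collected returns
    sigma_r = np.concatenate([r_plus, r_minus]).std() + 1e-8

    # (ii) move the weights along the reward-weighted directions
    step = sum((rp - rm) * d for rp, rm, d in zip(r_plus, r_minus, deltas))
    return theta + step_size / (num_dirs * sigma_r) * step
```

Note that nothing here differentiates the policy; only episode returns are needed, which is what makes the optimization derivative-free.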
“…Conventional deep RL methods rely on the chain rule and back-propagation to update the parameters of the neural networks, which makes it rather difficult to incorporate non-differentiable modules like TAM. Recently, evolution strategies (ES) have been proven to be an effective, scalable alternative to conventional RL methods [10], [11]. ES methods are derivative-free and can be easily parallelized.…”
Section: A. Motivation
confidence: 99%
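To make concrete why derivative-free updates sidestep the differentiability problem, the sketch below trains a policy containing a hard, non-differentiable gate using a simple ES estimator. The policy structure, the threshold, `reward_fn`, and all constants are hypothetical illustrations, not the cited paper's model.

```python
import numpy as np

def gated_policy(theta, s):
    # A hard binary gate: back-propagation cannot pass through it,
    # but ES only ever evaluates the policy, so it does not care.
    mask = (np.abs(s) > 0.5).astype(float)
    return (theta @ s) * mask

def es_gradient(theta, s, reward_fn, sigma=0.1, n=64):
    """Perturbation-based ES estimate of the reward gradient w.r.t. theta."""
    grad = np.zeros_like(theta)
    for _ in range(n):
        eps = np.random.randn(*theta.shape)
        r = reward_fn(gated_policy(theta + sigma * eps, s))
        grad += r * eps / (n * sigma)
    return grad

# Toy usage: push the gated action toward an all-ones target.
s = np.random.randn(4)
theta = np.zeros((2, 4))
reward = lambda a: -np.sum((a - 1.0) ** 2)
for _ in range(100):
    theta += 0.05 * es_gradient(theta, s, reward)
```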
“…where a_t and cr_t are the action and the learned criterion based on the current state s_t and the current latent variable c_t. The TAM is generated by comparing the voltage magnitude at each bus i with the voltage criterion cr_t, as shown by (11). The action is filtered by conducting an element-wise multiplication with the TAM, as shown by (12).…”
Section: A. Embedding Physics Knowledge Through TAM
confidence: 99%
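As a concrete reading of the masking step, the following minimal sketch builds the TAM from the per-bus voltages and filters the action as in (11) and (12). The direction of the comparison and the array shapes are assumptions, since the excerpt does not state them.

```python
import numpy as np

def apply_tam(a_t, v_t, cr_t):
    """Filter a raw action with the TAM described above.

    a_t  : raw per-bus actions from the policy
    v_t  : voltage magnitude at each bus
    cr_t : learned voltage criterion (scalar or per-bus)
    """
    tam = (v_t > cr_t).astype(a_t.dtype)  # (11): compare voltages to criterion
    return a_t * tam                      # (12): element-wise filtering
```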