2019
DOI: 10.1016/j.neunet.2019.01.011

Policy search in continuous action domains: An overview

Abstract: Continuous action policy search is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and the emergence of competitors based on evolutionary algorithms. In this paper, we present a broad survey of policy search methods, providing a unified perspective on very different approaches, including also Bayesian Optimization and directed exploration methods. The main message of this overview is in the relationship between the families of methods, but …

Cited by 61 publications (47 citation statements)
References 98 publications (142 reference statements)
“…Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve tasks with long-term dependencies or multi-level interaction patterns [5,6]. Recent works suggest that several interesting and standout results can be obtained by training a multi-level hierarchical policy in a multi-task setup [8,25] or by applying a hierarchical setting to sparse-reward problems [23,34].…”
Section: Related Work (mentioning)
confidence: 99%
“…The state of this environment is continuous and defined by the position (x, y) of the particle, and the control actions are its velocities (ẋ, ẏ); thus D_S = D_A = 2. The initial position of the particle is sampled from a spherical Gaussian distribution centered at the position (4, 4). This task can be decomposed into two composable tasks, namely, reaching the position −2 in the x coordinate, and reaching the position −2 in the y coordinate.…”
Section: A. Tasks Description (mentioning)
confidence: 99%
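To make this kind of task concrete, below is a minimal Python sketch of a 2D point-mass environment of this shape. The class name, reward shaping, noise scale, time step, and horizon are illustrative assumptions and are not taken from the cited paper:

```python
import numpy as np

class PointMass2D:
    """Minimal 2D point-mass task: the state is the position (x, y) and the
    action is the velocity (dx, dy), so D_S = D_A = 2.
    Hypothetical sketch; goal, dt, horizon, and reward are illustrative."""

    def __init__(self, goal=(-2.0, -2.0), dt=0.1, horizon=100):
        self.goal = np.asarray(goal, dtype=np.float64)
        self.dt = dt
        self.horizon = horizon

    def reset(self, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        # Initial position drawn from a spherical Gaussian centered at (4, 4).
        self.state = rng.normal(loc=4.0, scale=1.0, size=2)
        self.t = 0
        return self.state.copy()

    def step(self, action):
        # The action is a velocity; integrate the position with a fixed time step.
        self.state = self.state + self.dt * np.asarray(action, dtype=np.float64)
        self.t += 1
        # Illustrative reward: negative Euclidean distance to the goal.
        reward = -np.linalg.norm(self.state - self.goal)
        done = self.t >= self.horizon
        return self.state.copy(), reward, done
```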
“…Several algorithms have been proposed to improve the sample efficiency of model-free deep RL by making better use of the sample information (data-efficiency), obtaining more information from the data (sample choice), and improving the policy several times with the same samples (sample reuse) [4].…”
Section: Introduction (mentioning)
confidence: 99%
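One common way to realize the "sample reuse" ingredient mentioned above is experience replay, where each stored transition can be drawn many times across policy or value updates. The sketch below is a generic illustration under that assumption, not code from the survey or from the citing paper:

```python
import random
from collections import deque

class ReplayBuffer:
    """Generic experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one transition; the oldest ones are discarded when full.
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # A transition may be sampled many times over the course of training,
        # which is the "sample reuse" idea referred to above.
        batch = random.sample(self.storage, min(batch_size, len(self.storage)))
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```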
“…The way standard approaches to RL work is through a combination of hill-climbing (gradient descent) and random exploration. For example, state-of-the-art deep reinforcement learning algorithms for learning continuous control, such as DDPG and related algorithms [Lillicrap et al., 2015, Schulman et al., 2017, Sigaud and Stulp, 2018], work by alternating between updating the current controller solution in order to climb the hill of rewards (this requires that rewards of different magnitudes are observed when slightly changing the controller), and producing random perturbations of the current best controller to obtain further information about the …
[Figure 1 caption: Curiosity-driven exploration through autonomous goal setting and self-organized curriculum learning in the experimental setup presented in [Forestier et al., 2017] (see video: https://www.youtube.com/watch?v=NOLAwD4ZTW0).]…”
Section: Curiosity For Exploration and Discovery In An Open World (mentioning)
confidence: 99%
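The alternation described above (improving the current controller to climb the reward hill, and randomly perturbing it to gather more information) can be sketched in a simplified, random-search flavour as follows. The linear policy, noise scales, and greedy acceptance rule are assumptions made for illustration; this is not the DDPG algorithm itself. It can be run against any environment with the reset/step interface of the point-mass sketch given earlier:

```python
import numpy as np

def rollout(env, theta, rng, noise_std=0.1):
    """Return the episodic reward of the linear controller a = theta @ s,
    with Gaussian exploration noise added to each action (illustrative)."""
    state = env.reset(rng)
    total_reward, done = 0.0, False
    while not done:
        action = theta @ state + rng.normal(scale=noise_std, size=theta.shape[0])
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward

def hill_climb(env, dims=(2, 2), iterations=200, step=0.05, seed=0):
    """Alternate between random perturbations of the current best controller
    and keeping those that climb the hill of episodic reward."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dims)
    best_return = rollout(env, theta, rng)
    for _ in range(iterations):
        candidate = theta + step * rng.normal(size=dims)   # random perturbation
        candidate_return = rollout(env, candidate, rng)
        if candidate_return > best_return:                 # greedy hill-climbing
            theta, best_return = candidate, candidate_return
    return theta, best_return
```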
“…before finding the first few action sequences that produce ball movement. This problem of RL approaches that focus on hill-climbing of the extrinsic reward is now well known, and applies to many environments with rare or deceptive rewards [Bellemare et al., 2016, Sigaud and Stulp, 2018, Colas et al., 2018].…”
Section: Curiosity For Exploration and Discovery In An Open World (mentioning)
confidence: 99%