Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164)
DOI: 10.1109/robot.2001.932842

Autonomous helicopter control using reinforcement learning policy search methods

Abstract: Many control problems in the robotics field can be cast as Partially Observed Markovian Decision Problems (POMDPs), an optimal control formalism. Finding optimal solutions to such problems in general, however, is known to be intractable. It has often been observed that in practice, simple structured controllers suffice for good sub-optimal control, and recent research in the artificial intelligence community has focused on policy search methods as techniques for finding sub-optimal controllers when such struct…

Cited by 191 publications (185 citation statements)
References 1 publication
“…In fact, many of the methods that scale to the most interesting tasks are model-based and often employ policy search rather than value function-based approaches [Miyamoto et al., 1996, Bagnell and Schneider, 2001, Kohl and Stone, 2004, Tedrake et al., 2005, Peters and Schaal, 2008b,c, Kober and Peters, 2008]. This stands in contrast to perhaps the bulk of research in the machine learning community [Kaelbling et al., 1996, Sutton and Barto, 1998].…”

Section: (D)
“…The computation of the policy update is the key step here, and a variety of updates have been proposed, ranging from pairwise comparisons [Strens and Moore, 2001, Ng et al., 2004a] over gradient estimation using finite policy differences [Geng et al., 2006, Mitsunaga et al., 2005, Sato et al., 2002, Tedrake et al., 2005] and general stochastic optimization methods (such as Nelder-Mead [Bagnell and Schneider, 2001], cross entropy [Rubinstein and Kroese, 2004] and population-based methods [Goldberg, 1989]) to approaches coming from optimal control, such as differential dynamic programming (DDP) [Atkeson, 1998] and multiple shooting approaches [Betts, 2001], as well as core reinforcement learning methods.…”

Section: Policy Search
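The finite-difference gradient estimation named in this excerpt can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the hypothetical rollout_return callable (which would run one episode on the real or simulated system and return its total reward) and the step sizes are not from the cited papers.

import numpy as np

def finite_difference_gradient(rollout_return, theta, eps=0.05, n_rollouts=4):
    # rollout_return: hypothetical callable mapping a policy parameter
    #                 vector to one episode's return.
    # theta:          current policy parameters (NumPy array).
    # eps:            perturbation size per parameter.
    # n_rollouts:     rollouts averaged per evaluation to reduce noise.
    def avg_return(params):
        return np.mean([rollout_return(params) for _ in range(n_rollouts)])

    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        delta = np.zeros_like(theta)
        delta[i] = eps
        # Central difference: (J(theta + d) - J(theta - d)) / (2 * eps)
        grad[i] = (avg_return(theta + delta) - avg_return(theta - delta)) / (2 * eps)
    return grad

# Gradient-ascent policy update (the 0.01 step size is an arbitrary choice):
# theta = theta + 0.01 * finite_difference_gradient(rollout_return, theta)

Note that each gradient estimate costs 2 * len(theta) * n_rollouts episodes, which is why this family of methods is typically used with low-dimensional, structured controllers of the kind the abstract describes.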
“…Examples range from basic upright hovering and forward flight [4,10,15,16,17] to inverted hovering [14], and even to extreme aerobatic maneuvers [1,6,5].…”

Section: Introduction
“…In particular, the model we present in this paper explicitly incorporates a model for the rotor speed dynamics, a crucial aspect of helicopter flight during autorotation. Then, since it can be very difficult to specify helicopter maneuvers by hand, we use the expert demonstrations to define the autorotation task. (See also, e.g., [5], where demonstrations were used to enable a helicopter to fly high-performance helicopter aerobatics.)…”

Section: Introduction
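To make the rotor-speed-dynamics idea concrete, here is a minimal sketch of fitting a first-order rotor speed model to logged demonstration data by least squares. The model form (d(omega)/dt = a*omega + b*u + c), the variable names, and the synthetic data are all illustrative assumptions, not the model or data from the cited paper.

import numpy as np

# Synthetic stand-ins for logged demonstration data (not real flight logs):
# omega[t]        rotor speed at timestep t
# u_collective[t] collective input at timestep t
rng = np.random.default_rng(0)
omega = 100.0 + rng.normal(0.0, 1.0, size=200).cumsum() * 0.01
u_collective = rng.uniform(-1.0, 1.0, size=200)
dt = 0.01

# Finite-difference estimate of the rotor speed derivative.
d_omega = np.diff(omega) / dt

# Least-squares fit of the assumed linear model d(omega)/dt = a*omega + b*u + c.
X = np.column_stack([omega[:-1], u_collective[:-1], np.ones(len(omega) - 1)])
coeffs, *_ = np.linalg.lstsq(X, d_omega, rcond=None)
a, b, c = coeffs
print(f"fitted model: d(omega)/dt = {a:.4f}*omega + {b:.4f}*u + {c:.4f}")

A model identified this way can then be plugged into a simulator for policy search, mirroring the model-based workflow the earlier excerpts describe.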