This paper reviews a variety of ways to use trajectory optimization to accelerate dynamic programming. Dynamic programming provides a way to design globally optimal control laws for nonlinear systems. However, the curse of dimensionality, the exponential dependence of the space and computation resources needed on the dimensionality of the state and control, limits the application of dynamic programming in practice. We explore trajectory-based dynamic programming, which combines many local optimizations to accelerate the global optimization of dynamic programming.

What is Dynamic Programming?

Dynamic programming provides a way to find globally optimal control laws (policies), u = u(x), which give the appropriate action u for any state x [1,2]. Dynamic programming takes as input a one step cost (a.k.a. "reward" or "loss") function and the dynamics of the problem to be optimized. This paper focuses on offline planning of nonlinear control laws for control problems with continuous states and actions, deterministic time invariant discrete time dynamics x_{k+1} = f(x_k, u_k), and a time invariant one step cost function L(x, u), so we use discrete time dynamic programming. We focus on steady state policies and thus an infinite time horizon.

One approach to dynamic programming is to approximate the value function V(x) (the optimal total future cost from each state x) by repeatedly applying the Bellman update V(x_j) = min_u (L(x_j, u) + V(f(x_j, u))) at sampled states x_j until the value function estimates have converged. Typically the value function and control law are represented on a regular grid, and some type of interpolation is used to approximate these functions within each grid cell. If each dimension of the state and action is represented with a resolution R, and the dimensionality of the state is d_x and that of the action is d_u, the computational cost of the conventional approach is proportional to R^{d_x} × R^{d_u} and the memory cost is proportional to R^{d_x}. This is known as the Curse of Dimensionality [1].

An example problem: We use one link pendulum swingup as an example problem to provide the reader with a visualizable example of a nonlinear control law and corresponding value function. In one link pendulum swingup a motor at the base of the pendulum swings a rigid arm from the downward stable equilibrium to the upright unstable equilibrium and balances the arm there (Fig. 1). What makes this challenging is that the one step cost function penalizes the amount of torque used as well as the deviation of the current position from the goal, and the controller must minimize the total cost of the trajectory. The one step cost function for this example is a weighted sum of the squared position error (θ: the difference between the current angle and the goal angle) and the squared torque τ: L(x, u) = 0.1θ^2 + τ^2, where 0.1 weights the position error relative to the torque penalty. There are no costs associated with the joint velocity. The uniform density link has a mass of 1 kg, a length of 1 m, and a width of 0.1 m. Because the dynamics and cost function are time invariant, there is a steady state control law and...
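To make the grid-based approach concrete, the following is a minimal sketch of value iteration for the one link pendulum swingup described above. The integration time step, grid ranges and resolution, discretized torque set, and discount factor are illustrative assumptions (the paper does not specify them), and a nearest grid point lookup stands in for the within-cell interpolation mentioned in the text.

```python
import numpy as np

# Sketch: grid-based value iteration for one link pendulum swingup.
# Time step, grid ranges/resolution, torque set, and discount factor are
# assumed values for illustration, not taken from the paper.

m, l, g = 1.0, 1.0, 9.81              # uniform link: mass 1 kg, length 1 m
I = m * l ** 2 / 3.0                  # moment of inertia about the pivot
dt = 0.02                             # assumed integration time step
gamma = 0.99                          # assumed discount (near-infinite horizon)

R = 101                                            # grid resolution per state dim
thetas = np.linspace(-np.pi, np.pi, R)             # angle error from the upright goal
vels = np.linspace(-10.0, 10.0, R)                 # angular velocity
taus = np.linspace(-5.0, 5.0, 21)                  # assumed discrete torque set

# Enumerate every grid state against every action: the R^{d_x} x R^{d_u}
# table whose growth with dimension is the curse of dimensionality.
TH, VD, TAU = np.meshgrid(thetas, vels, taus, indexing="ij")

# Discrete time dynamics x_{k+1} = f(x_k, u_k): one Euler step of the pendulum.
acc = (TAU + m * g * (l / 2.0) * np.sin(TH)) / I
VD2 = np.clip(VD + acc * dt, vels[0], vels[-1])
TH2 = np.mod(TH + VD2 * dt + np.pi, 2.0 * np.pi) - np.pi

# One step cost L(x, u) = 0.1*theta^2 + tau^2, with no penalty on velocity.
L = dt * (0.1 * TH ** 2 + TAU ** 2)

# Nearest grid indices of each successor state (stand-in for interpolation).
i2 = np.clip(np.rint((TH2 - thetas[0]) / (thetas[1] - thetas[0])), 0, R - 1).astype(int)
j2 = np.clip(np.rint((VD2 - vels[0]) / (vels[1] - vels[0])), 0, R - 1).astype(int)

V = np.zeros((R, R))                               # value function table on the grid
for sweep in range(5000):                          # repeated Bellman backups
    Q = L + gamma * V[i2, j2]                      # L(x, u) + V(f(x, u))
    V_new = Q.min(axis=2)                          # minimize over actions
    if np.max(np.abs(V_new - V)) < 1e-3:           # stop when estimates converge
        break
    V = V_new

policy = taus[Q.argmin(axis=2)]                    # greedy control law u(x) on the grid
```

Each sweep visits all R^{d_x} = 101^2 grid states and all discretized actions, so the work per sweep scales as R^{d_x} × R^{d_u} and the table storage as R^{d_x}; at the same resolution, a two link version (d_x = 4) would already require on the order of 10^8 grid states, which is the curse of dimensionality in practice.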