Imitation learning with non-parametric regression

Vaandrager, Maarten; Babuška, Robert; Buşoniu, Lucian; Lopes, Gabriel

doi:10.1109/aqtr.2012.6237681

Cited by 4 publications

(5 citation statements)

References 17 publications

(15 reference statements)

“…where σ_s is a factor determining the length of the running average that was set to a value of 0.95, as in [12]. A low value for η_s means that the sample and its closest neighbors live in the neighborhood of the same affine hyperplane.…”

Section: B Memory Storage (mentioning)

confidence: 99%

“…The strategy implemented in [12] to reduce noise is to replace the nearest neighbors y_s with the estimated values using the local model ĥ(x_s). The local model is a linear least squares solution of the nearest neighbors, which is the best average of the first order relation present in the neighbors.…”

Section: B Memory Storage (mentioning)

confidence: 99%
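
The noise-reduction step quoted above can be illustrated with a minimal sketch. It assumes Euclidean distance for the neighbor search and an affine fit obtained with ordinary least squares; the function names `llr_predict` and `denoise_neighbors` and the parameter `k` are hypothetical and do not come from [12] or the citing paper.

```python
import numpy as np

def llr_predict(X, Y, query, k=5):
    """Predict at `query` with an affine model fitted to the k nearest stored samples.

    Illustrative assumption: Euclidean distance and plain least squares.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    query = np.asarray(query, dtype=float)
    # Distance from the query to every stored input.
    dists = np.linalg.norm(X - query, axis=1)
    idx = np.argsort(dists)[:k]
    # Append a constant column so the fit is affine rather than purely linear.
    A = np.hstack([X[idx], np.ones((len(idx), 1))])
    beta, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
    return np.append(query, 1.0) @ beta, idx, beta

def denoise_neighbors(X, Y, query, k=5):
    """Return a copy of Y in which the neighbors' outputs are replaced by the local model's estimates."""
    X = np.asarray(X, dtype=float)
    _, idx, beta = llr_predict(X, Y, query, k)
    A = np.hstack([X[idx], np.ones((len(idx), 1))])
    Y_new = np.array(Y, dtype=float, copy=True)
    Y_new[idx] = A @ beta
    return Y_new
```

On data that is already exactly affine, the replacement leaves the stored outputs unchanged; on noisy data it projects the neighbors' outputs onto the fitted hyperplane.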

“…The memory management algorithm implemented in [12] is based on discarding new samples (or replacing old similar samples with the new observation) that fit best with the current affine model generated by the LLR. In practice this means that the nonlinear parts are populated with many samples while the linear parts have very little.…”

Section: B Memory Storage (mentioning)

confidence: 99%
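
A rough sketch of the discard rule described in this statement follows. It covers only the "discard" variant, assumes scalar outputs, and uses a hypothetical tolerance `tol` to decide when a new sample "fits" the current local affine model; none of these names or values are taken from [12].

```python
import numpy as np

def local_affine_prediction(X, Y, query, k):
    """Affine least-squares fit on the k nearest stored samples, evaluated at `query`."""
    dists = np.linalg.norm(X - query, axis=1)
    idx = np.argsort(dists)[:k]
    A = np.hstack([X[idx], np.ones((len(idx), 1))])
    beta, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
    return np.append(query, 1.0) @ beta

def maybe_store(X, Y, x_new, y_new, k=4, tol=1e-2):
    """Return (X, Y, stored): the new sample is kept only if the local model predicts it poorly.

    `tol` is an assumed threshold chosen for illustration.
    """
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    if len(X) >= k:
        pred = local_affine_prediction(X, Y, x_new, k)
        if abs(pred - y_new) < tol:
            # The sample already agrees with the local affine model: discard it.
            return X, Y, False
    return np.vstack([X, x_new]), np.append(Y, y_new), True

# Usage: memory sampled from y = |x|, which is linear away from the kink at 0.
X = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)
Y = np.abs(X[:, 0])
_, _, kept_linear = maybe_store(X, Y, [0.6], 0.6)  # linear region
_, _, kept_kink = maybe_store(X, Y, [0.1], 0.1)    # near the kink
```

Under this rule, samples from the linear region tend to be rejected while samples near the kink are stored, which matches the behavior the statement describes: nonlinear regions accumulate many samples and linear regions few.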

Stiffness and damping scheduling for legged locomotion

Zhang

Lopes

Babuška

2013

2013 IEEE International Conference on Robotics and Biomimetics (ROBIO)

Legged robots are intrinsically nonlinear hybrid dynamic systems due to the intermittent contact of the feet with the ground. For optimal performance, in the sense of maximizing speed or energy consumption, different motion control affects the stance from the swing leg during a stride. Designing such controllers, however, can be a daunting task when there is a lack of knowledge about the exact operating conditions, i.e., the surface on which the robot walks or runs. To address this problem, we present a model-free learning controller making use of a supervised machine learning method called Local Linear Regression. This method allows the controller to online adjust its controller parameters as a function of the state. We demonstrate this approach on a tunable stiffness and damping controller for a quadrupedal legged robot. The controller learns to compensate for friction and other nonlinear effects encountered while walking in an average sense, without the use of explicit models. Experimental results with the robot walking on a treadmill are presented.

“…Another method to improve the speed of learning is the use of human demonstrations to explore the state space of a system. Several attempts have been made to do so, 7,8 however, these attempts did use a model of the dynamics of the system at hand. A method that accelerates the reinforcement learning in a model-free and online fashion through the use of human demonstrations, has not yet been found.…”

Section: Introduction (mentioning)

confidence: 99%

“…Successful attempts have been made to include knowledge on the dynamics of the system to speed up the learning. 6,7 These attempts used different kinds of knowledge on the system dynamics to do so. Another method to improve the speed of learning is the use of human demonstrations to explore the state space of a system.…”

Section: Introduction (mentioning)

confidence: 99%

Human Demonstrations for Fast and Safe Exploration in Reinforcement Learning

Schonebaum

Junell

Kampen

2017

AIAA Information Systems-AIAA Infotech @ Aerospace

Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle, and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller may initially be poor and, for real-life applications, unsafe. In this paper the effects of using human demonstrations on the performance of reinforcement learning is investigated, using a combination of offline and online least squares policy iteration. It is found that using the human as an efficient explorer improves learning time and performance for a benchmark reinforcement learning problem. The benefit of the human demonstration is larger for problems where the human can make use of its understanding of the problem to efficiently explore the state space. Applied to a simplified quadrotor slung load drop off problem, the use of human demonstrations reduces the number of crashes during learning. As such, this paper contributes to safer and faster learning for model-free, adaptive control problems.
