Continuous Online Learning and New Insights to Online Imitation Learning

Cheng, Ching-An; Goldberg, Ken; Boots, Byron

doi:10.48550/arxiv.1912.01261

Cited by 2 publications

(2 citation statements)

References 12 publications


“…, a_N, s_N}, we aim to learn a policy π* that best matches the demonstration trajectories. Note that we focus on the offline imitation setting as employed in [3], [7], where a set of demonstrations are provided ahead of time instead of gradually incrementing our dataset as in online imitation learning [33].…”

Section: Problem Statement (mentioning)

confidence: 99%

Learning From Imperfect Demonstrations From Agents With Varying Dynamics

Cao

Sadigh

2021

IEEE Robot. Autom. Lett.


Imitation learning enables robots to learn from demonstrations. Previous imitation learning algorithms usually assume access to optimal expert demonstrations. However, in many real-world applications, this assumption is limiting. Most collected demonstrations are not optimal or are produced by an agent with slightly different dynamics. We therefore address the problem of imitation learning when the demonstrations can be sub-optimal or be drawn from agents with varying dynamics. We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning. The proposed score enables learning from more informative demonstrations, and disregarding the less relevant demonstrations. Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.


“…The interactions of these properties make the classic adversary-style online learning analysis taken by Ross et al [2] overly conservative, creating a mismatch between provable theoretical guarantees and the learning phenomena observed in practice. This reality gap has motivated researchers to study deeper the theoretical underpinnings of OPO [12][13][14].…”

Section: Introduction (mentioning)

confidence: 99%

Explaining Fast Improvement in Online Imitation Learning

Yan

Boots

Cheng

2020

Preprint

Self Cite


Online policy optimization (OPO) views policy optimization for sequential decision making as an online learning problem. In this framework, the algorithm designer defines a sequence of online loss functions such that the regret rate in online learning implies the policy convergence rate and the minimal loss witnessed by the policy class determines the policy performance bias. This reduction technique has been successfully applied to solving various policy optimization problems, including imitation learning, structured prediction, and system identification. Interestingly, the policy improvement speed observed in practice is usually much faster than existing theory suggests. In this work, we provide an explanation of this fast policy improvement phenomenon. Let ε denote the policy class bias and assume the online loss functions are convex, smooth, and non-negative. We prove that, after N rounds of OPO with stochastic feedback, the policy converges in Õ(1/N + √(ε/N)) in both expectation and high probability. In other words, we show that adopting a sufficiently expressive policy class in OPO has two benefits: both the convergence rate increases and the performance bias decreases, as the policy class becomes reasonably rich. This new theoretical insight is further verified in an online imitation learning experiment.
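To make the stated rate concrete, the sketch below evaluates 1/N + √(ε/N) for a few values of N and ε. It assumes the rate in the abstract reads Õ(1/N + √(ε/N)) with ε the policy class bias, and it drops the constants and logarithmic factors hidden by Õ, so the numbers show scaling only, not an actual bound from the paper. The function name `rate_bound` and the sample values are hypothetical and chosen for illustration.

```python
import math


def rate_bound(n_rounds: int, bias: float) -> float:
    """Evaluate 1/N + sqrt(bias/N).

    Illustrative only: constants and log factors hidden by the
    O-tilde notation are dropped (an assumption of this sketch).
    """
    if n_rounds <= 0:
        raise ValueError("n_rounds must be positive")
    if bias < 0:
        raise ValueError("bias must be non-negative")
    return 1.0 / n_rounds + math.sqrt(bias / n_rounds)


if __name__ == "__main__":
    # Sample values are arbitrary; they are not taken from the paper.
    for bias in (0.0, 1e-4, 1e-2):
        for n_rounds in (10, 100, 1000):
            value = rate_bound(n_rounds, bias)
            print(f"bias={bias:g}  N={n_rounds:5d}  1/N + sqrt(bias/N) = {value:.5f}")
```

Under these assumptions the two terms are equal at N = 1/ε. Below that point the 1/N term dominates, and beyond it the slower √(ε/N) term takes over, so a smaller bias keeps the faster 1/N behavior for more rounds.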
