2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros47612.2022.9981198

Advanced Skills by Learning Locomotion and Local Navigation End-to-End

Cited by 39 publications (27 citation statements)
References 25 publications
“…By using a specialized policy, ANYmal crossed a 0.6-m-wide gap within a premapped environment (14). Most notably, our locomotion controller, not being specialized or fine-tuned for this terrain type, crossed a sequence of four gaps of the same width while relying only on online-generated maps.…”
Section: Benchmark Against RL Control
confidence: 99%
“…The locomotion policy π(a | o) is a stochastic distribution over actions a ∈ 𝒜 conditioned on observations o ∈ 𝒪, parameterized by an MLP. The action space comprises target joint positions that are tracked by a proportional-derivative (PD) controller, following the approach in (10) and related works (12–14).…”
Section: Overview of the Training Environment
confidence: 99%
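
The quoted setup is a common interface in learned legged locomotion: the policy emits joint-position targets and a low-level PD loop converts them to torques. A minimal sketch under assumed values (gains, joint count, and default pose are illustrative, not taken from the cited work):

```python
import numpy as np

# Illustrative constants; not values from the paper.
KP, KD = 50.0, 1.0                     # assumed proportional / derivative gains
NUM_JOINTS = 12                        # e.g. a quadruped with 3 joints per leg
DEFAULT_POSE = np.zeros(NUM_JOINTS)    # nominal joint configuration (assumption)

def pd_torques(q_target, q, q_dot, kp=KP, kd=KD):
    """Torque command that tracks the policy's joint-position targets."""
    return kp * (q_target - q) - kd * q_dot

# At each control step: sample an action (stand-in for a ~ pi(a | o)) and
# track it as an offset from the default pose.
q, q_dot = np.zeros(NUM_JOINTS), np.zeros(NUM_JOINTS)
action = 0.1 * np.random.randn(NUM_JOINTS)
tau = pd_torques(DEFAULT_POSE + action, q, q_dot)
```

Interpreting actions as position targets rather than raw torques keeps the learned policy at a relatively low control rate while the PD loop handles fast tracking.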
“…In addition, the agent is rewarded at the end of the episode for standing up in a configuration close to ALMA's default stance pose. We define the fall-and-recovery problem as a finite-horizon MDP with time-based rewards, similar to Rudin et al. [20], where time-variant task rewards are used to train efficient and adaptive locomotion skills on diverse terrains. The rewards that regularize the robot's undesirable behaviors, such as joint-acceleration penalties and high contact impacts, are time-invariant and active throughout the episode.…”
Section: Reward Function
confidence: 99%
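
A hedged sketch of this reward split (episode length, activation window, and weights are assumptions, not values from the cited works): the task reward is granted only near the end of the episode, while regularization penalties apply at every step.

```python
import numpy as np

EPISODE_LEN = 500      # steps per episode (assumed)
TASK_WINDOW = 50       # final steps in which the task reward is active (assumed)

def step_reward(t, pose_error, joint_accel, contact_impact):
    # Time-invariant regularization: always penalize jerky, high-impact motion.
    r = -1e-4 * np.sum(joint_accel ** 2) - 1e-3 * np.sum(contact_impact ** 2)
    # Time-variant task reward: only granted near the episode's end, e.g. for
    # reaching a configuration close to the default stance pose.
    if t >= EPISODE_LEN - TASK_WINDOW:
        r += np.exp(-np.sum(pose_error ** 2))
    return r

# Example: a perfect stance pose inside the final window earns the full task reward.
r = step_reward(t=480, pose_error=np.zeros(12), joint_accel=np.zeros(12),
                contact_impact=np.zeros(4))
```

This structure leaves the agent free to choose how it recovers during most of the episode, as long as it ends up standing.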
“…However, one disadvantage of model-free RL is that it typically involves an inefficient trial-and-error process, which leads to long training times before satisfactory performance is attained. Rather than relying on real-world experience, simulators are therefore often used to generate realistic training data efficiently (34). When combined with strategies that mitigate the sim-to-real gap (35), this enables reliable transfer to hardware.…”
Section: Introduction
confidence: 99%
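
One widely used family of such sim-to-real strategies is domain randomization; the sketch below is an assumption-laden illustration (parameter names and ranges are not taken from the cited works) that resamples physical parameters at every episode reset so the policy cannot overfit to a single simulator configuration.

```python
import numpy as np

def randomize_sim_params(rng):
    """Draw per-episode physics parameters; ranges are illustrative assumptions."""
    return {
        "ground_friction": rng.uniform(0.4, 1.25),
        "base_mass_offset_kg": rng.uniform(-1.0, 1.0),
        "motor_strength_scale": rng.uniform(0.9, 1.1),
        "observation_latency_s": rng.uniform(0.0, 0.02),
    }

rng = np.random.default_rng(0)
params = randomize_sim_params(rng)   # applied to the simulator at each reset
```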