2021
DOI: 10.1109/lra.2021.3101544

Modular Deep Reinforcement Learning for Continuous Motion Planning With Temporal Logic

Cited by 60 publications (17 citation statements)
References 21 publications
“…Baseline Approaches: From the learning perspective, we refer to our distributed framework as "RRT*" or "D-RRT*" and compare it against three baselines: (i) the relaxed TL-based multi-objective rewards in [11], [31], referred to as "TL", for the single LTL task; (ii) for the goal-reaching task φ, the baseline referred to as "NED", which designs the reward based on the negative Euclidean distance between the robot and the destination; (iii) for a complex LTL task, instead of decomposition, the baseline referred to as "G-RRT*", which directly applies the reward scheme (6) to the global trajectory τ*_F = τ*_pre[τ*_suf]^ω. From the perspective of infeasible LTL tasks, we compare against the work [30], which visits as many accepting sets of the LDGBA [33] as possible, and we empirically show the improvement over our prior work [13], which assumes the task is feasible.…”
Section: Results (mentioning)
confidence: 99%
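The "NED" baseline described in this citation rewards the agent with the negative Euclidean distance to the destination, so the reward approaches zero as the robot nears the goal. A minimal sketch, assuming a 2D workspace with hypothetical robot and goal coordinates (not taken from the cited experiments):

```python
import numpy as np

def ned_reward(robot_pos, goal_pos):
    # Negative Euclidean distance: the reward is 0 at the goal and
    # increasingly negative the farther the robot is from it.
    return -float(np.linalg.norm(np.asarray(robot_pos) - np.asarray(goal_pos)))

# Hypothetical positions, for illustration only.
print(ned_reward([1.0, 2.0], [4.0, 6.0]))  # -5.0
```

Such a dense, distance-shaped reward is natural for a single goal-reaching task, which is why it serves as a baseline against the TL-based reward schemes for more general LTL tasks.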
“…The next operator is not always meaningful, since it may require an immediate execution switch in the synthesized plan space [42]. Depending on the application, we can either exclude the next operator, as is common in related work [38], [42], or properly design practical LTL tasks [11] in continuous scenarios.…”
Section: Preliminaries (mentioning)
confidence: 99%
“…The well-known cart-pole experiment (Fig. 3.b) [8,25,44] has a task expressed by the LTL specification □♦y ∧ □♦g ∧ □¬u: starting with the pole in the upright position, the goal is to prevent it from falling over (□¬u, i.e., always not u) by moving the cart, while alternating between the yellow (y) and green (g) regions (□♦y ∧ □♦g) and avoiding the red (unsafe) parts of the track (□¬u).…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
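To make the cart-pole specification concrete, the atomic propositions y, g, and u must be grounded in the continuous cart position. A minimal, hypothetical labeling function is sketched below; the region boundaries are illustrative placeholders, not the values used in the cited experiment:

```python
def label(cart_x, yellow=(-1.0, -0.5), green=(0.5, 1.0), track=(-2.4, 2.4)):
    # Map the continuous cart position to the atomic propositions of the
    # LTL task □♦y ∧ □♦g ∧ □¬u:
    #   'y' - cart is inside the yellow region
    #   'g' - cart is inside the green region
    #   'u' - cart has left the safe track (unsafe)
    props = set()
    if yellow[0] <= cart_x <= yellow[1]:
        props.add("y")
    if green[0] <= cart_x <= green[1]:
        props.add("g")
    if not (track[0] <= cart_x <= track[1]):
        props.add("u")
    return props

# Example: a cart at x = 0.7 satisfies only the 'green' proposition.
print(label(0.7))  # {'g'}
```

An automaton-based reward (for instance over an LDGBA product, as in the cited approaches) would consume these labels at every step, so satisfying □♦y ∧ □♦g requires the cart to re-enter both regions infinitely often while 'u' is never emitted.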