2020
DOI: 10.1007/978-3-030-46133-1_5

Practical Open-Loop Optimistic Planning

Cited by 6 publications (8 citation statements: 0 supporting, 8 mentioning, 0 contrasting)
References 9 publications
“…A first category of algorithms relies on optimistic planning [22], and requires additional assumptions: a deterministic MDP [15], the open-loop setting [2,21] in which policies are sequences of actions instead of state-action mappings (the two are equivalent in MDPs with deterministic transitions), or an MDP with known parameters [3]. For MDPs with stochastic and unknown transitions, polynomial sample complexities have been obtained for StOP [27], TrailBlazer [13] and SmoothCruiser [14], but all three algorithms suffer from numerical inefficiency, even for B < ∞.…”
Section: Related Work (mentioning)
confidence: 99%
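
The open-loop setting referenced in this statement is easy to make concrete. Below is a minimal Python sketch of OLOP-style open-loop optimistic planning, in which statistics are kept per action-sequence prefix rather than per state; the function name olop_plan, the generative-model call sample_rewards, and the Hoeffding-style bonus are illustrative assumptions, not the published algorithm.

import itertools
import math
from collections import defaultdict

def olop_plan(actions, horizon, budget, gamma, sample_rewards):
    """Return a recommended first action after `budget` rollouts.

    Hypothetical generative-model interface: sample_rewards(seq) plays the
    open-loop action sequence `seq` from the initial state and returns the
    list of stochastic rewards in [0, 1] it collected.
    """
    counts = defaultdict(int)    # times each sequence prefix was played
    sums = defaultdict(float)    # total reward observed at each prefix
    # Enumerating all K^horizon sequences is exponential; practical
    # variants expand the sequence tree lazily. Fine for a small sketch.
    all_seqs = list(itertools.product(actions, repeat=horizon))

    for _ in range(budget):
        def ucb(seq):
            # Optimistic value: per-prefix mean + Hoeffding-style bonus,
            # plus an optimistic tail for rewards beyond the horizon.
            total = 0.0
            for h in range(1, horizon + 1):
                n = counts[seq[:h]]
                if n == 0:
                    reward_ucb = 1.0  # unvisited prefixes get the max reward
                else:
                    mean = sums[seq[:h]] / n
                    reward_ucb = min(1.0, mean + math.sqrt(2 * math.log(budget) / n))
                total += gamma ** h * reward_ucb
            return total + gamma ** (horizon + 1) / (1 - gamma)

        best = max(all_seqs, key=ucb)          # most optimistic sequence
        rewards = sample_rewards(best)          # one open-loop rollout
        for h, r in enumerate(rewards, start=1):
            counts[best[:h]] += 1
            sums[best[:h]] += r

    # Recommend the first action whose subtree was played the most.
    return max(actions, key=lambda a: counts[(a,)])

Because the policy is a fixed sequence, no state is ever observed during planning, which is why the quoted statement notes that open-loop and closed-loop policies coincide only when transitions are deterministic.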
“…We compare MDP-GapE to three existing baselines: first, the KL-OLOP algorithm [21], which uses the same upper-confidence bounds on the rewards u_h^t and state values U_h^t as MDP-GapE, but is restricted to open-loop policies, i.e. sequences of actions only.…”
Section: Fixed-Budget Evaluation (mentioning)
confidence: 99%
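
For context on these bounds, here is a minimal sketch of a KL (Bernoulli) upper-confidence bound of the kind KL-OLOP substitutes for Hoeffding bounds on bounded rewards; the bisection solver kl_ucb and the log-budget threshold are illustrative assumptions, and the paper's exact exploration function may differ.

import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(mean, n, threshold, tol=1e-6):
    """Largest q in [mean, 1] with n * kl(mean, q) <= threshold, by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if n * bernoulli_kl(mean, mid) <= threshold:
            lo = mid   # mid still satisfies the constraint, search higher
        else:
            hi = mid
    return lo

# Example: 7 successes in 10 plays, log(M) threshold for an assumed budget M = 100.
print(kl_ucb(0.7, 10, math.log(100)))
# ≈ 0.97; the matching Hoeffding bound 0.7 + sqrt(threshold / (2 * 10)) ≈ 1.18
# would have to be clipped to 1.0, so the KL bound is strictly more informative.

A design note: unlike a Hoeffding bonus, the KL bound never leaves [0, 1], which is one reason KL-based confidence bounds tend to be tighter for rewards near the boundary of the support.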
“…The problem formalization of vehicle kinematics, temporal abstraction, partial observability and the reward hypothesis has been studied extensively as well [35]. Robust optimization planning has been studied in the past for finite MDP systems with uncertain parameters [37]-[39], and has also shown promising results under conservative driving behavior.…”
Section: Previous Work (mentioning)
confidence: 99%
“…Over the past decade, deep learning based approaches have been utilized with satisfactory results in various fields such as fair data generation [1], anomaly detection [2], accident detection [3], scene classification [4], hyperspectral image classification [5] and optimal path planning [6]. In comparison to typical path planning algorithms, reinforcement learning (RL) based approaches have received significant attention in the recent past due to the success of deep learning and computer vision.…”
Section: Introduction (mentioning)
confidence: 99%
“…This was later extended [27] to support stochastic rewards and dynamics in an open-loop setting, i.e., sequences of actions. The preceding algorithm was further improved in [6]. The modified version tightens the upper-confidence bounds, which improves performance considerably.…”
Section: Introduction (mentioning)
confidence: 99%