2018
DOI: 10.1007/978-3-319-89656-4_6

Advice-Based Exploration in Model-Based Reinforcement Learning

Citations: cited by 14 publications (16 citation statements)
References: 10 publications
“…Figure 11(c) shows the average and the maximum number of steps required to terminate for all the engines with every specification across 100 executions in logarithmic scale. The number of steps is a known measure used to compare RL methods logically constrained with LTL formulae [40‐43]. Known RL‐LTL methods take a high number of steps, in the order of hundreds of thousands, because these methods aim to converge to an optimal policy.…”
Section: Discussion (mentioning)
confidence: 99%
“…Several studies [40‐43] use LTL specifications as a high‐level guide for an RL agent. The RL agent in these studies never terminates and has to avoid violating a given specification indefinitely.…”
Section: Related Work (mentioning)
confidence: 99%
“…where $\rho \in AP$ is an atomic predicate; $\neg$ (negation), $\wedge$ (conjunction), $\vee$ (disjunction) are Boolean connectives; $\Diamond$ (eventually) and $\Box$ (always) are temporal operators; and $I$ is a bounded interval of the form $I = [i_1, i_2]$ ($i_1 < i_2$, $i_1, i_2 \in T$). For example, the MITL$_f$ formula $\Box_{[2,5]}(x > 3)$ reads as "x is always greater than 3 during the time interval [2,5]". A timed word generated by a trajectory $s_{0:L}$ is defined as a sequence $(L(s_{t_1}), t_1)$, .…”
Section: Metric Interval Temporal Logic (mentioning)
confidence: 99%
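The bounded "always" example in the statement above (x greater than 3 throughout [2,5]) can be illustrated with a small check over a sampled trace. This is only a sketch under stated assumptions: the trace is a finite list of (timestamp, value) samples, the interval is measured from the start of the trace, and the helper names `always_in_interval` and `eventually_in_interval` are hypothetical, not taken from the cited work.

```python
from typing import Callable, Sequence, Tuple

# Assumed representation: a finite trace of (timestamp, value of x) samples.
TimedTrace = Sequence[Tuple[float, float]]


def always_in_interval(
    trace: TimedTrace, lo: float, hi: float, predicate: Callable[[float], bool]
) -> bool:
    """Bounded 'always': predicate holds at every sample with lo <= t <= hi.

    Vacuously true if the trace has no samples inside the interval.
    """
    return all(predicate(x) for t, x in trace if lo <= t <= hi)


def eventually_in_interval(
    trace: TimedTrace, lo: float, hi: float, predicate: Callable[[float], bool]
) -> bool:
    """Bounded 'eventually': predicate holds at some sample with lo <= t <= hi."""
    return any(predicate(x) for t, x in trace if lo <= t <= hi)


# Illustrative data only; these values are made up for the example.
trace = [(0, 1.0), (1, 2.5), (2, 3.5), (3, 4.0), (4, 5.0), (5, 3.2), (6, 0.0)]

holds_2_5 = always_in_interval(trace, 2, 5, lambda x: x > 3)
holds_1_5 = always_in_interval(trace, 1, 5, lambda x: x > 3)
some_0_1 = eventually_in_interval(trace, 0, 1, lambda x: x > 3)
```

Checking only the sampled time points is a simplification; a continuous-time semantics would need assumptions about the signal between samples.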
“…The sampling efficiency and performance of RL can be improved if some high-level knowledge can be incorporated in the learning process [2]. Such knowledge can be also transferred from a source task to a target task if these tasks are logically similar [3].…”
Section: Introduction (mentioning)
confidence: 99%