A First Empirical Study of Emphatic Temporal Difference Learning
2017 · Preprint · DOI: 10.48550/arxiv.1705.04185

Cited by 2 publications (5 citation statements) · References 0 publications
“…This experiment and the results on Bertsekas's counterexample, together with several other results in [Ghiassian et al., 2017] and [Ghiassian et al., 2018], showed a consistent pattern of favoring Emphatic TD(λ) over conventional TD(λ) in the on-policy case. We did not strive to find a counterexample to challenge it.…”
Section: Discussion (supporting)
confidence: 76%
“…A natural question to ask at this point is whether on-policy ETD outperforms on-policy TD in all cases. We have seen evidence, such as the successful convergence of ETD on Baird's counterexample [Sutton and Barto, 2018] and several other experiments reported in [Ghiassian et al., 2017], suggesting that ETD is a better algorithm than TD in the off-policy case. We do not know, however, whether a similar pattern carries through to the on-policy case.…”
Section: Possible Advantages Of Learning With Emphasis (mentioning)
confidence: 57%
“…The idea of adding the penalty in the form of a regularization term has been considered in [11], where the Regularized Off-policy TD (RO-TD) algorithm was proposed based on GTD algorithms and convex-concave saddle-point formulations. Emphatic TD (ETD) algorithms [12]–[15] are another popular class of off-policy TD algorithms that achieve stability by emphasizing or de-emphasizing the algorithm's updates. These updates also have linear-time complexity.…”
Section: Introduction (mentioning)
confidence: 99%
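The emphasis mechanism mentioned in the last excerpt can be sketched concretely. Below is a minimal, illustrative implementation of one Emphatic TD(λ) update with linear function approximation, following the standard follow-on trace / emphasis / emphatic eligibility-trace equations of Sutton, Mahmood and White (2016); the function and variable names are illustrative assumptions, not taken from any of the cited codebases.

```python
import numpy as np

def etd_lambda_step(w, e, F, x, x_next, reward,
                    rho, rho_prev, gamma, lam, alpha, interest=1.0):
    """One Emphatic TD(lambda) update with linear values v(s) = w . x(s).

    rho_prev is the importance-sampling ratio from the previous step,
    rho the current one; both are 1.0 in the on-policy case.
    """
    # Follow-on trace: discounted, importance-weighted accumulation of interest.
    F = rho_prev * gamma * F + interest
    # Emphasis: blend of immediate interest and the accumulated follow-on trace.
    M = lam * interest + (1.0 - lam) * F
    # Emphatic eligibility trace, scaled by the current importance ratio.
    e = rho * (gamma * lam * e + M * x)
    # Standard one-step TD error.
    delta = reward + gamma * w.dot(x_next) - w.dot(x)
    # Emphasis-weighted semi-gradient update.
    w = w + alpha * delta * e
    return w, e, F

# On-policy usage: rho = rho_prev = 1, so emphasis reduces to discounting interest.
w, e, F = np.zeros(2), np.zeros(2), 0.0
x, x_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w, e, F = etd_lambda_step(w, e, F, x, x_next, reward=1.0,
                          rho=1.0, rho_prev=1.0,
                          gamma=0.9, lam=0.0, alpha=0.1)
```

Note that the update is linear-time in the number of features, matching the complexity claim in the excerpt: each step touches only the trace vectors and two scalars.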