2018
DOI: 10.48550/arxiv.1804.03334
Preprint

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

Abstract: In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size enables greater optimization by specifying parameters on a per-feature basis. Furthermore, adap…
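The abstract describes adapting a vector of step-sizes, one per feature, by stochastic meta-descent on top of TD learning. Below is a minimal NumPy sketch of such an IDBD-style per-feature update for linear TD(λ); the function name, variable names, and the exact form of the meta-trace update are illustrative assumptions rather than the paper's verbatim algorithm.

```python
import numpy as np

def tidbd_lambda_step(w, z, h, beta, x, x_next, reward, gamma, lam, theta):
    """One linear TD(lambda) update with IDBD-style per-feature step-sizes (sketch).

    w    : weight vector
    z    : eligibility trace
    h    : meta-trace of recent weight updates
    beta : per-feature log step-size (alpha_i = exp(beta_i))
    theta: meta-step-size
    """
    delta = reward + gamma * np.dot(w, x_next) - np.dot(w, x)   # TD error
    beta += theta * delta * x * h                                # meta-descent on log step-sizes
    alpha = np.exp(beta)                                         # per-feature step-sizes
    z = gamma * lam * z + x                                      # accumulating eligibility trace
    w += alpha * delta * z                                       # TD weight update
    # meta-trace: decays toward zero, accumulates correlated updates (assumed form)
    h = h * np.clip(1.0 - alpha * x * z, 0.0, None) + alpha * delta * z
    return w, z, h, beta
```

The intuition behind this kind of meta-descent is that a feature's step-size grows when successive weight updates for that feature are correlated and shrinks when they oscillate.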

Cited by 3 publications (5 citation statements) | References 6 publications
“…The normalization of NOSID seems to make the method more robust to initial α values that are too high, as well as maintaining performance across a wider range of µ values. On the other hand, it does not shift the useful range of µ values up as much, again potentially indicating more variation in optimal µ value across problems. Additionally, the normalization procedure of NOSID makes use of a hard maximum rather than a running maximum in normalizing the β update, which is not well suited for non-stationary state representations where the appropriate normalization may significantly change over time.…”
Section: Mountain Car
confidence: 96%
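The distinction the citing authors draw between a hard maximum and a running maximum is easy to see in code. The sketch below is a generic illustration, not NOSID's actual procedure: a hard maximum only ever grows, so an early spike fixes the normalizer permanently, whereas a decaying running maximum tracks the recent scale and can recover under non-stationarity. The decay constant is a hypothetical choice.

```python
import numpy as np

def normalize_hard_max(update, state):
    """Normalize by the largest magnitude ever observed (hard maximum)."""
    state["max"] = max(state.get("max", 1e-8), np.abs(update).max())
    return update / state["max"]

def normalize_running_max(update, state, decay=0.99):
    """Normalize by a decaying running maximum that can shrink after a spike."""
    prev = state.get("max", 1e-8)
    state["max"] = max(decay * prev, np.abs(update).max())
    return update / state["max"]
```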
“…In addition, we extend the approach to AC(λ) and to vector-valued step-sizes, as well as a "mixed" version which utilizes a combination of scalar and vector step-sizes. Also related are TIDBD and its extension AutoTIDBD [5,6], to our knowledge the only prior work to investigate learning of vector step-sizes for RL. The authors focus on TD(λ) for prediction, and explore both vector and scalar step-sizes.…”
Section: Related Work
confidence: 99%
“…We applied AdaGain, AMSGrad, RMSprop, SMD, and TIDBD (Kearney et al. 2018), a recent extension of the IDBD algorithm, to adapt the step-sizes of linear TD(λ) on Baird's counterexample. As before, the meta-parameters were extensively swept and the best performing parameters were used to generate the results for comparison.…”
Section: Experiments in Synthetic Tasks
confidence: 99%
“…Meta-descent applied to the step-size was first introduced for online least-mean squares methods (Jacobs 1988; Sutton 1992b; 1992a; Almeida et al. 1998; Mahmood et al. 2012), including the linear complexity method IDBD (Sutton 1992b). IDBD was later extended to more general losses (Schraudolph 1999) and to support (semi-gradient) temporal difference methods (Dabney and Barto 2012; Dabney 2014; Kearney et al. 2018). These methods are well-suited to non-stationary problems, and have been shown to ignore irrelevant features.…”
Section: Introduction
confidence: 99%
“…Step-size schedules leading to optimal convergence rates and robust guarantees are well-known in the stochastic optimization literature (Nemirovski et al., 2009; Ghadimi & Lan, 2013), but they often depend on problem-dependent quantities that are unavailable to the practitioner. Meta-descent methods such as Stochastic Meta-descent (Schraudolph, 1999) and TIDBD (Kearney et al., 2018) instead learn a step-size according to some meta-objective, thereby obviating the need to tune one. Yet these methods inevitably introduce new parameters to be tuned, such as a meta-step-size, a decay parameter, etc., and can be quite sensitive to these new parameters in practice (Jacobsen et al., 2019).…”
Section: Introduction
confidence: 99%