2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
DOI: 10.1109/adprl.2007.368210

Sparse Temporal Difference Learning Using LASSO

Abstract: We consider the problem of on-line value function estimation in reinforcement learning, concentrating on which function approximator to use. To try to break the curse of dimensionality, we focus on nonparametric function approximators. We propose to fit the use of kernels into temporal difference algorithms by performing the regression via the LASSO. We introduce the equi-gradient descent algorithm (EGD), a direct adaptation of an algorithm recently introduced in the LARS family for solving the LASSO. …
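The abstract's central ingredient is LASSO-style ℓ1-regularized regression solved by a LARS-family path algorithm. The sketch below is not the authors' equi-gradient descent (EGD) algorithm; it only illustrates, using scikit-learn's LassoLars on a synthetic problem (both the library choice and the Gaussian-kernel dictionary are my assumptions), how an ℓ1 penalty drives most kernel weights to exactly zero, which is the kind of sparsity the paper exploits inside temporal difference learning.

```python
# Minimal sketch (not the paper's EGD algorithm): LASSO regression over a
# Gaussian-kernel dictionary, solved with a LARS-family path algorithm.
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem standing in for value-function targets.
x = rng.uniform(-3.0, 3.0, size=200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)

# Kernel dictionary: one Gaussian feature centred on each sample point.
sigma = 0.5
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * sigma**2))

# LassoLars follows the LARS homotopy path to solve the LASSO;
# a larger alpha yields a sparser set of active kernels.
model = LassoLars(alpha=0.01)
model.fit(K, y)

print(f"{np.count_nonzero(model.coef_)} active kernels out of {len(x)}")
```

In the paper's on-line setting the regression targets would come from temporal-difference errors rather than from a fixed supervised target; the sparsity mechanism illustrated here is the same.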

Cited by 35 publications (29 citation statements)
References 10 publications
“…In this case, the user should determine the architecture of the network. If the user elects to use a nonparametric regularization-based method (e.g., Engel et al. 2005; Jung and Polani 2006; Loth et al. 2007; Farahmand et al. 2009b; Taylor and Parr 2009; Kolter and Ng 2009), the regularization coefficient and kernel (or other) parameters should be selected. From a general viewpoint, the decision of which method (linear vs. non-linear, parametric vs. nonparametric) to use is not different from that of how to tune a particular method.…”
“…Using an ℓ1-penalty term for value function approximation has been considered before in [16] in an approximate linear programming context, or in [13] where it is used to minimize a Bellman residual. Our work is closer to LARS-TD [12], briefly presented in Section 2.2, and both approaches are compared next.…”
Section: Discussion
“…One searches for an approximation of the value function V (being a fixed point of the Bellman operator T) belonging to some (linear) hypothesis space H, onto which one projects any function using the related projection operator Π. LSTD provides V̂ ∈ H, the fixed point of the composed operator ΠT. The sole generalization of LSTD to ℓ1-regularization has been proposed in [12] ([11] solves the same problem, [13] regularizes a biased Bellman residual and [16] considers linear programming). They add an ℓ1-penalty term to the projection operator and solve the consequent fixed-point problem; the corresponding algorithm is called LARS-TD in reference to the homotopy path algorithm LARS (Least Angle Regression) [6] which inspired it.…”
Section: Introduction
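As a reading aid for this excerpt, here is a minimal sketch of the two fixed-point problems it describes, in generic notation (λ for the regularization coefficient and Φθ for a linear parametrization of the value function are my own shorthand, not symbols taken from the cited papers):

```latex
% LSTD: fixed point of the projected Bellman operator,
% where \Pi projects onto the hypothesis space H.
\hat{V} = \Pi T \hat{V},
\qquad
\Pi g = \operatorname*{arg\,min}_{f \in \mathcal{H}} \; \| f - g \|^{2}.

% LARS-TD-style sketch: the projection step itself carries an l1 penalty,
% and one solves the resulting fixed-point problem in the weights theta.
\theta^{\star} \in \operatorname*{arg\,min}_{\theta} \;
  \bigl\| \Phi\theta - T(\Phi\theta^{\star}) \bigr\|_{2}^{2}
  + \lambda \, \| \theta \|_{1}.
```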
“…However, none of the previous works that we know of has explored in a systematic manner how regularization influences the performance of the resulting procedure. The only works that we know of that used regularization are those of Jung and Polani [11], Loth et al. [14] and Xu et al. [22]. In particular, Jung and Polani [11] explored penalizing the empirical L2-norm of the Bellman residual for finding the value function of a policy given a trajectory in a deterministic system, while L1-penalties for the same problem were considered by Loth et al. [14].…”
Section: Introduction
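The excerpt distinguishes the two regularized Bellman-residual formulations by their penalty norm. As a hedged sketch in generic notation ($\hat{T}$ for the empirical Bellman operator observed along the trajectory and $V_\theta$ for the parametrized value function are my shorthand; the cited works may use, e.g., an RKHS norm rather than a plain Euclidean one):

```latex
% L2-penalized Bellman residual (the flavour attributed to Jung and Polani [11]):
\min_{\theta} \; \| \hat{T} V_{\theta} - V_{\theta} \|_{2}^{2}
  + \lambda \, \| \theta \|_{2}^{2}

% L1-penalized Bellman residual (the flavour attributed to Loth et al. [14]),
% whose solution is sparse in the weights theta:
\min_{\theta} \; \| \hat{T} V_{\theta} - V_{\theta} \|_{2}^{2}
  + \lambda \, \| \theta \|_{1}
```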