2012
DOI: 10.7763/ijmlc.2012.v2.201

Reinforcement Learning with Kernel Recursive Least-Squares Support Vector Machine

Abstract: A reinforcement learning system based on the kernel recursive least-squares algorithm for continuous state spaces is proposed in this paper. A kernel recursive least-squares support vector machine is used to realize the mapping from state-action pairs to the Q-value function. An online sparsification process permits the addition of a training sample to the Q-function approximation only if it is approximately linearly independent of the preceding training samples. Simulation results of a two-link robot mani…
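The online sparsification step described in the abstract can be made concrete with a short sketch of the approximate-linear-dependence (ALD) test that kernel recursive least-squares methods typically use. This is a minimal illustration rather than the paper's implementation; the Gaussian kernel, the threshold nu, and the function names are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian RBF kernel between two state-action feature vectors (assumed kernel choice).
    return np.exp(-gamma * np.sum((a - b) ** 2))

def ald_test(dictionary, x_new, nu=1e-3, gamma=1.0):
    # Approximate-linear-dependence test: return True if x_new is NOT approximately
    # linearly dependent on the stored samples and should therefore be added.
    if not dictionary:
        return True
    K = np.array([[rbf_kernel(xi, xj, gamma) for xj in dictionary] for xi in dictionary])
    k = np.array([rbf_kernel(xi, x_new, gamma) for xi in dictionary])
    ktt = rbf_kernel(x_new, x_new, gamma)
    a = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)  # least-squares reconstruction
    delta = ktt - k @ a   # squared residual of approximating phi(x_new) in the dictionary span
    return delta > nu     # add the sample only if the residual exceeds the threshold

# A new state-action sample is appended to the dictionary only when ald_test returns True.
```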

Cited by 5 publications (3 citation statements)
References 11 publications

“…Remark 2: Note that although the formulas (17) and (19) have the same solution for optimal control problems, the formula (17) does not contain the knowledge of the internal dynamics f(x), which needs to be known explicitly in (19).…”
Section: Policy Iteration Algorithm for Solving the HJB Equation
Confidence: 99%
“…The learning result depends on the initial value, and it is difficult to converge to a unique optimal policy [18]. In addition, as a typical non-parametric kernel method, the support vector machine (SVM), which is based on Vapnik's structural risk minimization (SRM) principle [19], has excellent generalization properties and can overcome the existing weaknesses of parametric function approximators. However, the interpretability of the model decreases when a large amount of experience data is provided.…”
Section: Introduction
Confidence: 99%
“…Besides, the optimum solutions are obtained by solving a standard quadratic programming (QP) problem, which limits its real-time applications [29]. Based on the standard SVM, the LS-SVM was put forward and has since been applied extensively [30][31][32]. The difference between the LS-SVM and the standard SVM is that the constraints are equalities rather than inequalities, which converts the QP problem into the problem of solving a system of linear equations.…”
Section: Introduction
Confidence: 99%
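To illustrate the point made in the last statement, the following is a minimal sketch of how an LS-SVM regressor replaces the standard SVM's QP with a single linear system. It is an assumed illustration, not code from this paper or the citing ones; the RBF kernel, the regularization constant C, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel_matrix(X, Z, gamma=1.0):
    # Pairwise Gaussian RBF kernel matrix between the rows of X and Z.
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, C=10.0, gamma=1.0):
    # Train an LS-SVM regressor: the equality constraints turn the dual problem into
    # the block linear system  [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y].
    n = X.shape[0]
    K = rbf_kernel_matrix(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)   # one linear solve instead of a QP
    return sol[1:], sol[0]          # alpha coefficients and bias b

def lssvm_predict(X_train, alpha, b, X_test, gamma=1.0):
    # Prediction: f(x) = sum_i alpha_i * k(x, x_i) + b
    return rbf_kernel_matrix(X_test, X_train, gamma) @ alpha + b
```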