IECON 2010 - 36th Annual Conference on IEEE Industrial Electronics Society 2010
DOI: 10.1109/iecon.2010.5675206

Online support vector regression based actor-critic method

Abstract: This paper proposes a new algorithm for the actor-critic method using online support vector regression (SVR), which can learn incrementally and automatically track variations in an environment with time-varying characteristics. It gives good generalization properties to value function approximation and helps the critic converge fast. In addition, the sample vectors in the data set of the online SVR are used as the center positions of the actor's basis functions. The actor updates policy parameters with those functions using policy grad…
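The actor side described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Gaussian radial basis functions centered at stored sample vectors, a Gaussian policy whose mean is linear in those features, and a TD error supplied by some critic (the online SVR itself is not implemented here). All names (`rbf_features`, `actor_update`, `width`, `sigma`, `lr`) are hypothetical.

```python
import numpy as np

def rbf_features(state, centers, width=1.0):
    """Gaussian basis functions centered at stored sample vectors (assumed form)."""
    diffs = centers - state                 # shape (n_centers, state_dim)
    sq_dist = np.sum(diffs ** 2, axis=1)    # squared distance to each center
    return np.exp(-sq_dist / (2.0 * width ** 2))

def actor_update(theta, state, action, td_error, centers,
                 sigma=0.5, lr=0.1, width=1.0):
    """One policy-gradient step for a Gaussian policy with mean theta @ phi(state).

    td_error is assumed to come from a critic; here it is simply passed in.
    """
    phi = rbf_features(state, centers, width)
    mean = theta @ phi
    # Gradient of log N(action | mean, sigma^2) with respect to theta.
    grad_log_pi = (action - mean) / sigma ** 2 * phi
    return theta + lr * td_error * grad_log_pi

# Hypothetical sample vectors standing in for the online SVR's data set.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
theta = np.zeros(len(centers))
state = np.array([0.0, 0.0])
new_theta = actor_update(theta, state, action=1.0, td_error=2.0, centers=centers)
```

In this sketch the set of centers is fixed; in a scheme like the one the abstract describes, the centers would change as the online SVR adds or removes sample vectors, so `theta` would need to be resized accordingly.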

Cited by 4 publications (3 citation statements)
References 10 publications
“…For many machine learning problems, the common distinction between a training and an application phase is not reasonable (e.g., [13,18]). They rather require the gradual extension of available knowledge when the respective learning technique is already in application.…”
Section: Introduction (mentioning)
confidence: 99%
“…However no statistical guarantees about the convergence was presented. Similarly [51] apply incremental SVR to the approximation of action value function in an actor-critic basis without PI. However the paper lacks a rigorous statistical analysis of their method.…”
Section: Conclusion and Discussion (mentioning)
confidence: 99%
“…Moreover, it is also a batch algorithm and cannot be directly applied to RL problem. [51] apply an incremental SVR to the approximation of action value function in an actor-critic basis without Policy Iteration (PI). However the paper suffer for a lack of rigorous statistical analysis of their method.…”
Section: Thesis Overview (mentioning)
confidence: 99%