2018
DOI: 10.1016/j.engappai.2017.12.004

Policy derivation methods for critic-only reinforcement learning in continuous spaces

Abstract: This paper addresses the problem of deriving a policy from the value function in the context of critic-only reinforcement learning (RL) in continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. By its nature, numerical approximation virtually always exhibits artifacts that degrade the overall performance of the controlled system. In addition, when continuous-valued actions are used, the most common approach is t…
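As context for the policy-derivation problem the abstract describes, below is a minimal sketch (not the paper's method) of the common critic-only approach in continuous spaces: choosing the greedy action by one-step lookahead over a discretized action set. V_hat, f, rho, and actions are hypothetical stand-ins for an approximate value function, transition model, reward function, and action grid.

```python
import numpy as np

# Minimal sketch of greedy policy derivation from a critic: enumerate a
# discretized action set and pick the action maximizing the one-step
# lookahead value. All callables here are hypothetical stand-ins.

def greedy_policy(x, V_hat, f, rho, actions, gamma=0.99):
    """Return the discretized action maximizing the one-step lookahead value."""
    returns = [rho(x, u) + gamma * V_hat(f(x, u)) for u in actions]
    return actions[int(np.argmax(returns))]
```

Any artifacts in V_hat propagate directly into the argmax, which is the failure mode the abstract points at.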

Cited by 15 publications (13 citation statements) · References 24 publications
“…We use data measured on a real system and an empirical model f(x) = −i c₁ x / (x² + c₂)³ proposed in the literature [15] as the reference model. Parameters c₁ and c₂ were found empirically for the given system, and this model was used to design well-performing nonlinear controllers in [1,8]. For this example, we define y = f(x).…”
Section: Methods (mentioning)
confidence: 99%
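For concreteness, here is a sketch of the cited empirical model as code. The parameter values are hypothetical placeholders; in the cited work c₁ and c₂ were identified empirically on the real system.

```python
import numpy as np

# Sketch of the cited empirical model f(x) = -i * c1 * x / (x**2 + c2)**3.
C1 = 5.52e-10  # hypothetical placeholder value
C2 = 1.75e-4   # hypothetical placeholder value

def f(x, i=1.0, c1=C1, c2=C2):
    """Empirical nonlinear model used as the symbolic-regression reference."""
    return -i * c1 * x / (x**2 + c2)**3

# Example: evaluate the reference output y = f(x) on a grid of positions.
x = np.linspace(-0.05, 0.05, 101)
y = f(x)
```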
“…We chose this variant of GP because it has recently been shown in [3,4,18,24] that GP methods evolving this kind of compound regression model outperform conventional GP evolving a single-tree structure representing the whole model. In particular, the base SNGP has been successfully used for several SR tasks from the reinforcement learning and robotics domains [1,2,12,13]. A detailed description of the base SNGP is beyond the scope of this paper.…”
Section: Base SNGP (mentioning)
confidence: 99%
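A minimal sketch of what a compound regression model means here: a linear combination of several symbolic features with the outer weights fit by least squares. The features below are fixed by hand purely for illustration; SNGP would evolve them.

```python
import numpy as np

# Compound regression model: y ≈ sum_k w_k * phi_k(x), where each phi_k is
# a symbolic feature. Features are hand-picked here; SNGP evolves them.

def fit_compound_model(x, y, features):
    """Fit the outer weights of a compound model by least squares."""
    Phi = np.column_stack([phi(x) for phi in features])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

features = [lambda x: x, lambda x: x**3, lambda x: np.sin(x)]
x = np.linspace(-1.0, 1.0, 200)
y = 0.5 * x - 2.0 * x**3  # toy regression target
w = fit_compound_model(x, y, features)
```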
“…In addition to leveraging genetic programming to generate interpretable policies, the influence of the smoothness of the symbolic approximation function in reinforcement learning has also been studied. References [1] and [2] proposed a symbolic regression method to obtain a smooth V-function from the original value function, and the experimental results reveal that the policy derived from the smooth V-function performs much better than the policy derived from the original value function.…”
Section: Related Work (mentioning)
confidence: 99%
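The idea attributed to [1] and [2] can be sketched as follows: replace a jagged numerical V-function with a smooth surrogate before deriving the policy. A polynomial fit stands in for symbolic regression here, and v_raw mimics a critic with approximation artifacts; neither is the cited method itself.

```python
import numpy as np

# Fit a smooth surrogate V-function to sampled critic values, then derive
# the policy from the surrogate rather than the artifact-ridden original.

def smooth_v(states, v_values, degree=5):
    """Fit a smooth surrogate V-function (polynomial stand-in for SR)."""
    coeffs = np.polyfit(states, v_values, degree)
    return np.poly1d(coeffs)

states = np.linspace(-1.0, 1.0, 50)
v_raw = -states**2 + 0.05 * np.random.randn(50)  # critic with artifacts
V = smooth_v(states, v_raw)  # derive the policy from V, not from v_raw
```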