2016
DOI: 10.1109/tsmc.2015.2475716
Value Function Discovery in Markov Decision Processes With Evolutionary Algorithms

Abstract: In this paper we introduce a novel method for discovery of value functions for Markov Decision Processes (MDPs). This method, which we call Value Function Discovery (VFD), is based on ideas from the Evolutionary Algorithm field. VFD's key feature is that it discovers descriptions of value functions that are algebraic in nature. This feature is unique, because the descriptions include the model parameters of the MDP. The algebraic expression of the value function discovered by VFD can be used in several…
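The abstract is cut off, but the core recipe it describes (evolve algebraic expressions until one fits the optimal value function of a parametric MDP) can be illustrated. Below is a minimal, hypothetical sketch in Python, not the authors' VFD implementation: the toy chain MDP, the primitive set, and the mutation-only loop with elitism are all illustrative assumptions. Value iteration supplies the target samples, and the evolutionary loop searches for an algebraic expression, over the state s and the model parameter gamma, that matches them.

```python
# Minimal sketch of the VFD idea (NOT the authors' algorithm): evolve
# algebraic expression trees to fit the optimal value function of a toy MDP.
import random

random.seed(0)

GAMMA = 0.9
N = 20  # states 0..N-1 of a toy chain; action 0 stays, action 1 moves right

def value_iteration(tol=1e-10):
    """Compute V* for the toy chain (reward = state index); these samples
    are the fitting targets for the evolved expressions."""
    V = [0.0] * N
    while True:
        delta = 0.0
        for s in range(N):
            best = max(s + GAMMA * V[s], s + GAMMA * V[min(s + 1, N - 1)])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b,
       '/': lambda a, b: a / b if abs(b) > 1e-9 else 1.0}  # protected division

def rand_expr(depth=3):
    """Random expression tree over the state s, the discount g, and constants."""
    if depth == 0 or random.random() < 0.3:
        r = random.random()
        if r < 0.4:
            return ('s',)
        if r < 0.7:
            return ('g',)
        return ('c', round(random.uniform(-2.0, 2.0), 2))
    op = random.choice(list(OPS))
    return (op, rand_expr(depth - 1), rand_expr(depth - 1))

def evaluate(e, s):
    if e[0] == 's':
        return float(s)
    if e[0] == 'g':
        return GAMMA  # expressions may refer to the MDP model parameter gamma
    if e[0] == 'c':
        return e[1]
    return OPS[e[0]](evaluate(e[1], s), evaluate(e[2], s))

def fitness(e, target):
    """Squared error against the value-iteration samples (lower is better)."""
    try:
        m = sum((evaluate(e, s) - target[s]) ** 2 for s in range(N))
    except OverflowError:
        return float('inf')
    return m if m == m else float('inf')  # map NaN to worst fitness

def mutate(e):
    if e[0] in ('s', 'g', 'c') or random.random() < 0.2:
        return rand_expr(2)  # replace this subtree with a fresh one
    return (e[0],) + tuple(mutate(k) if random.random() < 0.5 else k
                           for k in e[1:])

def evolve(target, pop_size=200, gens=60):
    """Mutation-only evolutionary loop with elitism (no crossover, for brevity)."""
    pop = [rand_expr() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda e: fitness(e, target))
        elite = pop[:pop_size // 5]
        pop = elite + [mutate(random.choice(elite))
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=lambda e: fitness(e, target))

target = value_iteration()
best = evolve(target)
print('best expression:', best)
print('mse:', fitness(best, target) / N)
```

Because the terminal set includes the discount factor g, a discovered expression can in principle be re-evaluated for new parameter values without re-solving the MDP, which is the feature the abstract highlights.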

Cited by 10 publications (10 citation statements)
References 14 publications
“…To our best knowledge, there have been no reports in the literature on the use of symbolic regression for constructing value functions. The closest related research is the use of genetic programming for fitting already available value functions (V-functions) [18], [19], which, however, is completely different from our approach. In [18], the authors use GP to find an algebraic expression that fits the sample points of the optimal value function, obtained via value iteration.…”
Section: Introduction (mentioning)
confidence: 99%
“…The closest related research is the use of genetic programming for fitting already available value functions (V-functions) [18], [19], which, however, is completely different from our approach. In [18], the authors use GP to find an algebraic expression that fits the sample points of the optimal value function, obtained via value iteration. Contrary to [18], in [19] the authors use the fact that the so-called threshold policy for the solved MDP is known a priori, and use GP to find a description of this threshold policy in terms of the MDP parameters.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, value function approximation by GP has become a hot topic in the evolutionary reinforcement learning domain. Reference [25] introduced a method for obtaining a near-optimal value function on an illustrative Markov decision process (MDP) environment. Later, [18] proposed an approach that computes a symbolic value function in a policy-iteration fashion.…”
Section: Related Work (mentioning)
confidence: 99%
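To make the policy-iteration variant mentioned above concrete, here is a hypothetical sketch: an exact policy-evaluation step is followed by fitting a compact function of the state, with numpy.polyfit standing in for a symbolic regressor, and policy improvement is greedy with respect to the fit. The toy chain and the degree-2 fit are assumptions for illustration, not the construction of the cited paper.

```python
# Hypothetical "symbolic value function inside policy iteration" sketch.
# numpy.polyfit stands in for a symbolic regressor; the MDP is a toy chain.
import numpy as np

GAMMA = 0.9
N = 15  # states 0..N-1; action 0 stays, action 1 moves right; reward = s

def next_state(s, a):
    return s if a == 0 else min(s + 1, N - 1)

def evaluate_policy(policy, sweeps=500):
    """Iterate the Bellman operator for the fixed policy, then fit a compact
    'symbolic' surrogate (here: a degree-2 polynomial) to the result."""
    V = np.zeros(N)
    for _ in range(sweeps):
        V = np.array([s + GAMMA * V[next_state(s, policy[s])]
                      for s in range(N)])
    return np.poly1d(np.polyfit(np.arange(N), V, deg=2))

def improve(v_hat):
    """Greedy improvement w.r.t. the fitted value function. The reward here
    does not depend on the action, so comparing successor values suffices."""
    return [int(v_hat(next_state(s, 1)) > v_hat(next_state(s, 0)))
            for s in range(N)]

policy = [0] * N
for _ in range(10):           # approximate policy iteration loop
    v_hat = evaluate_policy(policy)
    new_policy = improve(v_hat)
    if new_policy == policy:  # stop once the policy is stable
        break
    policy = new_policy

print('greedy policy:', policy)
print('fitted V(s):  ', np.round(v_hat(np.arange(N)), 2))
```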
“…For instance, in [6] a method called Value Function Discovery is proposed that uses GP to evolve an algebraic description of the V-function. In [7] an evolutionary algorithm is used to accelerate the convergence of Q-tables.…”
Section: Introduction (mentioning)
confidence: 99%
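The statement about [7] is only a pointer, so the following is one plausible reading rather than that paper's actual method: keep a small population of Q-tables, improve each with a few ordinary Q-learning steps, and let selection and crossover propagate the better tables. The chain environment and all hyperparameters are hypothetical.

```python
# One plausible reading of "EA accelerates Q-table convergence" (hypothetical,
# not the method of [7]): alternate a little Q-learning on each member of a
# population of Q-tables with evolutionary selection and crossover.
import random

random.seed(1)
N, GOAL, GAMMA, ALPHA = 12, 11, 0.9, 0.5  # illustrative chain world

def step(s, a):
    """Action 1 moves right, action 0 moves left; reward 1 at the goal."""
    s2 = min(s + 1, N - 1) if a else max(s - 1, 0)
    return s2, 1.0 if s2 == GOAL else 0.0

def q_episode(Q, steps=40, eps=0.2):
    """Standard epsilon-greedy Q-learning updates, applied in place."""
    s = 0
    for _ in range(steps):
        a = random.randrange(2) if random.random() < eps \
            else int(Q[s][1] >= Q[s][0])
        s2, r = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

def fitness(Q, steps=40):
    """Discounted return of the greedy policy, used for selection."""
    s, total, g = 0, 0.0, 1.0
    for _ in range(steps):
        s, r = step(s, int(Q[s][1] >= Q[s][0]))
        total += g * r
        g *= GAMMA
    return total

def crossover(A, B):
    """Per-state uniform crossover of two Q-tables (rows copied)."""
    return [list(random.choice((A[s], B[s]))) for s in range(N)]

pop = [[[0.0, 0.0] for _ in range(N)] for _ in range(8)]
for gen in range(30):
    for Q in pop:                        # learning phase
        q_episode(Q)
    pop.sort(key=fitness, reverse=True)  # selection phase
    pop = pop[:4] + [crossover(random.choice(pop[:4]), random.choice(pop[:4]))
                     for _ in range(4)]
print('best greedy return:', round(fitness(pop[0]), 3))
```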