2016
DOI: 10.1109/tsmc.2015.2475716
Value Function Discovery in Markov Decision Processes With Evolutionary Algorithms

Abstract: In this paper we introduce a novel method for discovery of value functions for Markov Decision Processes (MDPs). This method, which we call Value Function Discovery (VFD), is based on ideas from the Evolutionary Algorithm field. VFD's key feature is that it discovers descriptions of value functions that are algebraic in nature. This feature is unique, because the descriptions include the model parameters of the MDP. The algebraic expression of the value function discovered by VFD can be used in several…
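The abstract is cut off, but the core recipe it describes (evolve algebraic expressions until one fits the optimal value function of a parametric MDP) can be illustrated. Below is a minimal, hypothetical sketch in Python, not the authors' VFD implementation: the toy chain MDP, the primitive set, and the mutation-only loop with elitism are all illustrative assumptions. Value iteration supplies the target samples, and the evolutionary loop searches for an algebraic expression, over the state s and the model parameter gamma, that matches them.

```python
# Minimal sketch of the VFD idea (NOT the authors' algorithm): evolve
# algebraic expression trees to fit the optimal value function of a toy MDP.
import random

random.seed(0)

GAMMA = 0.9
N = 20  # states 0..N-1 of a toy chain; action 0 stays, action 1 moves right

def value_iteration(tol=1e-10):
    """Compute V* for the toy chain (reward = state index); these samples
    are the fitting targets for the evolved expressions."""
    V = [0.0] * N
    while True:
        delta = 0.0
        for s in range(N):
            best = max(s + GAMMA * V[s], s + GAMMA * V[min(s + 1, N - 1)])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b,
       '/': lambda a, b: a / b if abs(b) > 1e-9 else 1.0}  # protected division

def rand_expr(depth=3):
    """Random expression tree over the state s, the discount g, and constants."""
    if depth == 0 or random.random() < 0.3:
        r = random.random()
        if r < 0.4:
            return ('s',)
        if r < 0.7:
            return ('g',)
        return ('c', round(random.uniform(-2.0, 2.0), 2))
    op = random.choice(list(OPS))
    return (op, rand_expr(depth - 1), rand_expr(depth - 1))

def evaluate(e, s):
    if e[0] == 's':
        return float(s)
    if e[0] == 'g':
        return GAMMA  # expressions may refer to the MDP model parameter gamma
    if e[0] == 'c':
        return e[1]
    return OPS[e[0]](evaluate(e[1], s), evaluate(e[2], s))

def fitness(e, target):
    """Squared error against the value-iteration samples (lower is better)."""
    try:
        m = sum((evaluate(e, s) - target[s]) ** 2 for s in range(N))
    except OverflowError:
        return float('inf')
    return m if m == m else float('inf')  # map NaN to worst fitness

def mutate(e):
    if e[0] in ('s', 'g', 'c') or random.random() < 0.2:
        return rand_expr(2)  # replace this subtree with a fresh one
    return (e[0],) + tuple(mutate(k) if random.random() < 0.5 else k
                           for k in e[1:])

def evolve(target, pop_size=200, gens=60):
    """Mutation-only evolutionary loop with elitism (no crossover, for brevity)."""
    pop = [rand_expr() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda e: fitness(e, target))
        elite = pop[:pop_size // 5]
        pop = elite + [mutate(random.choice(elite))
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=lambda e: fitness(e, target))

target = value_iteration()
best = evolve(target)
print('best expression:', best)
print('mse:', fitness(best, target) / N)
```

Because the terminal set includes the discount factor g, a discovered expression can in principle be re-evaluated for new parameter values without re-solving the MDP, which is the feature the abstract highlights.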

Cited by 10 publications (10 citation statements)
References 14 publications
“…To our best knowledge, there have been no reports in the literature on the use of symbolic regression for constructing value functions. The closest related research is the use of genetic programming for fitting already available value functions (V-functions) [18], [19], which, however, is completely different from our approach. In [18], the authors use GP to find an algebraic expression that fits the sample points of the optimal value function, obtained via value iteration.…”
Section: Introduction (mentioning)
confidence: 99%
“…The closest related research is the use of genetic programming for fitting already available value functions (V-functions) [18], [19], which, however, is completely different from our approach. In [18], the authors use GP to find an algebraic expression that fits the sample points of the optimal value function, obtained via value iteration. Contrary to [18], in [19] the authors use the fact that the so-called threshold policy for the solved MDP is known a priori, and use GP to find a description of this threshold policy in terms of the MDP parameters.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, value function approximation by GP has become a hot topic in the evolutionary reinforcement learning domain. Reference [25] introduced a method for obtaining a near-optimal value function on an illustrative Markov decision process (MDP) environment. Later, [18] proposed an approach that computes a symbolic value function in a policy-iteration fashion.…”
Section: Related Work (mentioning)
confidence: 99%
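To make the policy-iteration variant mentioned above concrete, here is a hypothetical sketch: an exact policy-evaluation step is followed by fitting a compact function of the state, with numpy.polyfit standing in for a symbolic regressor, and policy improvement is greedy with respect to the fit. The toy chain and the degree-2 fit are assumptions for illustration, not the construction of the cited paper.

```python
# Hypothetical "symbolic value function inside policy iteration" sketch.
# numpy.polyfit stands in for a symbolic regressor; the MDP is a toy chain.
import numpy as np

GAMMA = 0.9
N = 15  # states 0..N-1; action 0 stays, action 1 moves right; reward = s

def next_state(s, a):
    return s if a == 0 else min(s + 1, N - 1)

def evaluate_policy(policy, sweeps=500):
    """Iterate the Bellman operator for the fixed policy, then fit a compact
    'symbolic' surrogate (here: a degree-2 polynomial) to the result."""
    V = np.zeros(N)
    for _ in range(sweeps):
        V = np.array([s + GAMMA * V[next_state(s, policy[s])]
                      for s in range(N)])
    return np.poly1d(np.polyfit(np.arange(N), V, deg=2))

def improve(v_hat):
    """Greedy improvement w.r.t. the fitted value function. The reward here
    does not depend on the action, so comparing successor values suffices."""
    return [int(v_hat(next_state(s, 1)) > v_hat(next_state(s, 0)))
            for s in range(N)]

policy = [0] * N
for _ in range(10):           # approximate policy iteration loop
    v_hat = evaluate_policy(policy)
    new_policy = improve(v_hat)
    if new_policy == policy:  # stop once the policy is stable
        break
    policy = new_policy

print('greedy policy:', policy)
print('fitted V(s):  ', np.round(v_hat(np.arange(N)), 2))
```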
“…For instance, in [6] a method called Value Function Discovery is proposed that uses GP to evolve an algebraic description of the V-function. In [7] an evolutionary algorithm is used to accelerate the convergence of Q-tables.…”
Section: Introduction (mentioning)
confidence: 99%
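The statement about [7] is only a pointer, so the following is one plausible reading rather than that paper's actual method: keep a small population of Q-tables, improve each with a few ordinary Q-learning steps, and let selection and crossover propagate the better tables. The chain environment and all hyperparameters are hypothetical.

```python
# One plausible reading of "EA accelerates Q-table convergence" (hypothetical,
# not the method of [7]): alternate a little Q-learning on each member of a
# population of Q-tables with evolutionary selection and crossover.
import random

random.seed(1)
N, GOAL, GAMMA, ALPHA = 12, 11, 0.9, 0.5  # illustrative chain world

def step(s, a):
    """Action 1 moves right, action 0 moves left; reward 1 at the goal."""
    s2 = min(s + 1, N - 1) if a else max(s - 1, 0)
    return s2, 1.0 if s2 == GOAL else 0.0

def q_episode(Q, steps=40, eps=0.2):
    """Standard epsilon-greedy Q-learning updates, applied in place."""
    s = 0
    for _ in range(steps):
        a = random.randrange(2) if random.random() < eps \
            else int(Q[s][1] >= Q[s][0])
        s2, r = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

def fitness(Q, steps=40):
    """Discounted return of the greedy policy, used for selection."""
    s, total, g = 0, 0.0, 1.0
    for _ in range(steps):
        s, r = step(s, int(Q[s][1] >= Q[s][0]))
        total += g * r
        g *= GAMMA
    return total

def crossover(A, B):
    """Per-state uniform crossover of two Q-tables (rows copied)."""
    return [list(random.choice((A[s], B[s]))) for s in range(N)]

pop = [[[0.0, 0.0] for _ in range(N)] for _ in range(8)]
for gen in range(30):
    for Q in pop:                        # learning phase
        q_episode(Q)
    pop.sort(key=fitness, reverse=True)  # selection phase
    pop = pop[:4] + [crossover(random.choice(pop[:4]), random.choice(pop[:4]))
                     for _ in range(4)]
print('best greedy return:', round(fitness(pop[0]), 3))
```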