Fuzzy interpolation-based Q-learning with continuous states and actions

Horiuchi, Tadashi; Fujino, Akinori; Katai, Osamu; Sawaragi, Tetsuo

doi:10.1109/fuzzy.1996.551807

Cited by 49 publications

(31 citation statements)

References 5 publications


“…Online algorithms, mainly approximate versions of Q-learning, have been studied since the beginning of the nineties (Lin, 1992; Singh et al, 1995; Horiuchi et al, 1996; Jouffe, 1998; Glorennec, 2000; Tuyls et al, 2002; Szepesvári and Smart, 2004; Murphy, 2005; Sherstov and Stone, 2005; Melo et al, 2008). A strong research thread in offline model-free value iteration emerged later (Ormoneit and Sen, 2002; Ernst et al, 2005; Riedmiller, 2005; Szepesvári and Munos, 2005; Ernst et al, 2006b; Antos et al, 2008a; Munos and Szepesvári, 2008; Farahmand et al, 2009a).…”

Section: Model-free Value Iteration With Parametric Approximation (mentioning)

confidence: 99%

“…From the class of online algorithms for approximate value iteration, approximate versions of Q-learning are the most popular (Lin, 1992; Singh et al, 1995; Horiuchi et al, 1996; Jouffe, 1998; Glorennec, 2000; Tuyls et al, 2002; Szepesvári and Smart, 2004; Murphy, 2005; Sherstov and Stone, 2005; Melo et al, 2008). Recall from Section 2.3.2 that the original Q-learning updates the Q-function with (2.30):…”

Section: Online Model-free Approximate Value Iteration (mentioning)

confidence: 99%

Reinforcement Learning and Dynamic Programming Using Function Approximators

Buşoniu, Babuška, Schutter et al. 2017


Control systems are making a tremendous impact on our society. Though invisible to most users, they are essential for the operation of nearly all devices, from basic home appliances to aircraft and nuclear power plants. Apart from technical systems, the principles of control are routinely applied and exploited in a variety of disciplines such as economics, medicine, social sciences, and artificial intelligence.

A common denominator in the diverse applications of control is the need to influence or modify the behavior of dynamic systems to attain prespecified goals. One approach to achieve this is to assign a numerical performance index to each state trajectory of the system. The control problem is then solved by searching for a control policy that drives the system along trajectories corresponding to the best value of the performance index. This approach essentially reduces the problem of finding good control policies to the search for solutions of a mathematical optimization problem.

Early work in the field of optimal control dates back to the 1940s with the pioneering research of Pontryagin and Bellman. Dynamic programming (DP), introduced by Bellman, is still among the state-of-the-art tools commonly used to solve optimal control problems when a system model is available. The alternative idea of finding a solution in the absence of a model was explored as early as the 1960s. In the 1980s, a revival of interest in this model-free paradigm led to the development of the field of reinforcement learning (RL). The central theme in RL research is the design of algorithms that learn control policies solely from the knowledge of transition samples or trajectories, which are collected beforehand or by online interaction with the system. Most approaches developed to tackle the RL problem are closely related to DP algorithms.

A core obstacle in DP and RL is that solutions cannot be represented exactly for problems with large discrete state-action spaces or continuous spaces. Instead, compact representations relying on function approximators must be used. This challenge was already recognized while the first DP techniques were being developed. However, it has only been in recent years, and largely in correlation with the advance of RL, that approximation-based methods have grown in diversity, maturity, and efficiency, enabling RL and DP to scale up to realistic problems.

This book provides an accessible in-depth treatment of reinforcement learning and dynamic programming methods using function approximators. We start with a concise introduction to classical DP and RL, in order to build the foundation for the remainder of the book. Next, we present an extensive review of state-of-the-art approaches to DP and RL with approximation. Theoretical guarantees are provided on the solutions obtained, and numerical examples and comparisons are used to illustrate the properties of the individual methods. The remaining three chapters are dedicated to a detailed presentation of representative algorithms from the three major classes o...


“…Fuzzy approximators have typically been used in model-free (RL) techniques such as Q-learning [13,15,17] and actor-critic algorithms [2,20]. Most of these approaches are heuristic in nature, and their theoretical properties have not been investigated yet.…”

Section: Related Work (mentioning)

confidence: 99%

Approximate dynamic programming with a fuzzy parameterization

et al. 2010


Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space, and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes, under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion, and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
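The abstract above describes the ingredients of fuzzy Q-iteration: a fuzzy partition of the state space, a discretized action set, deterministic dynamics, and a discounted return. A minimal sketch of a synchronous variant is given below. It is not taken from the paper: the triangular membership functions, the one-dimensional state, the toy dynamics `f`, the reward `rho`, and all function names are assumptions chosen for illustration, and the update rule is the commonly stated form in which each parameter is set to the reward plus the discounted maximum of the interpolated Q-values at the next state.

```python
import numpy as np

def memberships(x, centers):
    """Triangular membership degrees of x on an evenly spaced grid (assumed shape)."""
    h = centers[1] - centers[0]
    return np.maximum(0.0, 1.0 - np.abs(x - centers) / h)

def fuzzy_q_iteration(centers, actions, f, rho, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Synchronous sketch: theta[i, j] approximates Q(centers[i], actions[j])."""
    theta = np.zeros((len(centers), len(actions)))
    for _ in range(max_iter):
        new_theta = np.empty_like(theta)
        for i, x in enumerate(centers):
            for j, u in enumerate(actions):
                # Interpolate the Q-values of every discrete action at the next state.
                phi_next = memberships(f(x, u), centers)
                new_theta[i, j] = rho(x, u) + gamma * np.max(phi_next @ theta)
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta

# Hypothetical toy problem: drive a scalar state in [-1, 1] toward zero.
def f(x, u):
    return float(np.clip(x + 0.1 * u, -1.0, 1.0))

def rho(x, u):
    return -x ** 2

centers = np.linspace(-1.0, 1.0, 11)
actions = np.array([-1.0, 0.0, 1.0])
theta = fuzzy_q_iteration(centers, actions, f, rho)
greedy = actions[np.argmax(theta, axis=1)]  # greedy action at each center
```

In this toy problem the greedy action at each center points toward the origin, which is what the reward encourages. An asynchronous variant, which the abstract also mentions, would write each new parameter back into `theta` immediately instead of into a separate array.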


“…For example, Glorennec [4] proposed a fuzzy Q-learning algorithm for obtaining the optimal rule base for a fuzzy controller. Horiuchi et al [5] proposed a fuzzy interpolation-based Q-learning where a fuzzy rule base is used to approximate the distribution of Q-values over a continuous action space. In [5], action selection was performed by calculating Q-values for several discrete actions and then selecting one action through the roulette wheel selection scheme.…”

Section: Introduction (mentioning)

confidence: 99%

A Fuzzy Reinforcement Learning for a Ball Interception Problem

Nakashima, Udo, Ishibuchi 2004

RoboCup 2003: Robot Soccer World Cup VII


Abstract. In this paper, we propose a reinforcement learning method called a fuzzy Q-learning where an agent determines its action based on the inference result by a fuzzy rule-based system. We apply the proposed method to a soccer agent that intercepts a passed ball by another agent. In the proposed method, the state space is represented by internal information the learning agent maintains such as the relative velocity and the relative position of the ball to the learning agent. We divide the state space into several fuzzy subspaces. A fuzzy if-then rule in the proposed method represents a fuzzy subspace in the state space. The consequent part of the fuzzy if-then rules is a motion vector that suggests the moving direction and velocity of the learning agent. A reward is given to the learning agent if the distance between the ball and the agent becomes smaller or if the agent catches up with the ball. It is expected that the learning agent finally obtains the efficient positioning skill.
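The citation statement for this entry notes that Horiuchi et al. select an action by computing Q-values for several discrete actions and then applying roulette wheel selection. A minimal sketch of that selection step follows. The source does not say how Q-values are turned into selection weights, so the shift by the minimum Q-value and the small constant `eps` are assumptions made here to keep all weights positive; the function name is also hypothetical.

```python
import random

def roulette_wheel_select(q_values, rng=random, eps=1e-6):
    """Return an action index with probability proportional to its shifted Q-value."""
    q_min = min(q_values)
    # Assumed weighting: shift so the worst action still has a small positive weight.
    weights = [q - q_min + eps for q in q_values]
    pick = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if pick < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off

# Usage: Q-values evaluated at a few candidate discrete actions.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[roulette_wheel_select([0.0, 0.0, 10.0], rng)] += 1
```

With these weights the third action is chosen almost every time, while actions with closer Q-values would share the draws more evenly, which keeps some exploration in the policy.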
