Control systems are making a tremendous impact on our society. Though invisible to most users, they are essential for the operation of nearly all devices, from basic home appliances to aircraft and nuclear power plants. Apart from technical systems, the principles of control are routinely applied and exploited in a variety of disciplines such as economics, medicine, social sciences, and artificial intelligence.

A common denominator in the diverse applications of control is the need to influence or modify the behavior of dynamic systems to attain prespecified goals. One approach to achieve this is to assign a numerical performance index to each state trajectory of the system. The control problem is then solved by searching for a control policy that drives the system along trajectories corresponding to the best value of the performance index. This approach essentially reduces the problem of finding good control policies to the search for solutions of a mathematical optimization problem.

Early work in the field of optimal control dates back to the 1940s with the pioneering research of Pontryagin and Bellman. Dynamic programming (DP), introduced by Bellman, is still among the state-of-the-art tools commonly used to solve optimal control problems when a system model is available. The alternative idea of finding a solution in the absence of a model was explored as early as the 1960s. In the 1980s, a revival of interest in this model-free paradigm led to the development of the field of reinforcement learning (RL). The central theme in RL research is the design of algorithms that learn control policies solely from the knowledge of transition samples or trajectories, which are collected beforehand or by online interaction with the system. Most approaches developed to tackle the RL problem are closely related to DP algorithms.

A core obstacle in DP and RL is that solutions cannot be represented exactly for problems with large discrete state-action spaces or continuous spaces. Instead, compact representations relying on function approximators must be used. This challenge was already recognized while the first DP techniques were being developed. However, it has only been in recent years, and largely in correlation with the advance of RL, that approximation-based methods have grown in diversity, maturity, and efficiency, enabling RL and DP to scale up to realistic problems.

This book provides an accessible, in-depth treatment of reinforcement learning and dynamic programming methods using function approximators. We start with a concise introduction to classical DP and RL, in order to build the foundation for the remainder of the book. Next, we present an extensive review of state-of-the-art approaches to DP and RL with approximation. Theoretical guarantees are provided on the solutions obtained, and numerical examples and comparisons are used to illustrate the properties of the individual methods. The remaining three chapters are dedicated to a detailed presentation of representative algorithms from the three major classes o...