Proceedings of IEEE 5th International Fuzzy Systems
DOI: 10.1109/fuzzy.1996.551807

Fuzzy interpolation-based Q-learning with continuous states and actions

Cited by 49 publications (31 citation statements)
References 5 publications
“…Online algorithms, mainly approximate versions of Q-learning, have been studied since the beginning of the nineties (Lin, 1992; Singh et al., 1995; Horiuchi et al., 1996; Jouffe, 1998; Glorennec, 2000; Tuyls et al., 2002; Szepesvári and Smart, 2004; Murphy, 2005; Sherstov and Stone, 2005; Melo et al., 2008). A strong research thread in offline model-free value iteration emerged later (Ormoneit and Sen, 2002; Ernst et al., 2005; Riedmiller, 2005; Szepesvári and Munos, 2005; Ernst et al., 2006b; Antos et al., 2008a; Munos and Szepesvári, 2008; Farahmand et al., 2009a).…”
Section: Model-free Value Iteration With Parametric Approximation (citation type: mentioning)
confidence: 99%
“…From the class of online algorithms for approximate value iteration, approximate versions of Q-learning are the most popular (Lin, 1992; Singh et al., 1995; Horiuchi et al., 1996; Jouffe, 1998; Glorennec, 2000; Tuyls et al., 2002; Szepesvári and Smart, 2004; Murphy, 2005; Sherstov and Stone, 2005; Melo et al., 2008). Recall from Section 2.3.2 that the original Q-learning updates the Q-function with (2.30):…”
Section: Online Model-free Approximate Value Iteration (citation type: mentioning)
confidence: 99%
“…Fuzzy approximators have typically been used in model-free (RL) techniques such as Q-learning [13,15,17] and actor-critic algorithms [2,20]. Most of these approaches are heuristic in nature, and their theoretical properties have not been investigated yet.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…For example, Glorennec [4] proposed a fuzzy Q-learning algorithm for obtaining the optimal rule base for a fuzzy controller. Horiuchi et al [5] proposed a fuzzy interpolation-based Q-learning where a fuzzy rule base is used to approximate the distribution of Q-values over a continuous action space. In [5], action selection was performed by calculating Q-values for several discrete actions and then selecting one action through the roulette wheel selection scheme.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
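
The action-selection step described in the last statement (evaluate Q-values for several discrete candidate actions, then draw one by roulette wheel selection) can be illustrated with a minimal Python sketch. It is not the implementation from the cited paper: the function name `roulette_wheel_select`, the `floor` parameter, and the choice to shift Q-values so that all weights are non-negative are assumptions made for illustration, and the Q-values are taken as given rather than computed by fuzzy interpolation.

```python
import random


def roulette_wheel_select(actions, q_values, floor=1e-3, rng=random):
    """Draw one candidate action with probability proportional to its weight.

    Assumption (not from the cited paper): weights are the Q-values shifted
    so the smallest becomes `floor`, which keeps every weight non-negative.
    """
    if not actions or len(actions) != len(q_values):
        raise ValueError("actions and q_values must be non-empty and equal in length")

    lowest = min(q_values)
    weights = [q - lowest + floor for q in q_values]
    total = sum(weights)

    # All weights zero (only possible with floor=0): fall back to a uniform draw.
    if total == 0.0:
        return rng.choice(actions)

    # Spin the wheel: pick a point in [0, total) and find the slot it lands in.
    threshold = rng.random() * total
    cumulative = 0.0
    for action, weight in zip(actions, weights):
        cumulative += weight
        if threshold < cumulative:
            return action
    return actions[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    candidate_actions = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical discretization
    candidate_q = [0.1, 0.4, 0.9, 0.3, 0.2]          # hypothetical Q-values
    print(roulette_wheel_select(candidate_actions, candidate_q))
```

Actions with higher Q-values are drawn more often, but lower-valued actions keep a nonzero chance when `floor` is positive, which preserves some exploration.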