2011
DOI: 10.1016/j.ins.2011.02.017

Self-organizing state aggregation for architecture design of Q-learning

Cited by 27 publications (4 citation statements)
References 16 publications
“…The above process makes the learning strategy of RL have long-term effects. Q-learning [6,19] is one of the famous RL approaches. Its goal is for an agent to learn the optimal long-term expected reward value Q(s, a) for each pair of state (s) and action (a).…”
Section: Background and Related Work 21 Memetic Algorithms And Q-lea…
Citation type: mentioning (confidence: 99%)
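The statement above describes Q-learning as learning a long-term expected reward value Q(s, a) for every state-action pair. As a minimal sketch of that idea, the block below runs the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)], on a small hypothetical chain environment. The environment, the function name `q_learning_chain`, and all parameter values are illustrative assumptions; this is plain tabular Q-learning, not the self-organizing state aggregation method of the paper itself.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the rightmost state yields reward 1 and ends the episode.
    Returns the learned table q[state][action].
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # one value per (s, a) pair
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # The terminal state contributes no future value.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            # Move Q(s, a) a step of size alpha toward the bootstrapped target.
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    table = q_learning_chain()
    for state, (left, right) in enumerate(table):
        print(f"state {state}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

With these settings the greedy policy moves right in every non-terminal state, and Q(s, right) approaches γ raised to the number of remaining steps before the goal (for example about 0.729 at state 0 when γ = 0.9).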
“…As a result, developing some basic behaviors before constructing the proposed system is required. Imitation is one method for humans learning new behaviors and this idea inspired the reinforcement learning based (RL-based) decision tree (RL-based DT) proposed in our previous work [22]. First, the instructor controls the robot based on his/her instincts or experience, with the robot recording the patterns from its sensory inputs, denoted as attributes, and the robot's outputs, denoted as classes.…”
Section: A Individual Behavior Recorded By a Reinforcement Learning
Citation type: mentioning (confidence: 99%)
“…Recently, RL methods have gained wide popularity in adaptive control problems with successful applications [14][15][16][17][18][19]. In engineering applications, RL is known as a bio-inspired machine learning technique used to solve sequential decision problems [20][21][22].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)