1997
DOI: 10.1002/(sici)1098-111x(199710)12:10<695::aid-int1>3.0.co;2-t

Training and delayed reinforcements in Q-learning agents

Cited by 20 publications (6 citation statements)
References 17 publications
“…The actor-critic algorithm is much simpler than Q-learning in the computation, [19] can determine the optimal policy, and effectively applied to control the tasks. [20,21] Markov chain model for investment…”
Section: Reinforcement Learning (mentioning)
confidence: 99%
“…A good solution to this problem might be to exploit both mechanisms, in order to get the best out of each. Even if a few steps in this direction have already been made ([5,26]), no final solution has been proposed yet.…”
Section: About Learning and Training (mentioning)
confidence: 99%
“…A second example of the application of the BAT methodology is HAMSTER, a mobile robot based on a commercial platform, whose task is to bring "food" to its "nest." [5] We shall not describe all steps in the development of this robot, but confine ourselves to the description of the main differences with respect to AM.…”
Section: Case 2: Hamster (mentioning)
confidence: 99%
“…We try to improve ACS technique by adding Q learning [9] concept. This Ant-Q method can solve SMP problem for various requirements.…”
Section: Introduction (mentioning)
confidence: 99%
“…(1,10), (2,8), (3,5), (4,2), (5,9), (6,7), (7,1), (8,4), (9,6), (10,3)
(1,6), (2,5), (3,3), (4,10), (5,9), (6,4), (7,1), (8,7), (9,8), (10,2)
(Figure 2) Results of stable matchings
… is slightly different from Equations (6) and (7). …”
unclassified