2012
DOI: 10.1016/j.knosys.2011.09.008

Dyna-H: A heuristic planning reinforcement learning algorithm applied to role-playing game strategy decision systems

Cited by 47 publications (18 citation statements)
References 25 publications
“…Models using simulated experience [c.f. Sutton and Barto (1998)] to improve valuations of explicitly represented system states exist in the form of reinforcement learning Dyna-based algorithms [e.g., Santos et al (2012) and Lowe and Ziemke (2013)]. However, these algorithms are limited to updating (either randomly or heuristically) already experienced states and do not simulate novel paths.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…In 'LunarLanderContinuous-v2', state vector consist of 8 features which we used only 4 of them in cloud control and fuzzy control systems, these are: the horizontal and vertical coordinates s_1 and s_2, the vertical velocity s_4 and the angle s_5. Action is two real values vector.…”
Section: LunarLanderContinuous-v2 (citation type: mentioning)
confidence: 99%
“…Bianchi et al [4] also proved the convergence of heuristic exploration and the boundary of error estimation in theoretical analysis. Besides, Santo et al [5] proposed Dyna-H algorithm based on Dyna-Q framework, using the A* algorithm as a heuristic function, providing guidance for the planning part of the algorithm. Miyazaki [6] proposed the k-certainty exploration method which builds a maximum likelihood model of the state transition probabilities to guide action selection.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
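
The statement above describes heuristic guidance added to the planning part of a Dyna-Q-style learner. The following is a minimal sketch of that general idea, not the authors' Dyna-H implementation: the 5x5 grid world, the Manhattan-distance heuristic (standing in for an A*-style distance estimate), and every function and parameter name are assumptions made for illustration. Note that planning replays only state-action pairs already stored in the learned model.

```python
import random

# Hypothetical deterministic grid world; all names here are illustrative.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def manhattan(a, b):
    """Assumed heuristic: Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step(state, action, size, goal):
    """Move one cell, clipped at the borders; reward 1.0 on reaching the goal."""
    nxt = (min(max(state[0] + action[0], 0), size - 1),
           min(max(state[1] + action[1], 0), size - 1))
    return nxt, (1.0 if nxt == goal else 0.0)


def heuristic_dyna_q(size=5, goal=(4, 4), episodes=30, planning_steps=10,
                     alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}      # (state, action) -> estimated value
    model = {}  # (state, action) -> (next_state, reward), experienced pairs only

    def update(s, a, r, s2):
        # One-step Q-learning backup.
        best_next = max(q.get((s2, b), 0.0) for b in ACTIONS)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * best_next - old)

    for _ in range(episodes):
        s = (0, 0)
        for _ in range(200):  # step cap per episode
            if s == goal:
                break
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q.get((s, b), 0.0) for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q.get((s, b), 0.0) == best])
            s2, r = step(s, a, size, goal)
            update(s, a, r, s2)      # direct learning from real experience
            model[(s, a)] = (s2, r)  # model learning

            # Planning: follow a simulated trajectory through the model,
            # choosing at each step the known action whose predicted next
            # state the heuristic ranks closest to the goal.
            p = s
            for _ in range(planning_steps):
                known = [b for b in ACTIONS if (p, b) in model]
                if not known:
                    break
                b = min(known, key=lambda act: manhattan(model[(p, act)][0], goal))
                p2, pr = model[(p, b)]
                update(p, b, pr, p2)
                if p2 == goal:
                    break
                p = p2
            s = s2
    return q, model
```

Calling `heuristic_dyna_q()` returns the learned value table and the model. Replacing the `min(...)` selection with a uniformly random choice among `known` gives an unguided Dyna-Q-style planning step for comparison.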
“…Data Tamer [27] in MIT, which deals even with data consolidation, is a good example. And not only tools for monitoring, integrate, store or analyze data but also for organizing e-learning [12,25]. All of these tools were working over a server or a huge system that need a very important first investment.…”
Section: Big Data Systems in Health and Social Sciences (citation type: mentioning)
confidence: 99%