2018
DOI: 10.1109/access.2018.2847048

A Sample Aggregation Approach to Experiences Replay of Dyna-Q Learning

citations
Cited by 8 publications
(3 citation statements)
references
References 29 publications
“…However, solely relying on interactions with the real world is sometimes inefficient. Inspired by the Dyna structure [28], the historical experience may be able to provide more guidance for learning by generating virtual experience. The Dyna structure is further explained in Appendix A.…”
Section: M-carla Algorithm (mentioning)
confidence: 99%
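The Dyna idea quoted above, reusing stored experience as virtual experience, can be illustrated with a minimal sketch of textbook tabular Dyna-Q. This is not the sample aggregation method of the cited paper, nor the M-carla algorithm of the citing paper. The function name `dyna_q_update`, the deterministic model, and the default values of `alpha`, `gamma`, and `planning_steps` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def dyna_q_update(Q, model, s, a, r, s_next, actions,
                  alpha=0.1, gamma=0.95, planning_steps=5, rng=random):
    """One Dyna-Q step: learn from a real transition, then replay stored ones.

    Q is a mapping from (state, action) to value; model maps (state, action)
    to the last observed (reward, next_state). Both are updated in place.
    """
    def q_learn(s, a, r, s_next):
        # Standard one-step Q-learning backup.
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    # Direct RL: update from the real transition.
    q_learn(s, a, r, s_next)

    # Model learning: remember the outcome (assumes a deterministic environment).
    model[(s, a)] = (r, s_next)

    # Planning: replay randomly chosen stored transitions as virtual experience.
    seen = list(model.keys())
    for _ in range(planning_steps):
        ps, pa = rng.choice(seen)
        pr, ps_next = model[(ps, pa)]
        q_learn(ps, pa, pr, ps_next)


# Usage: one real step in a hypothetical two-action problem.
Q = defaultdict(float)
model = {}
dyna_q_update(Q, model, s=0, a="right", r=1.0, s_next=1,
              actions=["left", "right"], planning_steps=3)
```

With `planning_steps=0` the function reduces to plain Q-learning; each extra planning step applies one more backup without another interaction with the real environment.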
“…In order to control the randomness of action selection, a simulated annealing (SA) algorithm [23,24] is used to optimize softmax function. The softmax function is a method for balancing Exploration and exploitation [25] in the RL method, which chooses the action according to the average reward of each action, and the probability of the action a_t being chosen is higher if the average reward produced by the action is higher than the average reward produced by the other action.…”
Section: Softmax Function Based On Simulated Annealing (mentioning)
confidence: 99%
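The softmax selection described in the statement above can be sketched as follows. The excerpt does not give the citing paper's annealing rule, so the geometric cooling schedule in `anneal` and all function names and parameter values here are assumptions for illustration, not that paper's method.

```python
import math
import random


def softmax_probs(avg_rewards, temperature):
    """Boltzmann (softmax) probabilities over actions from their average rewards."""
    # Subtracting the maximum keeps exp() from overflowing and leaves
    # the resulting probabilities unchanged.
    m = max(avg_rewards)
    exps = [math.exp((r - m) / temperature) for r in avg_rewards]
    total = sum(exps)
    return [e / total for e in exps]


def anneal(t0, decay, step, t_min=1e-3):
    """Hypothetical geometric cooling: T_k = max(t_min, t0 * decay**k)."""
    return max(t_min, t0 * decay ** step)


def select_action(avg_rewards, temperature, rng=random):
    """Sample an action index with probability given by the softmax."""
    probs = softmax_probs(avg_rewards, temperature)
    return rng.choices(range(len(avg_rewards)), weights=probs, k=1)[0]


# Usage: selection becomes greedier as the temperature cools.
rewards = [1.0, 2.0, 0.5]
for step in (0, 10, 50):
    t = anneal(t0=5.0, decay=0.9, step=step)
    print(step, [round(p, 3) for p in softmax_probs(rewards, t)])
```

A high temperature spreads probability almost evenly across actions (exploration), while a low temperature concentrates it on the action with the highest average reward (exploitation).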
“…The crowd would be out of control effortlessly when density exceeds a certain threshold, whereas public security would be threatened seriously. The evacuation path is scheduled to condense the evacuation time in thickly populated areas, especially complex environments with obstacles and multiple path is one of the significant issues of crowd simulations caused by emergency tragedies [1]. In such a condition, most pedestrians will track to the nearer exit or follow the crowd to reach exit [2], the above will cause overcrowding delay, trampling mortalities and other safety incidents.…”
Section: Introduction (mentioning)
confidence: 99%