2015 Brazilian Conference on Intelligent Systems (BRACIS)
DOI: 10.1109/bracis.2015.62
Dyna-MLAC: Trading Computational and Sample Complexities in Actor-Critic Reinforcement Learning

Cited by 2 publications (1 citation statement) | References 20 publications
“…An enormous number of samples is still required when such a process model is used only to update the policy gradient. Afterward, Costa et al. [26] derived an AC algorithm, called Dyna-MLAC, by introducing the Dyna structure; it approximates the value function, the policy, and the model by LLR, as MLAC does. The difference is that Dyna-MLAC applies the model not only in updating the policy gradient but also in planning [27].…”
Section: Introduction and Related Work
confidence: 99%
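The quoted passage names the two uses Dyna-MLAC makes of its learned process model: computing the policy gradient through the model (as MLAC does) and generating extra Dyna-style planning updates. The sketch below is a minimal, illustrative Python rendering of that idea, not the authors' implementation: the k-nearest-neighbour LLR class, the toy 1-D dynamics, the assumed-known quadratic reward, and all hyperparameters (k, learning rates, planning depth) are assumptions made for the example.

```python
import numpy as np

class LLR:
    """k-nearest-neighbour local linear regression memory: a minimal
    stand-in for the LLR approximators that MLAC/Dyna-MLAC use for the
    critic, the actor, and the process model (illustrative, not the
    paper's implementation)."""
    def __init__(self, in_dim, out_dim, k=4, capacity=300):
        self.X = np.zeros((capacity, in_dim))
        self.Y = np.zeros((capacity, out_dim))
        self.k, self.capacity, self.n = k, capacity, 0

    def add(self, x, y):
        self.X[self.n % self.capacity] = x   # ring buffer: overwrite oldest
        self.Y[self.n % self.capacity] = y
        self.n += 1

    def predict(self, x):
        """Affine fit on the k nearest stored samples.
        Returns (prediction, coefficient matrix B); B[:-1] is the local
        Jacobian, which the model-based actor update exploits."""
        m = min(self.n, self.capacity)
        if m < self.k:                       # not enough data yet
            return np.zeros(self.Y.shape[1]), np.zeros((x.size + 1, self.Y.shape[1]))
        idx = np.argsort(np.linalg.norm(self.X[:m] - x, axis=1))[:self.k]
        A = np.hstack([self.X[idx], np.ones((self.k, 1))])
        B, *_ = np.linalg.lstsq(A, self.Y[idx], rcond=None)
        return np.append(x, 1.0) @ B, B

def td_update(critic, s, r, s_next, gamma=0.97, alpha=0.2):
    """Critic update: store a TD-corrected value target at s."""
    v, _ = critic.predict(s)
    v_next, _ = critic.predict(s_next)
    delta = r + gamma * v_next - v           # TD error
    critic.add(s, v + alpha * delta)
    return delta

def actor_update(actor, model, critic, s, alpha=0.05):
    """MLAC-style policy gradient: chain the model Jacobian ds'/da with
    the critic gradient dV/ds' to nudge the stored action uphill on V."""
    a, _ = actor.predict(s)
    s_next, B_m = model.predict(np.append(s, a))
    _, B_v = critic.predict(s_next)
    ds_da = B_m[s.size:-1]                   # model Jacobian w.r.t. action
    dV_da = ds_da @ B_v[:-1]
    actor.add(s, a + alpha * dV_da.ravel())

def plan(model, critic, actor, replay, n_steps=5):
    """Dyna planning: rerun stored states through the *learned* model so
    the critic and actor get extra updates without new real samples."""
    for s in replay[-n_steps:]:
        a, _ = actor.predict(s)
        s_next, _ = model.predict(np.append(s, a))
        r = -float(s @ s)                    # reward function assumed known
        td_update(critic, s, r, s_next)
        actor_update(actor, model, critic, s)

# Toy 1-D usage: learn from real transitions, then plan on the model.
s_dim, a_dim = 1, 1
actor, critic = LLR(s_dim, a_dim), LLR(s_dim, 1)
model = LLR(s_dim + a_dim, s_dim)
rng, replay, s = np.random.default_rng(0), [], np.array([1.0])
for t in range(50):
    a = actor.predict(s)[0] + rng.normal(0, 0.1, a_dim)  # exploration noise
    s_next = s + 0.1 * a                     # toy dynamics (illustrative)
    r = -float(s @ s)
    model.add(np.append(s, a), s_next)       # learn the process model
    td_update(critic, s, r, s_next)          # real-experience updates
    actor_update(actor, model, critic, s)
    replay.append(s.copy())
    plan(model, critic, actor, replay)       # model-based planning updates
    s = s_next
```

The planning loop is where the trade named in the paper's title appears: each real transition is followed by several model-based updates, spending extra computation to reduce the number of environment samples required.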