2018
DOI: 10.48550/arxiv.1805.04514
Preprint
Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Cited by 5 publications (5 citation statements)
References 0 publications
“…However, this method and its variations were limited to linear supervised learning. A more recent direction combines the incremental Delta-Bar-Delta method with temporal-difference learning [28]. These methods can tune hyperparameters online while allowing the algorithm to adjust more robustly to non-stationarity in a problem.…”
Section: Related Work
confidence: 99%
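The combination of incremental Delta-Bar-Delta with temporal-difference learning that this statement refers to can be sketched as a per-weight step-size update driven by meta-gradient descent. The sketch below is illustrative only, in the spirit of TIDBD-style methods rather than the exact Metatrace algorithm; the function name, the meta-rate value, and the toy two-state chain are all this sketch's own assumptions.

```python
import math

def tidbd_td0(episodes=200, gamma=0.9, meta_rate=0.01):
    """Toy TD(0) with IDBD-style per-weight step-size adaptation
    on a two-state chain with one-hot features (illustrative sketch)."""
    n = 2
    w = [0.0] * n                 # value-function weights
    beta = [math.log(0.05)] * n   # log step-sizes, adapted by meta-gradient
    h = [0.0] * n                 # trace of recent weight updates
    for _ in range(episodes):
        transitions = [
            ([1.0, 0.0], [0.0, 1.0], 0.0),  # s0 -> s1, reward 0
            ([0.0, 1.0], [0.0, 0.0], 1.0),  # s1 -> terminal, reward 1
        ]
        for x, x_next, r in transitions:
            v = sum(wi * xi for wi, xi in zip(w, x))
            v_next = sum(wi * xi for wi, xi in zip(w, x_next))
            delta = r + gamma * v_next - v  # TD error
            for i in range(n):
                # meta-gradient step on the log step-size
                beta[i] += meta_rate * delta * x[i] * h[i]
                alpha = math.exp(beta[i])
                # ordinary TD(0) weight update with the adapted step-size
                w[i] += alpha * delta * x[i]
                # decay the trace and add the latest update
                h[i] = h[i] * max(0.0, 1.0 - alpha * x[i] * x[i]) \
                    + alpha * delta * x[i]
    return w
```

On this chain the true values are V(s1) = 1 and V(s0) = γ·V(s1) = 0.9, so the learned weights should approach [0.9, 1.0] while the step-sizes adapt online.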
“…Meta Reinforcement Learning Some methods use a meta-objective (usually the difference of episode returns) for reinforcement learning. Meta-knowledge is used to construct the loss Zhou et al (2020b); Veeriah et al (2019), the reward Jaderberg et al (2019); Zheng et al (2018), or hyperparameters Xu et al (2018b); Young et al (2018). In Xu et al (2018a) a teacher-student strategy is proposed for global exploration.…”
Section: Ablation Study
confidence: 99%
“…This algorithm used no experience replay or multiple parallel actors; we refer to this as the incremental-online setting. AC(λ) has previously been applied to the ALE in the work of Young, Wang, and Taylor (2018). For AC(λ), we used an architecture similar to the one in our DQN experiments, except that we replaced the ReLU activation functions with the SiLU and dSiLU activation functions introduced by Elfwing, Uchibe, and Doya (2018).…”
Section: Actor-Critic with Eligibility Traces
confidence: 99%
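The SiLU and dSiLU activations mentioned in this statement have simple closed forms (Elfwing, Uchibe, and Doya, 2018): SiLU(x) = x·σ(x), where σ is the logistic sigmoid, and dSiLU is the derivative of SiLU, σ(x)(1 + x(1 − σ(x))), used as an activation in its own right. A minimal sketch:

```python
import math

def silu(x):
    """SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def dsilu(x):
    """dSiLU: the derivative of SiLU, sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 + x * (1.0 - s))
```

SiLU behaves like the identity for large positive inputs and approaches 0 for large negative inputs, while dSiLU is a sigmoid-shaped function peaking above 1, which is why the two were proposed as drop-in replacements for ReLU and the standard sigmoid, respectively.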