2021 60th IEEE Conference on Decision and Control (CDC)
DOI: 10.1109/cdc45484.2021.9683288

FORK: A FORward-looKing Actor for Model-Free Reinforcement Learning

Cited by 3 publications (5 citation statements)
References 7 publications
“…Agents are trained with three different seeds for 3K episodes, and the one with the best average reward is selected as the baseline. This agent reaches an average reward above 300 over the last 100 episodes in about 0.5M steps, similar to the results reported in Wei & Ying (2021). All the network weights are transferred to the retrained agent.…”
Section: LunarLanderContinuous (supporting)
confidence: 63%
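The selection-and-transfer procedure quoted above can be sketched in a few lines. This is a minimal illustration under assumed details: the `make_policy` helper, network sizes, and reward values are placeholders, not taken from the cited work.

```python
# Minimal sketch (placeholder sizes and rewards, not the cited paper's code):
# pick the best of three seeded runs by its 100-episode average reward,
# then copy its weights into the agent that will be retrained.
import numpy as np
import torch.nn as nn

def moving_average_reward(episode_rewards, window=100):
    """Average reward over the last `window` episodes."""
    return float(np.mean(episode_rewards[-window:]))

def make_policy():
    # Hypothetical actor network (8-dim state, 2-dim action, as in LunarLanderContinuous).
    return nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 2))

# Hypothetical per-seed results: (trained policy, rewards of its 3K episodes).
runs = [(make_policy(), np.random.uniform(200.0, 320.0, size=3000)) for _ in range(3)]

# Baseline = the seed whose agent achieved the best 100-episode average reward.
baseline_policy, _ = max(runs, key=lambda run: moving_average_reward(run[1]))

# Transfer all network weights to the agent that will be retrained.
retrained_policy = make_policy()
retrained_policy.load_state_dict(baseline_policy.state_dict())
```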
“…We use SAC-I to learn as well as to weight that negative reward. The BipedalWalkerHardcore-v3 is a challenging task, known to be unsolvable for many of the simpler non-recurrent DRL architectures or model-free RL methods (Wei & Ying, 2021). We solve the task using a standard two-layer dense-network architecture, and adopt two strategies: removing the fall-penalty and creating a cumulative version of the task reward.…”
Section: BipedalWalkerHardcore-v3 (mentioning)
confidence: 99%
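The two strategies quoted above could be implemented with an environment wrapper along the following lines. This is a rough sketch, not the SAC-I authors' code; it assumes the classic 4-tuple `gym` step API and treats the -100 terminal reward of BipedalWalkerHardcore-v3 as the fall penalty.

```python
# Rough illustration (assumptions, not the cited implementation) of the two quoted
# strategies: removing the fall penalty and tracking a cumulative task reward.
import gym

class ShapedHardcoreReward(gym.Wrapper):
    """Hypothetical wrapper: drops the fall penalty and accumulates the reward."""

    def reset(self, **kwargs):
        self.cumulative_reward = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Strategy 1: drop the large negative penalty given when the walker falls.
        if done and reward <= -100:
            reward = 0.0
        # Strategy 2: expose a cumulative version of the task reward.
        self.cumulative_reward += reward
        info["cumulative_reward"] = self.cumulative_reward
        return obs, reward, done, info

env = ShapedHardcoreReward(gym.make("BipedalWalkerHardcore-v3"))
```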
“…AI‐EDGE researchers have developed several innovative algorithms to handle such constraints while maintaining the best possible performance. For example, Wei, Liu, and Ying (2022) developed model‐free RL algorithms with optimal regret and constraint violation guarantees. Ghosh, Zhou, and Shroff (2022) further expanded our exploration to larger, potentially infinite, state spaces via a new model‐free constrained RL algorithm that also enjoys near‐optimal performance bounds.…”
Section: Research Themes of AI for Network (mentioning)
confidence: 99%
“…In conclusion, (5) is of precisely quadratic order in the monomials of the input vector. Our guiding principle is to reduce the number of weights compared to the three-layer actor policy MLP network from [37], i.e. two hidden layers with a hidden-unit dimension of 256 each.…”
Section: Q-MLP - Quadratic Multi-Layer Perceptron (mentioning)
confidence: 99%
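One possible reading of this construction is an actor that is linear in the quadratic monomials of the state. The sketch below is an assumption about that idea, not the paper's code; the `QuadraticActor` name and the dimensions are illustrative, and it only serves to show why such a policy needs far fewer weights than a two-hidden-layer (256, 256) MLP.

```python
# Illustrative sketch: an actor that is linear in all quadratic monomials of the
# input, compared against the (256, 256) MLP actor it is meant to replace.
import torch
import torch.nn as nn

class QuadraticActor(nn.Module):
    """Maps the monomials x_i * x_j (plus the linear terms x_i) to actions."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        n_monomials = state_dim + state_dim * (state_dim + 1) // 2
        self.linear = nn.Linear(n_monomials, action_dim)
        # Upper-triangular index pairs enumerate each monomial x_i * x_j (i <= j) once.
        self.register_buffer("idx", torch.triu_indices(state_dim, state_dim))

    def forward(self, x):
        quad = x[:, self.idx[0]] * x[:, self.idx[1]]     # all x_i * x_j with i <= j
        features = torch.cat([x, quad], dim=-1)
        return torch.tanh(self.linear(features))

state_dim, action_dim = 24, 4                            # e.g. BipedalWalker dimensions
mlp = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, action_dim), nn.Tanh())
quad = QuadraticActor(state_dim, action_dim)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(mlp), count(quad))                           # the quadratic actor has far fewer weights
```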
“…The solid curves in all figures show the average cumulative reward per episode, and the shaded regions represent the standard deviations. The implementation of the TD3 and SAC algorithms as well as the hyperparameter settings are taken from [37]. The latter is closely based on [8] in the case of the TD3 algorithm and on [40] for the SAC algorithm.…”
Section: A. Training Environments and RL Algorithm (mentioning)
confidence: 99%
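The plotting convention described in this excerpt (solid mean curve, shaded standard-deviation band across seeds) corresponds to something like the following sketch; the randomly generated rewards are placeholders standing in for the per-seed training curves.

```python
# Sketch of the described plot (placeholder data): mean cumulative reward per
# episode across seeds as a solid curve, with a one-standard-deviation band.
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical rewards: one row per seed, one column per episode.
rewards = np.random.randn(3, 3000).cumsum(axis=1)

episodes = np.arange(rewards.shape[1])
mean, std = rewards.mean(axis=0), rewards.std(axis=0)

plt.plot(episodes, mean, label="TD3")                          # solid curve: seed average
plt.fill_between(episodes, mean - std, mean + std, alpha=0.3)  # shaded region: std dev
plt.xlabel("Episode")
plt.ylabel("Cumulative reward")
plt.legend()
plt.show()
```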