2023
DOI: 10.1609/aaai.v37i6.25852

Learning Pessimism for Reinforcement Learning

Abstract: Off-policy deep reinforcement learning algorithms commonly compensate for overestimation bias during temporal-difference learning by utilizing pessimistic estimates of the expected target returns. In this work, we propose Generalized Pessimism Learning (GPL), a strategy employing a novel learnable penalty to enact such pessimism. In particular, we propose to learn this penalty alongside the critic with dual TD-learning, a new procedure to estimate and minimize the magnitude of the target returns bias with triv…

Citations: cited by 2 publications (1 citation statement)
References: 36 publications
“…Overestimation, a consequence of inaccurate action values, is underscored as a critical issue in the DDQN literature [13, 26, 27, 28]. Traditional LiDAR information incorporates an infinite range, which represents all information at the maximum distance or the value of obstacle-free spaces.…”
Section: Odg Dqn (mentioning)
confidence: 99%