2021
DOI: 10.48550/arxiv.2110.13523
Preprint

Automating Control of Overestimation Bias for Reinforcement Learning

Abstract: Bias correction techniques are used by most of the high-performing methods for off-policy reinforcement learning. However, these techniques rely on a pre-defined bias correction policy that is either not flexible enough or requires environment-specific tuning of hyperparameters. In this work, we present a simple data-driven approach for guiding bias correction. We demonstrate its effectiveness on Truncated Quantile Critics, a state-of-the-art continuous control algorithm. The proposed technique can adjust t…
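For context on the base algorithm: TQC controls overestimation by discarding the largest quantiles of the pooled target value distribution, and the number of dropped quantiles is the hyperparameter that a data-driven approach like the one described in the abstract would tune. Below is a minimal sketch of that truncation step, not the paper's code; the function name, the `num_drop` argument, and the assumption that all target critics' quantiles are already pooled into one tensor are illustrative.

```python
import torch

def truncated_quantile_target(next_quantiles, rewards, dones,
                              num_drop=2, gamma=0.99):
    """Illustrative TQC-style target: drop the largest target quantiles.

    next_quantiles: (B, N) tensor pooling the quantile estimates of all
    target critics for the next state-action pair. Removing the top
    num_drop quantiles truncates the right tail of the value
    distribution, which is what controls overestimation; num_drop is
    the knob an automatic bias-control method would adjust.
    """
    _, n = next_quantiles.shape
    keep = n - num_drop
    # Sort quantiles ascending and keep only the smallest `keep` atoms.
    sorted_q, _ = torch.sort(next_quantiles, dim=1)
    truncated = sorted_q[:, :keep]
    # Standard one-step backup applied atom-wise.
    return rewards.unsqueeze(1) + gamma * (1.0 - dones).unsqueeze(1) * truncated
```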

Cited by 2 publications (2 citation statements)
References 10 publications
“…Weighted Bellman backup (Lee et al., 2021) and uncertainty-weighted actor-critic (Wu et al., 2021) prevent error propagation (Kumar et al., 2020) in Q-learning by reweighting sample transitions based on uncertainty estimations from ensembles (Lee et al., 2021) or Monte Carlo dropout (Wu et al., 2021). AdaTQC (Kuznetsov et al., 2021) proposed an automatic mechanism for controlling overestimation bias. Unlike prior works, our work does not reweight sample transitions but directly adds uncertainty estimations to penalize the target value.…”
Section: Related Work (mentioning)
confidence: 99%
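The reweighting scheme that excerpt describes fits in a few lines. Here is a minimal PyTorch sketch, not the cited authors' code: it assumes a list of critic callables `q_ensemble` and batched tensors `rewards`, `next_obs`, `dones`. The sigmoid weighting follows the form reported for Lee et al. (2021); the commented alternative shows the other idea from the excerpt, subtracting the uncertainty from the target directly.

```python
import torch

def weighted_bellman_targets(q_ensemble, rewards, next_obs, dones,
                             gamma=0.99, temperature=10.0):
    """Uncertainty-weighted Bellman backup (illustrative sketch)."""
    with torch.no_grad():
        # Next-state value estimates from every ensemble member: (E, B).
        next_qs = torch.stack([q(next_obs) for q in q_ensemble])
        # One-step target built from the ensemble mean.
        target_q = rewards + gamma * (1.0 - dones) * next_qs.mean(dim=0)
        # Ensemble disagreement serves as the uncertainty estimate.
        uncertainty = next_qs.std(dim=0)
        # Down-weight transitions the ensemble disagrees on; sigmoid
        # form as in Lee et al. (2021), so weights lie in (0.5, 1.0).
        weights = torch.sigmoid(-uncertainty * temperature) + 0.5
        # Alternative from the excerpt: penalize the target directly,
        # e.g. target_q = target_q - beta * uncertainty.
    return target_q, weights
```

The weights then scale the per-transition critic loss, e.g. `loss = (weights * (q_pred - target_q) ** 2).mean()`.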
“…More related to our method, Tactical Optimism and Pessimism (Moskovitz et al., 2021) introduced the concept of adapting a bias penalty online. Together with similar later work (Kuznetsov et al., 2021), they proposed step-wise updates to the bias-correction parameters based on the performance of recent trajectories. Instead, GPL proposes a new method to precisely estimate bias and reduce its magnitude via dual gradient descent.…”
Section: Related Work (mentioning)
confidence: 99%
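To make "adapting a bias penalty online" concrete, here is a hypothetical sketch in the same spirit: candidate bias-correction settings (for instance, how many target quantiles to drop) are treated as bandit arms, and the arm whose recent episode returns look best is preferred. The class name, the epsilon-greedy rule, and the moving-average update are illustrative assumptions, not the exact procedures of TOP or AdaTQC.

```python
import random

class AdaptiveBiasController:
    """Illustrative step-wise controller for a bias-correction knob."""

    def __init__(self, settings, epsilon=0.1, lr=0.1):
        self.settings = list(settings)        # candidate correction settings
        self.values = [0.0] * len(settings)   # running score per setting
        self.epsilon = epsilon
        self.lr = lr
        self.current = 0

    def select(self):
        # Epsilon-greedy choice over correction settings.
        if random.random() < self.epsilon:
            self.current = random.randrange(len(self.settings))
        else:
            self.current = max(range(len(self.settings)),
                               key=lambda i: self.values[i])
        return self.settings[self.current]

    def update(self, episode_return):
        # Exponential moving average of returns under the chosen setting.
        v = self.values[self.current]
        self.values[self.current] = v + self.lr * (episode_return - v)
```

Usage: call `select()` at the start of each trajectory to pick a setting (e.g. `AdaptiveBiasController(settings=[0, 1, 2, 3])` for dropped-quantile counts), then feed the resulting episode return to `update()`, so better-performing corrections are chosen more often over time.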