Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning (2024)
DOI: 10.1109/tnnls.2022.3217189

Cited by 14 publications (21 citation statements). References 20 publications.
“…This limits the applicability of pessimism-based, provably efficient offline RL to practical settings. A very recent work Bai et al. (2022) estimates the uncertainty for constructing the LCB via the disagreement of bootstrapped Q-functions. However, the uncertainty quantifier is only guaranteed in linear MDPs and must be computed explicitly.…”
Section: Related Work (mentioning)
confidence: 99%
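The LCB construction described in this statement can be illustrated with a minimal sketch: an ensemble of bootstrapped Q-networks is evaluated at a state-action pair, and the lower confidence bound is taken as the ensemble mean minus a multiple of the ensemble standard deviation (the "disagreement"). This is an illustrative assumption of the general technique, not the cited paper's exact implementation; all class and function names below are hypothetical.

```python
# Hedged sketch: a lower-confidence-bound (LCB) value estimate from the
# disagreement of K bootstrapped Q-networks. Illustrative only.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """A simple state-action value network."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def lcb_value(q_ensemble, state, action, beta=1.0):
    """Pessimistic estimate: ensemble mean minus beta * ensemble std.

    The std across bootstrapped Q-networks serves as the uncertainty
    quantifier; beta controls the degree of pessimism.
    """
    qs = torch.stack([q(state, action) for q in q_ensemble], dim=0)  # (K, B, 1)
    return qs.mean(dim=0) - beta * qs.std(dim=0)
```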
“…The penalization term R(θ_i; s, φ) discourages overestimation in the value function estimate Q_{θ_i} for out-of-distribution (OOD) actions a ∼ π_φ(·|s). Our design of R(θ_i; s, φ) is initially inspired by the OOD penalization in Bai et al. (2022), which creates a pessimistic pseudo target for the values at OOD actions. Note that we do not need any penalization for OOD actions in our experiment for contextual bandits in Section 6.2.…”
Section: A3 Experiments in D4RL Benchmark (mentioning)
confidence: 99%
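A minimal sketch of the kind of OOD-action penalization this statement describes: actions sampled from the current policy π_φ are treated as potentially out-of-distribution, and the Q-network being trained is regressed toward a pessimistic pseudo-target at those actions. This reuses the hypothetical `lcb_value` helper from the sketch above, assumes `policy(states)` returns a torch distribution, and is not the cited paper's exact R(θ_i; s, φ).

```python
# Hedged sketch of an OOD-action penalty in the spirit of the quoted text:
# Q-values at policy-sampled (potentially OOD) actions are pushed toward a
# pessimistic pseudo-target, here the ensemble LCB. Illustrative only.
import torch
import torch.nn.functional as F


def ood_penalty(q_i, q_ensemble, policy, states, beta=1.0, n_ood=10):
    """Penalize overestimation of Q_{theta_i} at actions a ~ pi_phi(.|s)."""
    # Sample candidate OOD actions from the current policy (assumed to
    # return a torch distribution over actions).
    ood_actions = torch.stack(
        [policy(states).sample() for _ in range(n_ood)], dim=0
    )  # (n_ood, B, action_dim)
    penalty = 0.0
    for a in ood_actions:
        with torch.no_grad():
            # Pessimistic pseudo-target: ensemble mean minus beta * std.
            target = lcb_value(q_ensemble, states, a, beta=beta)
        penalty = penalty + F.mse_loss(q_i(states, a), target)
    return penalty / n_ood
```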