2022
DOI: 10.48550/arxiv.2201.05433
Preprint

Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning

Abstract: Offline reinforcement learning (RL) algorithms are often designed with environments such as MuJoCo in mind, in which the planning horizon is extremely long and no noise exists. We compare model-free, model-based, as well as hybrid offline RL approaches on various industrial benchmark (IB) datasets to test the algorithms in settings closer to real-world problems, including complex noise and partially observable states. We find that on the IB, hybrid approaches face severe difficulties and that simpler algorithm…


Cited by 2 publications (3 citation statements)
References 31 publications (33 reference statements)
“…The goal is to learn a policy that can maximize $\mathbb{E}_{(s,a)\sim\rho_T^{\pi}}[r(s,a) - u(s,a)]$. Existing uncertainty computations [32,39] only calculate the deviation during policy optimization without evaluating OOD generalization. Therefore, we propose an energy function to evaluate the exploration behavior through reward shaping.…”
Section: Energy-based OOD Generalization Evaluation
Confidence: 99%
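The objective quoted above amounts to subtracting an uncertainty estimate u(s, a) from the reward r(s, a) before policy optimization. A minimal sketch of this kind of uncertainty-penalized reward shaping follows, assuming the uncertainty is taken to be ensemble disagreement; the function and model names are hypothetical and this is not the cited paper's energy-based estimator:

```python
import numpy as np

def penalized_reward(reward, state_action, reward_models, penalty_weight=1.0):
    """Sketch of r(s, a) - u(s, a): u(s, a) is approximated here by the
    disagreement (standard deviation) of an ensemble of learned models.
    `reward_models` is assumed to be a list of callables mapping a
    state-action feature vector to a scalar prediction."""
    predictions = np.array([m(state_action) for m in reward_models])
    uncertainty = predictions.std(axis=0)          # u(s, a)
    return reward - penalty_weight * uncertainty   # r(s, a) - u(s, a)

# Toy usage with two disagreeing stand-in models (purely illustrative):
models = [lambda sa: sa.sum(), lambda sa: sa.sum() + 0.5]
print(penalized_reward(reward=1.0,
                       state_action=np.array([0.1, 0.2]),
                       reward_models=models))
```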
“…The uncertainty of the current policy reduces the interference of extrapolation errors. However, existing uncertainty factors limit the behavior to offline datasets by estimating the model discrepancies that might overfit the limited and suboptimal offline datasets [32,35]. The agent is limited to the behavior policy of offline datasets and can not achieve tasks in OOD regions.…”
Section: Introduction
Confidence: 99%
“…Figure 4: Evaluation performance and distance to the original policy of the LION approach over the chosen λ hyperparameter. Various state of the art baselines are added as dashed lines with their standard set of hyperparameters (results from (Swazinna et al, 2022)). Even though the baselines all exhibit some hyperparameter that controls the distance to the original policy, all are implemented differently and we can neither map them to a corresponding lambda value of our algorithm, nor change the behavior at runtime, which is why we display them as dashed lines over the entire λ spectrum.…”
Section: Industrial Benchmark
Confidence: 99%
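The excerpt describes sweeping a single hyperparameter λ that trades off proximity to the original (behavior) policy against return. As a hedged illustration of that trade-off only, and not the LION implementation itself, the toy sketch below conditions the executed action on λ; all names are hypothetical:

```python
import numpy as np

def lambda_tradeoff_action(learned_action, behavior_action, lam):
    """Illustrative only: interpolate between the dataset's behavior policy
    (lam = 0) and the freely optimized policy (lam = 1). LION trains a single
    lambda-conditioned policy; this interpolation merely shows how one knob
    can control the distance to the original policy at runtime."""
    lam = float(np.clip(lam, 0.0, 1.0))
    return (1.0 - lam) * behavior_action + lam * learned_action

# Sweeping lambda mimics the kind of evaluation curve described above:
for lam in np.linspace(0.0, 1.0, 5):
    a = lambda_tradeoff_action(np.array([0.8]), np.array([0.2]), lam)
    print(f"lambda={lam:.2f} -> action={a}")
```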