2022
DOI: 10.48550/arxiv.2204.12581
Preprint

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

Abstract: Offline reinforcement learning (RL) aims to find near-optimal policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem. In this work, we present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL. To achieve conservatism, we formulate the problem as a two-player zero-sum game …
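Since the abstract only sketches the approach, the toy code below illustrates what such a two-player zero-sum formulation can look like: a dynamics model is trained both to fit the logged transitions and to adversarially lower the policy's value, while the policy is trained to maximise value under that model. This is a minimal sketch under assumptions, not the paper's implementation; all names (`model`, `policy`, `value`), the one-step value proxy, and the weight `lam` are illustrative placeholders.

```python
# Minimal sketch (assumed, NOT the authors' code) of an adversarial
# model-based offline RL loop in the spirit of the abstract above.
import torch
import torch.nn as nn

S_DIM, A_DIM = 3, 2
model = nn.Sequential(nn.Linear(S_DIM + A_DIM, 64), nn.ReLU(),
                      nn.Linear(64, S_DIM + 1))            # predicts (s', r)
policy = nn.Sequential(nn.Linear(S_DIM, 64), nn.ReLU(),
                       nn.Linear(64, A_DIM), nn.Tanh())
value = nn.Sequential(nn.Linear(S_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
model_opt = torch.optim.Adam(model.parameters(), lr=3e-4)
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
lam = 0.1  # illustrative trade-off between data fit and the adversarial term

def split(pred):
    return pred[:, :S_DIM], pred[:, S_DIM:]

def model_step(s, a, s2, r):
    # Model plays the minimising side: fit the logged transitions (squared
    # error standing in for maximum likelihood) plus an adversarial term
    # that steers predictions toward low-value outcomes.
    s_pred, r_pred = split(model(torch.cat([s, a], dim=-1)))
    fit = ((s_pred - s2) ** 2).mean() + ((r_pred - r) ** 2).mean()
    adv = (r_pred + value(s_pred)).mean()
    loss = fit + lam * adv
    model_opt.zero_grad(); loss.backward(); model_opt.step()

def policy_step(s):
    # Policy plays the maximising side: a one-step value proxy under the
    # current model (only policy parameters are updated here).
    a = policy(s)
    s_pred, r_pred = split(model(torch.cat([s, a], dim=-1)))
    loss = -(r_pred + value(s_pred)).mean()
    policy_opt.zero_grad(); loss.backward(); policy_opt.step()

# Toy alternating optimisation on random stand-in "logged" data.
for _ in range(100):
    s, a = torch.randn(32, S_DIM), torch.rand(32, A_DIM) * 2 - 1
    s2, r = torch.randn(32, S_DIM), torch.randn(32, 1)
    model_step(s, a, s2, r)
    policy_step(s)
```

Alternating the two updates realises the min-max game: the adversarial term makes rollouts pessimistic in regions the data does not constrain, discouraging the policy from exploiting model errors.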

Cited by 7 publications (17 citation statements, all of type “mentioning”).
References 17 publications.
“…Perturbing the training environment was previously discussed in unsupervised environment design [6,17], domain randomization [27,36], robust adversarial RL [28,33] and risk aversion [10,25]. However, their focus on robustness differs from our perspective.…”
Section: Related Work (mentioning; confidence: 99%)
“…The second category consists of model-based methods such as adversarial model learning (Rigter et al, 2022), learning pessimistic models (Kidambi et al, 2020; Guo et al, 2022), using model ensembles to form penalties (Yu et al, 2020), or combining model and values.…”
Section: Conservative Offline Reinforcement Learning (mentioning; confidence: 99%)
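The excerpt above groups several conservatism mechanisms together. As a concrete illustration of one of them, the snippet below sketches the ensemble-penalty idea attributed there to Yu et al. (2020): subtract a multiple of the disagreement among an ensemble of dynamics models from the predicted reward, as a proxy for model uncertainty. The function name, tensor layout, and coefficient `kappa` are assumptions for illustration, not the cited paper's API.

```python
import torch

def penalised_reward(reward, next_state_preds, kappa=1.0):
    """Pessimistic reward: subtract ensemble disagreement as an uncertainty proxy.

    next_state_preds: (K, batch, state_dim) next-state predictions from K models.
    """
    disagreement = next_state_preds.std(dim=0).norm(dim=-1)  # per-sample spread
    return reward - kappa * disagreement

# Toy usage: 5 ensemble members, batch of 32, 3-dimensional states.
preds = torch.randn(5, 32, 3)
r = torch.randn(32)
r_pess = penalised_reward(r, preds)
```

Where the ensemble members agree (in-distribution), the penalty is small; where they diverge (out-of-distribution), rewards are driven down, which plays the same conservatism role as RAMBO's adversarial model training.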
“…We additionally compared the proposed approach with RAMBO [61], a concurrent work that also formulates offline RL as a two-player zero-sum game. The results of RAMBO for random, medium, medium-expert and medium-replay are taken from [61]. For the other two dataset types, we run the official code and follow the hyperparameter search procedure reported in its paper.…”
Section: K. Comparison with RAMBO (mentioning; confidence: 99%)