2024
DOI: 10.1109/tnnls.2023.3250269
A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems

Abstract: With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems, such as playing complex games from pixel observations, sustaining conversations with humans, and controlling robotic agents. However, there is still a wide range of domains inaccessible to RL due to the high cost and danger of interacting with the environment. Offline RL is a paradigm that learns exclusively from static datasets of previously …

Cited by 83 publications (20 citation statements)
References 52 publications
“…In offline reinforcement learning (offline RL), an agent must work with a static, pre-existing dataset rather than interacting with an environment to collect data. This data is often collected by unknown policies (Prudencio et al., 2022). In the offline goal-conditioned setting, the objective is the same as in online goal-conditioned RL, as defined by Equation 1.…”
Section: Preliminaries (mentioning)
confidence: 99%
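To make the constraint described in this statement concrete, the sketch below trains a tabular Q-function purely from a fixed dataset of logged transitions, never querying an environment. It is a minimal sketch, not the method of any cited paper: the toy MDP sizes, the random dataset, and all variable names are illustrative assumptions.

```python
# Minimal sketch of offline RL: learning from a static dataset only.
# The toy dataset and all names here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# A static dataset of (state, action, reward, next_state, done) tuples,
# standing in for data logged earlier by unknown behavior policies.
N_STATES, N_ACTIONS = 5, 2
dataset = [
    (rng.integers(N_STATES), rng.integers(N_ACTIONS),
     rng.normal(), rng.integers(N_STATES), bool(rng.random() < 0.1))
    for _ in range(1000)
]

# Tabular Q-learning driven purely by dataset samples: the agent never
# interacts with the environment, the defining constraint of offline RL.
Q = np.zeros((N_STATES, N_ACTIONS))
gamma, alpha = 0.99, 0.1
for _ in range(50):
    for s, a, r, s_next, done in dataset:
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])

policy = Q.argmax(axis=1)  # greedy policy derived from the learned values
```

Note that this naive sketch ignores the distributional-shift corrections (e.g., pessimism or policy constraints) that practical offline RL methods add on top of such an update.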
“…Training GCRL agents can be difficult due to the sparsity of rewards in GCRL tasks, forcing the agent to explore the environment, which can be unfeasible or even dangerous in some real-world tasks. To utilize RL without environment interactions, offline RL allows learning a policy from a dataset without putting real environments at risk (Prudencio et al., 2022). Offline goal-conditioned RL (offline GCRL) combines the generalizability of GCRL and the data-efficiency of offline RL, making it a promising approach for real-world applications (Ma et al., 2022).…”
Section: Introduction (mentioning)
confidence: 99%
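To illustrate the sparse-reward difficulty this statement raises, here is a minimal sketch of a goal-conditioned sparse reward together with hindsight goal relabeling, a common way to extract learning signal from static trajectories. The grid world, the logged trajectory, and the function names are assumptions for illustration only.

```python
# Hedged sketch: sparse goal-conditioned rewards plus hindsight
# relabeling on a logged trajectory. All names are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def sparse_reward(state, goal):
    # Typical GCRL reward: 1 only when the commanded goal is reached,
    # which is exactly what makes naive offline learning signal-starved.
    return float(np.array_equal(state, goal))

# A logged 2-D trajectory from some unknown behavior policy.
trajectory = [rng.integers(0, 4, size=2) for _ in range(8)]
commanded_goal = np.array([3, 3])

# Hindsight relabeling: treat an achieved future state as the goal, so
# even "failed" offline trajectories yield non-zero reward signal.
relabeled = []
for t, state in enumerate(trajectory[:-1]):
    achieved = trajectory[-1]  # relabel with the final achieved state
    relabeled.append(
        (state, achieved, sparse_reward(trajectory[t + 1], achieved))
    )
```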
“…It is evident that the operation of the integral eliminates the drift dynamics f(x), while the utilization of u_i* and l_ij(x) eliminates the input dynamics g(x). Therefore, based on approximation methods such as neural networks, by solving Equation (10) for each player, the value function V_i, the Nash equilibrium policy u_i*, and the unknown function l_ij can be obtained without requiring any system dynamics. This is the model-free approach in [19] for obtaining the Nash equilibrium solution of CT nonlinear systems.…”
Section: Definition 1 (Admissible Control): A Feedback Policy Pair (mentioning)
confidence: 99%
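For context, the mechanism by which the integral removes the drift dynamics can be sketched with the standard integral Bellman relation from integral RL. This is a hedged sketch assuming the usual control-affine dynamics; the window length T and cost r_i are assumptions matching common integral RL formulations, not a reconstruction of the cited paper's exact Equation (10).

```latex
% Hedged sketch: the integral Bellman relation used in integral RL,
% assuming control-affine dynamics \dot{x} = f(x) + \sum_j g_j(x) u_j.
% Evaluating V_i along measured trajectories over a window T replaces the
% differential term \nabla V_i^\top f(x), so f(x) need not be known.
V_i\bigl(x(t)\bigr)
  = \int_{t}^{t+T} r_i\bigl(x(\tau), u_1(\tau), \ldots, u_N(\tau)\bigr)\, d\tau
  + V_i\bigl(x(t+T)\bigr)
```

Eliminating the input dynamics g(x) additionally relies on the u_i* and l_ij(x) terms referenced in the quote, whose exact construction is specific to the cited paper [19].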
“…Reinforcement learning (RL) has emerged as a popular approach for tackling NZS games. RL is grounded in the principle of trial and error [8], enabling agents to acquire optimal behavioural policies by leveraging feedback from the environment [9,10]. RL can be broadly categorized into two types, model-free and model-based, the key distinction being whether information about the dynamic model is required [11].…”
Section: Introduction (mentioning)
confidence: 99%
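As a hedged sketch of the model-free versus model-based distinction drawn in this statement: the first update below uses only sampled feedback (s, a, r, s'), while the second plans one step through an explicitly learned dynamics model. The toy arrays and all names are illustrative assumptions, not any cited paper's method.

```python
# Contrasting the two RL categories; all names are assumptions.
import numpy as np

# Model-free: improve value estimates from environment feedback
# (s, a, r, s') alone, with no dynamics model anywhere.
def model_free_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Model-based: plan one step ahead through a learned model, here a
# transition tensor P[s, a] over next states and a reward table R[s, a].
def model_based_value(R, P, V, s, gamma=0.99):
    return max(R[s, a] + gamma * P[s, a] @ V for a in range(P.shape[1]))

# Tiny usage on a 3-state, 2-action toy problem.
Q = np.zeros((3, 2))
model_free_update(Q, s=0, a=1, r=1.0, s_next=2)

P = np.full((3, 2, 3), 1 / 3)   # uniform "learned" transition model
R = np.ones((3, 2))             # constant "learned" reward model
V = np.zeros(3)
v0 = model_based_value(R, P, V, s=0)
```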