2020
DOI: 10.48550/arxiv.2002.02794
Preprint

Reward-Free Exploration for Reinforcement Learning

Abstract: Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new "reward-free RL" framework. In the exploration phase, the agent first collects trajectories from an MDP M without a pre-specified reward function. After exploration, it is tasked with computing near-optimal policies under M for a collection of given reward functions. This frame…
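To make the two-phase protocol described in the abstract concrete, here is a minimal sketch on a toy tabular MDP. The uniform-random exploration policy, the certainty-equivalent planner, and all names in the snippet are illustrative assumptions, not the algorithm analyzed in the paper.

import numpy as np

# Minimal sketch (assumed, not from the paper): a toy tabular MDP, a
# reward-free exploration phase, and a planning phase that runs value
# iteration on the empirical model once a reward function is revealed.

rng = np.random.default_rng(0)
S, A, H = 5, 2, 10                               # states, actions, horizon
P = rng.dirichlet(np.ones(S), size=(S, A))       # true transitions: P[s, a] is a distribution over next states

def explore(num_episodes):
    # Exploration phase: collect (s, a, s') transitions without any reward signal.
    data = []
    for _ in range(num_episodes):
        s = 0
        for _ in range(H):
            a = int(rng.integers(A))             # uniform-random exploration (an assumption)
            s_next = int(rng.choice(S, p=P[s, a]))
            data.append((s, a, s_next))
            s = s_next
    return data

def plan(data, reward):
    # Planning phase: build the empirical model from exploration data and
    # run finite-horizon value iteration for the revealed reward r(s, a).
    counts = np.zeros((S, A, S))
    for s, a, s_next in data:
        counts[s, a, s_next] += 1
    P_hat = (counts + 1e-8) / (counts.sum(axis=2, keepdims=True) + 1e-8 * S)
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward + P_hat @ V                   # Q[s, a] = r(s, a) + E_{s'}[V(s')]
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V[0]

data = explore(num_episodes=500)                 # no reward needed here
reward = rng.random((S, A))                      # reward revealed only after exploration
policy, value = plan(data, reward)
print("estimated optimal value from the start state:", value)

Because the exploration data is collected without reference to any reward, the same plan call can be repeated for any number of reward functions revealed later, which is exactly what the reward-free framework asks of the exploration phase.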

Cited by 26 publications (77 citation statements)
References 5 publications
“…Therefore, an important future direction is to generalize our method to reward-free settings (e.g. Jin et al, 2020).…”
Section: Conclusion, Limitations and Social Impacts (mentioning)
confidence: 99%
“…Reward-Free RL: Recent works on reward-free (task-free) RL (e.g. Jin et al, 2020; Zhang et al, 2020c) break reinforcement learning into two steps: an exploration phase and a planning phase. In the exploration phase, the agent does not know the true reward function and focuses only on collecting data with an exploration strategy.…”
Section: More Related Work (mentioning)
confidence: 99%
“…For usual MDPs, states that cannot be reached pose no problems: we do not need to learn transition or reward probabilities of these states, since no optimal policy will use that state. Indeed, with this observation at hand, reward free exploration techniques [31,33] can be utilized to learn a good model w.r.t. all possible reward functions (given a fixed initial distribution).…”
Section: Notation (mentioning)
confidence: 99%
“…In contrast to fully observable environments, very little is known about the exploration in partially observable MDPs (POMDPs). In general, RL in POMDPs may require an exponential number of samples (without simplifying structural assumptions) [35,31]. Therefore, it is important to consider natural sub-classes of POMDPs which admit tractable solutions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Unsupervised exploration is an emergent and challenging topic for reinforcement learning (RL) that has inspired research interest in both application [Riedmiller et al, 2018, Finn and Levine, 2017, Xie et al, 2018, Schaul et al, 2015] and theory [Hazan et al, 2018, Jin et al, 2020, Zhang et al, 2020a, Zhang et al, 2020b, Wu et al, 2020, Wang et al, 2020b]. The formal formulation of an unsupervised RL problem consists of an exploration phase and a planning phase [Jin et al, 2020]: in the exploration phase, an agent interacts with the unknown environment without the supervision of reward signals; then in the planning phase, the agent is prohibited from interacting with the environment and is required to compute a nearly optimal policy for some revealed reward function based on its exploration experiences. In particular, if the reward function is fixed yet unknown during exploration, the problem is called task-agnostic exploration (TAE) [Zhang et al, 2020a], and if the reward function is allowed to be chosen arbitrarily, the problem is called reward-free exploration (RFE) [Jin et al, 2020].…”
Section: Introduction (mentioning)
confidence: 99%
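The TAE/RFE distinction quoted above can also be illustrated with a small self-contained sketch. All names and the trivial one-step planner below are assumptions for illustration only: the same reward-free exploration data is reused to plan for several rewards revealed afterwards (RFE), or for a single reward that was fixed but hidden during exploration (TAE).

import numpy as np

# Illustrative stand-ins (assumptions, not the cited algorithms): exploration
# returns visitation counts, and planning is a greedy one-step policy on the
# empirical model for whatever reward is handed to it.

rng = np.random.default_rng(1)
S, A = 4, 2

def explore():
    # Stand-in exploration phase: visitation counts gathered without rewards.
    return rng.integers(1, 10, size=(S, A, S)).astype(float)

def plan(counts, reward):
    # Stand-in planning phase: one-step lookahead on the empirical model.
    P_hat = counts / counts.sum(axis=2, keepdims=True)
    return (reward + P_hat @ reward.max(axis=1)).argmax(axis=1)

counts = explore()                               # no reward signal used here

# RFE: the same data must serve any reward revealed afterwards,
# so we plan for several different rewards.
for k in range(3):
    reward_k = rng.random((S, A))
    print("RFE policy for reward", k, ":", plan(counts, reward_k))

# TAE: a single reward is fixed but unknown during exploration;
# planning happens once when it is revealed.
fixed_reward = rng.random((S, A))
print("TAE policy:", plan(counts, fixed_reward))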