2021
DOI: 10.48550/arXiv.2106.10268
Preprint

MADE: Exploration via Maximizing Deviation from Explored Regions

Abstract: In online reinforcement learning (RL), efficient exploration remains particularly challenging in high-dimensional environments with sparse rewards. In low-dimensional environments, where tabular parameterization is possible, count-based upper confidence bound (UCB) exploration methods achieve minimax near-optimal rates. However, it remains unclear how to efficiently implement UCB in realistic RL tasks that involve non-linear function approximation. To address this, we propose a new exploration approach via max…
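As background for the count-based UCB exploration the abstract refers to, a minimal sketch of such a bonus in the tabular case might look as follows. This is an illustration of the standard technique, not the paper's proposed method; the bonus scale `beta` and the toy MDP sizes are assumptions made for the example.

```python
import numpy as np

def ucb_bonus(counts: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Count-based exploration bonus beta / sqrt(N(s, a)).

    Rarely visited state-action pairs receive a larger bonus, which makes the
    agent optimistic about them and drives it to explore.
    """
    return beta / np.sqrt(np.maximum(counts, 1))

# Hypothetical usage with a small tabular MDP (5 states, 3 actions).
counts = np.zeros((5, 3))       # visit counts N(s, a)
q_estimates = np.zeros((5, 3))  # current value estimates Q(s, a)

state = 0
# Act greedily with respect to the optimistic value Q(s, a) + bonus(s, a).
action = int(np.argmax(q_estimates[state] + ucb_bonus(counts)[state]))
counts[state, action] += 1      # update the visit count after taking the action
```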

Cited by 1 publication (1 citation statement)
References: 62 publications
“…Effectively solving goal-conditioned RL problems requires performing good exploration. In the goal-conditioned setting, the quality of exploration depends on how goals are sampled, a problem studied in many prior methods [6,10,11,27,28,30,38,39]. These methods craft objectives that try to optimize for learning progress, and the resulting algorithms achieve good results across a range of environments.…”
Section: Related Work
confidence: 99%