Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft
Preprint, 2019
DOI: 10.48550/arxiv.1912.08664


Cited by 2 publications (3 citation statements)
References 0 publications
“…For the task of obtaining diamond in MineRL (hierarchical set), we demonstrate ForgER's ability to use the extracted subtask graph G for goal-oriented organization of experience replay and data augmentation. In a crucial experiment, we compare our ForgER approach and the ForgER++ modification (a heuristically modified hierarchy of subtasks G) with the best MineRL competition solution [26].…”
Section: Hierarchical Setting
confidence: 99%
“…In this regard, we have applied our method in the well-known complex Minecraft environment, which has recently served as a good benchmark for testing RL algorithms in rich hierarchical settings [7,25]. This allowed us to demonstrate the performance of ForgER and surpass the results of recent SOTA methods in MineRL competition [26].…”
Section: Introduction
confidence: 99%
“…The top team, CDS, used a hierarchical deep Q-network with forgetting that uses an adaptive ratio for sampling expert demonstrations from a separate demonstration replay buffer (Skrynnik et al., 2019). The second-place team, mc_rl, trained their hierarchical policies entirely from human demonstrations with no environment interactions.…”
Section: Summary Of Top Nine Solutions
confidence: 99%
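
The last statement describes sampling expert demonstrations from a separate replay buffer with an adaptive ratio. The following is a minimal sketch, not the authors' implementation: it assumes a linearly annealed demonstration ratio as a stand-in for the "forgetting" schedule, and all names (ReplayBuffer, adaptive_demo_ratio, sample_mixed_batch) and decay constants are illustrative assumptions.

```python
# Sketch: mix agent and expert transitions in each batch, with the expert
# share annealed over training (a simple stand-in for a forgetting schedule).
import random

class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # drop the oldest transition
        self.storage.append(transition)

    def sample(self, n):
        n = min(n, len(self.storage))
        return random.sample(self.storage, n)

def adaptive_demo_ratio(step, start=0.5, end=0.05, decay_steps=100_000):
    """Linearly anneal the share of expert transitions per batch (assumed schedule)."""
    frac = min(step / decay_steps, 1.0)
    return start + (end - start) * frac

def sample_mixed_batch(agent_buf, demo_buf, batch_size, step):
    """Draw a batch mixing agent and demonstration transitions at the current ratio."""
    n_demo = int(batch_size * adaptive_demo_ratio(step))
    demo = demo_buf.sample(n_demo)
    agent = agent_buf.sample(batch_size - len(demo))
    return demo + agent

# Usage example with dummy transitions
agent_buf, demo_buf = ReplayBuffer(10_000), ReplayBuffer(10_000)
for i in range(1000):
    agent_buf.add(("agent", i))
    demo_buf.add(("demo", i))
batch = sample_mixed_batch(agent_buf, demo_buf, batch_size=32, step=50_000)
print(sum(1 for src, _ in batch if src == "demo"), "expert transitions in batch")
```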