2022
DOI: 10.1016/j.knosys.2022.108151

MDMD options discovery for accelerating exploration in sparse-reward domains

Cited by 5 publications (5 citation statements)
References 18 publications
“…We compare the proposed algorithms with Q-Learning (Watkins and Dayan 1992), BC (Simsek and Barreto 2008), SCC (Kazemitabar et al 2018), and MMSC (Setyawan et al 2022). Since the main goal of point option methods (Machado et al 2017; Jinnai et al 2019; Zhu et al 2022) is not making the agent reach the goal faster (in terms of steps or decisions), we argue that these studies are not directly comparable to ours. For SCC and BC, experiments are run with the same RL parameters as those given in Section 5.1.1.…”
Section: Experiments and Results (mentioning)
confidence: 86%
“…This operation is repeated until the given number of options is reached. Zhu et al (2022) propose a new option discovery method, Min Degree and Max Distance (MDMD) options, that accelerate the exploration by reducing the expected cover time of the environment in sparse reward problems. Unlike (Jinnai et al 2019), MDMD does not compute the Laplacian matrix's eigenvector.…”
Section: Related Work (mentioning)
confidence: 99%
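As an illustration of the idea described in this citation statement only, the sketch below assumes the agent's state-transition graph is available as a networkx graph and picks option subgoals greedily: among the states of minimum degree, it takes the one farthest (in shortest-path distance) from the subgoals chosen so far. This is a hypothetical reading of "min degree, max distance", not the algorithm published by Zhu et al (2022).

```python
# Hypothetical sketch of a "min degree, max distance" subgoal picker.
# Assumes a connected, undirected state-transition graph; an illustration
# of the idea only, not the published MDMD algorithm.
import networkx as nx

def min_degree_max_distance_subgoals(graph: nx.Graph, num_options: int):
    subgoals = []
    for _ in range(num_options):
        remaining = [s for s in graph.nodes if s not in subgoals]
        if not remaining:
            break
        # Candidates: states of minimum degree among those not yet chosen.
        min_deg = min(graph.degree(s) for s in remaining)
        candidates = [s for s in remaining if graph.degree(s) == min_deg]
        if not subgoals:
            subgoals.append(candidates[0])
            continue
        # Among the candidates, take the one farthest from existing subgoals.
        def distance_to_subgoals(s):
            return min(nx.shortest_path_length(graph, s, g) for g in subgoals)
        subgoals.append(max(candidates, key=distance_to_subgoals))
    return subgoals
```

Consistent with the quoted description, nothing here requires an eigendecomposition of the graph Laplacian; only degrees and shortest-path distances are used.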
“…This approach enhances exploration and transfer capabilities. Most representatively, the options framework may be the most common formalism that allows agents to reason regarding extended actions [56–60]. This framework models courses of action as options, which can accelerate learning in different ways, allowing, for example, faster credit assignment, planning, transfer learning, and better exploration.…”
Section: Hierarchical Reinforcement Learning (mentioning)
confidence: 99%
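For readers unfamiliar with the formalism referenced in this statement, the following minimal sketch shows the usual shape of an option: an initiation set, an intra-option policy, and a termination condition. The `Option` class and the `env.step` interface are illustrative assumptions for this sketch, not code from the cited works.

```python
# Minimal, illustrative shape of an option (initiation set, intra-option
# policy, termination condition). The Option class and env interface are
# assumptions for this sketch, not taken from the cited papers.
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    initiation_set: Set[State]             # states where the option can start
    policy: Callable[[State], Action]      # intra-option policy pi(s) -> a
    termination: Callable[[State], float]  # beta(s): probability of stopping

def run_option(env, option: Option, state: State, rng) -> State:
    """Follow the option's policy until its termination condition fires.
    Assumes env.step(action) returns (next_state, reward, done)."""
    assert state in option.initiation_set
    done = False
    while not done:
        state, reward, done = env.step(option.policy(state))
        if rng.random() < option.termination(state):
            break
    return state
```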
“…Thus, temporal abstraction is the process by which agents learn the temporal structure of tasks in a way that can cut down cognitive load and enhance the generalization ability of jobs with shared structure [40–45]. One formal approach to addressing this form of abstraction is the options framework [39, 46–50]. Agents using options seek to learn a set of policies related to different subtasks, along with their initiation and termination conditions.…”
Section: Introduction (mentioning)
confidence: 99%
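Concretely, learning values over such temporally extended actions typically uses the standard SMDP Q-learning rule, sketched below with illustrative names (Q stored as a dict of dicts). This is a generic textbook update, not code from the cited paper.

```python
# Generic SMDP Q-learning update over options (textbook form, with
# illustrative names): after option o runs for k steps from state s,
# accumulating a discounted return G, and ends in state s_next:
#   Q(s, o) <- Q(s, o) + alpha * (G + gamma**k * max_o' Q(s_next, o') - Q(s, o))
def smdp_q_update(Q, s, o, G, s_next, k, alpha=0.1, gamma=0.99):
    target = G + (gamma ** k) * max(Q[s_next].values())
    Q[s][o] += alpha * (target - Q[s][o])
```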