Model Approximation for HEXQ Hierarchical Reinforcement Learning

Hengst, Bernhard

doi:10.1007/978-3-540-30115-8_16

Cited by 12 publications (14 citation statements). References 4 publications.

“…These problems have been widely used in the literature to evaluate the performance of cooperative Q-learning algorithms [12]- [13], [24]- [26].…”

Section: Methods (mentioning; confidence: 99%)

“…RL can be applied to two types of learning problems [24]. First, single-task problems (e.g., shortest path problem), in which the learner is required to learn a single task.…”

Section: Test Problems (mentioning; confidence: 99%)

“…In this section, the performance of BQ-learning was compared with the performance of single-agent Q-learning, AVE-Q, BEST-Q, PSO-Q, WSS and average-aggregation Q-learning (Section 3) using two problems: the shortest path problem [12] and the taxi problem [24]. These problems have been widely used in the literature to evaluate the performance of cooperative Q-learning algorithms [12]- [13], [24]- [26].…”

Section: Methods (mentioning; confidence: 99%)
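
Single-agent tabular Q-learning, the simplest baseline named in this statement, can be sketched on a small shortest-path grid. This is a minimal sketch under stated assumptions: the grid layout, the reward of -1 per step, the function names and the parameter values are hypothetical. The code is plain Q-learning, not BQ-learning or any of the cooperative variants compared in the citing paper.

```python
import random

# Hypothetical shortest-path grid: '#' = obstacle the agent cannot pass,
# 'S' = start cell s_0, 'G' = target cell s_g.
GRID = ["S..#",
        ".#..",
        "...#",
        "#..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    for r, row in enumerate(GRID):
        if ch in row:
            return (r, row.index(ch))
    raise ValueError(ch)


START, GOAL = find("S"), find("G")


def step(s, a):
    """Deterministic move; hitting the border or an obstacle leaves the agent in place."""
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == "#":
        r, c = s
    # Assumed reward of -1 per step, so the highest return is the shortest path.
    return (r, c), -1.0, (r, c) == GOAL


def greedy(Q, s):
    return max(range(len(ACTIONS)), key=lambda a: Q.get((s, a), 0.0))


def q_learning(episodes=500, alpha=1.0, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}  # tabular Q(s, a); missing entries count as 0
    for _ in range(episodes):
        s = START
        for _ in range(200):  # step cap per episode
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(Q, s)
            s2, r, done = step(s, a)
            best_next = max(Q.get((s2, b), 0.0) for b in range(len(ACTIONS)))
            target = r if done else r + gamma * best_next
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (target - old)
            s = s2
            if done:
                break
    return Q


def greedy_path(Q, max_len=50):
    """Follow the learned greedy policy from s_0 until s_g (or max_len cells)."""
    s, path = START, [START]
    while s != GOAL and len(path) < max_len:
        s, _, _ = step(s, greedy(Q, s))
        path.append(s)
    return path
```

Because every reward is negative, the implicit initial value of 0 is optimistic and pushes the agent to try untested actions. Because this grid is deterministic, a learning rate of 1 is safe here; a stochastic environment would call for alpha < 1.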

“…Filled squares represent obstacles that the agent cannot pass, s_0 is the start cell and s_g is the target cell. The taxi domain problem is an episodic multi-task problem that has been used in many research studies to evaluate the performance of hierarchical Q-learning algorithms [24]- [26]. In each episode, a taxi agent in a grid world of size 5 × 5 is required to perform multiple tasks: finding a customer, picking up the customer, driving the customer to a destination location and dropping off the customer at the destination location.…”

Section: Test Problems (mentioning; confidence: 99%)
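
The state space of the taxi domain described in this statement can be made concrete with a small factored encoding. The sketch assumes the commonly used formulation: four landmark cells where the customer can wait or be dropped off, four move actions, and pick-up and put-down actions. The landmark coordinates, reward values and function names are illustrative assumptions, and interior walls are omitted.

```python
from typing import NamedTuple

GRID_SIZE = 5
# Landmark cells (row, col); assumed layout, for illustration only.
LANDMARKS = [(0, 0), (0, 4), (4, 0), (4, 3)]
IN_TAXI = 4  # passenger index meaning "riding in the taxi"


class TaxiState(NamedTuple):
    row: int
    col: int
    passenger: int    # 0-3: waiting at a landmark, 4: in the taxi
    destination: int  # 0-3: landmark where the customer must be dropped off


# 25 taxi cells x 5 passenger locations x 4 destinations
N_STATES = GRID_SIZE * GRID_SIZE * 5 * 4


def encode(s: TaxiState) -> int:
    """Map the factored state to one index in [0, N_STATES) for a tabular learner."""
    return ((s.row * GRID_SIZE + s.col) * 5 + s.passenger) * 4 + s.destination


def decode(i: int) -> TaxiState:
    i, destination = divmod(i, 4)
    i, passenger = divmod(i, 5)
    row, col = divmod(i, GRID_SIZE)
    return TaxiState(row, col, passenger, destination)


MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def step(s: TaxiState, action: str):
    """Return (next_state, reward, done). Reward values are illustrative assumptions."""
    if action in MOVES:
        dr, dc = MOVES[action]
        row = min(max(s.row + dr, 0), GRID_SIZE - 1)
        col = min(max(s.col + dc, 0), GRID_SIZE - 1)
        return s._replace(row=row, col=col), -1, False
    here = (s.row, s.col)
    if action == "pickup":
        if s.passenger != IN_TAXI and LANDMARKS[s.passenger] == here:
            return s._replace(passenger=IN_TAXI), -1, False
        return s, -10, False  # illegal pick-up
    if action == "putdown":
        if s.passenger == IN_TAXI and LANDMARKS[s.destination] == here:
            return s._replace(passenger=s.destination), 20, True
        return s, -10, False  # illegal put-down
    raise ValueError(f"unknown action: {action}")
```

A flat tabular learner would use the single index from `encode`; a hierarchical learner can instead exploit the separate fields, for example by learning to navigate to each landmark once and reusing that skill for both the pick-up and the drop-off subtasks.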

Bat Q-learning Algorithm

Abed-alguni, 2017, JJCIT

“…The literature [29] contains two different ways to tackle UTS in RL: (a) using null state transitions and/or generating negative rewards [9,23,6], and (b) manually discarding actions from the action repertoire that could lead to an undesirable state [26], i.e., defining state-dependent action repertoires A(s).…”

Section: Constrained MDPs (mentioning; confidence: 99%)
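
The two options in this statement can be contrasted on a toy problem. This is a minimal sketch, assuming a hypothetical five-cell corridor whose left end is an undesirable state. Option (a) lets the agent attempt the move and answers with a null transition plus a penalty. Option (b) uses a hand-written repertoire A(s) and restricts both action selection and the max in the Q-learning target to A(s). All names, rewards and parameters are illustrative.

```python
import random

# Hypothetical 1-D corridor: cells 0..4. Cell 0 is an undesirable state,
# cell 4 is the goal, and the agent starts in cell 1.
CLIFF, START, GOAL = 0, 1, 4
ACTIONS = (-1, +1)  # move left, move right


def step_with_penalty(s, a, penalty=-10.0):
    """Option (a): null transition (stay put) plus a negative reward."""
    nxt = s + a
    if nxt == CLIFF:
        return s, penalty, False
    return nxt, -1.0, nxt == GOAL


# Option (b): manually specified repertoire A(s); moving left is not offered in cell 1.
A = {1: (+1,), 2: (-1, +1), 3: (-1, +1)}


def step_restricted(s, a):
    nxt = s + a
    return nxt, -1.0, nxt == GOAL


def legal(s, use_repertoire):
    return A[s] if use_repertoire else ACTIONS


def greedy(Q, s, actions):
    return max(actions, key=lambda b: Q.get((s, b), 0.0))


def q_learning(use_repertoire, episodes=300, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {}
    cliff_attempts = 0  # how often the agent tried to enter the undesirable state
    for _ in range(episodes):
        s, done, t = START, False, 0
        while not done and t < 1000:
            acts = legal(s, use_repertoire)
            a = rng.choice(acts) if rng.random() < epsilon else greedy(Q, s, acts)
            if use_repertoire:
                s2, r, done = step_restricted(s, a)
            else:
                if s + a == CLIFF:
                    cliff_attempts += 1
                s2, r, done = step_with_penalty(s, a)
            # The max in the target ranges over legal actions of s2 only.
            best_next = 0.0 if done else max(
                Q.get((s2, b), 0.0) for b in legal(s2, use_repertoire))
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s, t = s2, t + 1
    return Q, cliff_attempts
```

Both variants should learn to move right in every cell. With the repertoire the agent never attempts the undesirable move, at the cost of having to specify A(s) by hand for each state; with the penalty it has to experience the punishment before learning to avoid it.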