In composite tasks with highly sparse rewards, agents often receive no reward feedback within a fixed number of time steps, causing them to become trapped in local optima and preventing effective exploration of better strategies. Skill learning is one approach to densifying the reward signal, enabling adaptation to multi-stage tasks and accelerating the learning process. However, contemporary skill-acquisition methods rely heavily on online asynchronous training, and although certain intrinsic-motivation approaches handle sparse-reward challenges well, they suffer from low sample efficiency and limited interpretability of the learned skills. These shortcomings slow model learning and severely impede the reusability of skill policies. In this study, we employ expert demonstration data to facilitate the learning of skill policies, accelerating model convergence and improving the utilization of sample data, and then continue learning through interaction with the environment. In addition, we define an evaluation criterion for skill redundancy that encourages the agent to select the most cost-effective policy among similar skills connecting the same initial and final states, helping it accomplish complex tasks efficiently and effectively. Our objective is to minimize ineffective and redundant exploration during skill acquisition. We evaluate our approach on the simulated UGV-Pyramid and UGV-Hallway tasks, both built in Unity3D. The results show that our algorithm outperforms previous skill-learning methods.