2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019
DOI: 10.1109/iros40897.2019.8968149

Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Abstract: A common strategy to deal with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn as well as reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat every policy learning process separately. Therefore, all these individual (composable) policies need to be learned before tackling the learning process of the complex task through policies composition. Moreover, such…

Cited by 4 publications (3 citation statements)
References 17 publications

“…Generally, the original MRL supports not only the decomposition of complex tasks into modules, but also the composability of separately learned modules as new strategies for tasks that were never solved before [4,13]. Focusing on the optimality of the composite strategy for the entire task and the independence of learning in separate modules, [12] introduced the specific concept of "modular reward", which comes from the actual reward after each interaction plus a bonus for passing the task on a proper module.…”
Section: Related Work (mentioning)
confidence: 99%
“…Specifically, this bonus is calculated from the modular value function and the temporal difference in the module gating signal, which propagates the reward toward the entire task achievement between modules. In situations where the tasks require to perform the sub-tasks concurrently, [4] propose an hierarchical RL approach to learn both compound and composable policies within the same learning process, by which exploiting the off-policy data generated by the compound policy. The results show that the experience collected with the compound policy permits not only to solve the complex task but also to obtain useful composable policies that successfully perform in their corresponding sub-tasks.…”
Section: Related Work (mentioning)
confidence: 99%
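
The idea described in the citation statement above, reusing experience gathered by the compound policy to train each composable policy off-policy, can be illustrated with a minimal sketch. This is not the cited paper's algorithm: it swaps in plain tabular Q-learning, and the buffer layout, the function names `q_update` and `learn_from_shared_buffer`, and the two-action toy problem are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # hypothetical discrete action set for the toy problem


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One off-policy Q-learning step on a single transition."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def learn_from_shared_buffer(buffer, n_subtasks, sweeps=50, seed=0):
    """Train one Q-table per subtask from the same stored transitions.

    Each transition is (state, action, rewards, next_state), where
    rewards[i] is the reward subtask i assigns to that transition.
    The actions are assumed to have been chosen by a compound policy;
    Q-learning is off-policy, so every subtask can still learn from them.
    """
    rng = random.Random(seed)
    q_tables = [defaultdict(float) for _ in range(n_subtasks)]
    for _ in range(sweeps):
        for s, a, rewards, s_next in rng.sample(buffer, len(buffer)):
            for i in range(n_subtasks):
                q_update(q_tables[i], s, a, rewards[i], s_next)
    return q_tables


# Toy buffer (assumed data): from state 0, action 0 pays off for
# subtask 0 and action 1 pays off for subtask 1; state 1 is terminal.
buffer = [
    (0, 0, (1.0, 0.0), 1),
    (0, 1, (0.0, 1.0), 1),
]
q_tables = learn_from_shared_buffer(buffer, n_subtasks=2)

# Greedy action of each subtask's learned policy in state 0.
greedy = [max(ACTIONS, key=lambda a: q[(0, a)]) for q in q_tables]
print(greedy)
```

Because every subtask reads the same buffer but scores each transition with its own reward, no subtask needs separate environment interaction, which is the property the citation statement attributes to learning compound and composable policies within one process.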