2020
DOI: 10.1007/s10458-020-09451-0

Model primitives for hierarchical lifelong reinforcement learning

Abstract: Learning interpretable and transferable subpolicies and performing task decomposition from a single, complex task is difficult. Some traditional hierarchical reinforcement learning techniques enforce this decomposition in a top-down manner, while meta-learning techniques require a task distribution at hand to learn such decompositions. This paper presents a framework for using diverse suboptimal world models to decompose complex task solutions into simpler modular subpolicies. This framework performs automatic…

Cited by 23 publications (9 citation statements) · References 34 publications

Citation statements (ordered by relevance):
“…Modularity Modular approaches in HRL have been relevantly applied to decompose a possibly complex task domain into different regions of specialization, but also to multitask DRL, in order to reuse previously learned skills across tasks [107][108][109]. Tasks may be annotated by handcrafted instructions (as "policy sketches" in [107]), and symbolic subtasks may be associated with subpolicies which a full task-specific policy aims to successfully combine.…”
Section: Hierarchical Approaches
Mentioning confidence: 99%
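
As a rough illustration of the sketch-guided decomposition described in the excerpt above, the following is a minimal, hypothetical Python sketch (the interface subpolicies[symbol](state) -> (action, subtask_done) and the example symbols are assumptions, not taken from [107] or from this paper): the full task policy simply runs each symbol's subpolicy in the order given by the sketch.

# Hypothetical illustration of executing a task annotated with a "policy sketch"
# (an ordered list of subtask symbols, in the spirit of [107]); the interface
# subpolicies[symbol](state) -> (action, subtask_done) is assumed, not real.

def run_sketch(env, sketch, subpolicies, max_steps_per_subtask=100):
    state = env.reset()
    for symbol in sketch:                       # e.g. ["get_wood", "make_plank"]
        policy = subpolicies[symbol]
        for _ in range(max_steps_per_subtask):
            action, subtask_done = policy(state)
            state, reward, env_done, _ = env.step(action)
            if subtask_done or env_done:
                break
    return state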
“…Distinctively, [108] proposes to train (or provide as a prior) a stochastic temporal grammar (STG) in order to capture temporal transitions between the tasks; an STG encodes priorities of some sub-tasks over others and enables better learning of how to switch between base or augmented subpolicies. The work in [109] seeks to learn a mixture of subpolicies by modeling the environment with a given set of (imperfect) models specialized by regions, referred to as model primitives. Each subpolicy is specialized to those regions, and the weights in the mixture correspond to the posterior probability of a model given the current state.…”
Section: Hierarchical Approaches
Mentioning confidence: 99%
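
The posterior-weighted mixture described in this excerpt can be summarized with a short, hedged sketch (assuming hypothetical likelihood and action_probs interfaces on the given imperfect models and their specialized subpolicies; this is an illustration, not the paper's implementation):

import numpy as np

# Minimal sketch of a posterior-weighted mixture of subpolicies, assuming a
# given set of imperfect world models ("model primitives"); the method names
# `likelihood` and `action_probs` are hypothetical, not the paper's API.

def model_posterior(models, state, prior=None):
    """P(model k | state), from each primitive's likelihood of the current state."""
    likes = np.array([m.likelihood(state) for m in models], dtype=float)
    prior = np.full(len(models), 1.0 / len(models)) if prior is None else prior
    post = likes * prior
    return post / post.sum()

def mixture_action_probs(subpolicies, models, state):
    """Mix each specialized subpolicy's action distribution by the model posterior."""
    weights = model_posterior(models, state)                          # (K,)
    dists = np.stack([pi.action_probs(state) for pi in subpolicies])  # (K, |A|)
    return weights @ dists                                            # (|A|,)

Under this weighting, subpolicy k dominates the mixture exactly in the region of the state space where model primitive k best explains the current state.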
“…goal space, while making RL in the lower level easier with an explicit and short-horizon goal. Recent works have extended hierarchical RL to solve complex tasks [42,43,44,45,46]. Le et al. [47] proposed a variant of hierarchical RL which employs imitation learning (IL) for the high-level policy, leveraging expert feedback to explore the goal space more efficiently.…”
Section: Related Work
Mentioning confidence: 99%
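
For context, the two-level structure referenced in this excerpt can be sketched generically as follows (purely illustrative and not specific to [42-47], with assumed callable policies): the high-level policy emits an explicit goal every few steps, and a goal-conditioned low-level policy acts toward that short-horizon goal.

# Generic two-level control loop (illustrative only): the high-level policy
# proposes a goal every `horizon` steps; the low-level policy is goal-conditioned.

def run_hierarchical_episode(env, high_policy, low_policy, horizon=10, max_steps=500):
    state = env.reset()
    goal, total_reward = None, 0.0
    for t in range(max_steps):
        if t % horizon == 0:
            goal = high_policy(state)        # pick a new short-horizon goal
        action = low_policy(state, goal)     # goal-conditioned low-level action
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward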
“…The L3-level learning brings the idea of motion primitives from robotics towards generalized tracking behavior. There are many known approaches to primitive-based tracking control, both older and more recent [48,49,50,51,52]. Their taxonomy is not studied here because it has been reviewed elsewhere, e.g., in [1,2,3].…”
Section: Introduction
Mentioning confidence: 99%