2020
DOI: 10.48550/arxiv.2010.11944
Preprint

Accelerating Reinforcement Learning with Learned Skill Priors

Cited by 20 publications (49 citation statements)
References 0 publications
“…While KL-regularized RL has achieved success across various settings [4,7,19,12], recently Tirumala et al. [14] proposed a hierarchical extension where the policy $\pi$ and prior $\pi_0$ are augmented with latent variables, $\pi(a, z \mid x, k) = \pi^H(z \mid x, k)\,\pi^L(a \mid z, x)$ and $\pi_0(a, z \mid x) = \pi^H_0(z \mid x)\,\pi^L_0(a \mid z, x)$, where superscripts $H$ and $L$ denote the higher and lower hierarchical levels. This structure encourages the shared low-level policy ($\pi^L = \pi^L_0$) to discover task-agnostic behavioural primitives, whilst the high level discovers higher-level skills relevant to each task.…”
Section: Hierarchical KL-regularized RL (mentioning)
confidence: 99%
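
To make the factorisation in the excerpt concrete, here is a minimal PyTorch sketch of such a latent-variable policy and prior. All module names, variable names, and dimensions are hypothetical, and the linear heads stand in for whatever networks a real implementation would use; the point is only that with a shared low level ($\pi^L = \pi^L_0$), the KL penalty against the prior reduces to the high-level term.

```python
import torch
from torch.distributions import Normal, kl_divergence

# Illustrative sizes (hypothetical).
state_dim, task_dim, latent_dim, action_dim = 8, 4, 3, 2

high_policy = torch.nn.Linear(state_dim + task_dim, 2 * latent_dim)   # pi^H(z | x, k)
high_prior = torch.nn.Linear(state_dim, 2 * latent_dim)               # pi^H_0(z | x)
low_policy = torch.nn.Linear(state_dim + latent_dim, 2 * action_dim)  # pi^L = pi^L_0

def gaussian(params):
    # Split a network head into mean and log-std of a diagonal Gaussian.
    mu, log_std = params.chunk(2, dim=-1)
    return Normal(mu, log_std.exp())

x = torch.randn(1, state_dim)  # state
k = torch.randn(1, task_dim)   # task embedding

pi_H = gaussian(high_policy(torch.cat([x, k], dim=-1)))
pi_H0 = gaussian(high_prior(x))
z = pi_H.rsample()                                              # latent skill
a = gaussian(low_policy(torch.cat([x, z], dim=-1))).rsample()   # action

# Because the low level is shared between policy and prior, the per-step
# KL(pi || pi_0) collapses to the high-level term alone.
kl_term = kl_divergence(pi_H, pi_H0).sum(-1)
```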
“…Not conditioning on specific environment aspects forces independence and generalisation across them [8]. In the context of hierarchical KL-regularized RL, the explored asymmetries between the high-level policy, $\pi^H$, and prior, $\pi^H_0$, have been narrow [14,19]. Tirumala et al. [14] and Pertsch et al. [19] explore auto-regressive priors of the form:…”
Section: Information Asymmetry (mentioning)
confidence: 99%
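
The specific auto-regressive form is truncated in the excerpt above, so nothing below should be read as the formula from [14] or [19]. As a purely generic, hypothetical illustration of an auto-regressive high-level prior $\pi^H_0(z_t \mid z_{<t}, x_t)$, a recurrent cell can summarise the past latents while the current state conditions each step:

```python
import torch
from torch.distributions import Normal

# Illustrative sizes (hypothetical).
latent_dim, state_dim, hidden_dim = 3, 8, 16

rnn = torch.nn.GRUCell(latent_dim, hidden_dim)                 # summarises z_{<t}
head = torch.nn.Linear(hidden_dim + state_dim, 2 * latent_dim)

h = torch.zeros(1, hidden_dim)
z_t = torch.zeros(1, latent_dim)
for t in range(5):
    x_t = torch.randn(1, state_dim)      # current state (stand-in)
    h = rnn(z_t, h)                      # fold previous latent into the history
    mu, log_std = head(torch.cat([h, x_t], dim=-1)).chunk(2, dim=-1)
    prior_t = Normal(mu, log_std.exp())  # pi^H_0(z_t | z_{<t}, x_t)
    z_t = prior_t.rsample()
```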