2020
DOI: 10.48550/arxiv.2010.11944
Preprint

Accelerating Reinforcement Learning with Learned Skill Priors

Cited by 20 publications (49 citation statements)
References 0 publications
“…While KL-regularized RL has achieved success across various settings [4,7,19,12], recently Tirumala et al. [14] proposed a hierarchical extension where the policy $\pi$ and prior $\pi_0$ are augmented with latent variables, $\pi(a, z \mid x, k) = \pi^H(z \mid x, k)\,\pi^L(a \mid z, x)$ and $\pi_0(a, z \mid x) = \pi^H_0(z \mid x)\,\pi^L_0(a \mid z, x)$, where superscripts $H$ and $L$ denote the higher and lower hierarchical levels. This structure encourages the shared low-level policy ($\pi^L = \pi^L_0$) to discover task-agnostic behavioural primitives, whilst the high level discovers higher-level skills relevant to each task.…”
Section: Hierarchical KL-regularized RL (mentioning)
confidence: 99%
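
To make the factorisation in the excerpt concrete, here is a minimal PyTorch sketch of such a latent-variable policy and prior. All module names, variable names, and dimensions are hypothetical, and the linear heads stand in for whatever networks a real implementation would use; the point is only that with a shared low level ($\pi^L = \pi^L_0$), the KL penalty against the prior reduces to the high-level term.

```python
import torch
from torch.distributions import Normal, kl_divergence

# Illustrative sizes (hypothetical).
state_dim, task_dim, latent_dim, action_dim = 8, 4, 3, 2

high_policy = torch.nn.Linear(state_dim + task_dim, 2 * latent_dim)   # pi^H(z | x, k)
high_prior = torch.nn.Linear(state_dim, 2 * latent_dim)               # pi^H_0(z | x)
low_policy = torch.nn.Linear(state_dim + latent_dim, 2 * action_dim)  # pi^L = pi^L_0

def gaussian(params):
    # Split a network head into mean and log-std of a diagonal Gaussian.
    mu, log_std = params.chunk(2, dim=-1)
    return Normal(mu, log_std.exp())

x = torch.randn(1, state_dim)  # state
k = torch.randn(1, task_dim)   # task embedding

pi_H = gaussian(high_policy(torch.cat([x, k], dim=-1)))
pi_H0 = gaussian(high_prior(x))
z = pi_H.rsample()                                              # latent skill
a = gaussian(low_policy(torch.cat([x, z], dim=-1))).rsample()   # action

# Because the low level is shared between policy and prior, the per-step
# KL(pi || pi_0) collapses to the high-level term alone.
kl_term = kl_divergence(pi_H, pi_H0).sum(-1)
```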
“…Not conditioning on specific environment aspects forces independence and generalisation across them [8]. In the context of hierarchical KL-regularized RL, the explored asymmetries between the high-level policy, $\pi^H$, and prior, $\pi^H_0$, have been narrow [14,19]. Tirumala et al. [14] and Pertsch et al. [19] explore auto-regressive priors of the form:…”
Section: Information Asymmetry (mentioning)
confidence: 99%
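
The specific auto-regressive form is truncated in the excerpt above, so nothing below should be read as the formula from [14] or [19]. As a purely generic, hypothetical illustration of an auto-regressive high-level prior $\pi^H_0(z_t \mid z_{<t}, x_t)$, a recurrent cell can summarise the past latents while the current state conditions each step:

```python
import torch
from torch.distributions import Normal

# Illustrative sizes (hypothetical).
latent_dim, state_dim, hidden_dim = 3, 8, 16

rnn = torch.nn.GRUCell(latent_dim, hidden_dim)                 # summarises z_{<t}
head = torch.nn.Linear(hidden_dim + state_dim, 2 * latent_dim)

h = torch.zeros(1, hidden_dim)
z_t = torch.zeros(1, latent_dim)
for t in range(5):
    x_t = torch.randn(1, state_dim)      # current state (stand-in)
    h = rnn(z_t, h)                      # fold previous latent into the history
    mu, log_std = head(torch.cat([h, x_t], dim=-1)).chunk(2, dim=-1)
    prior_t = Normal(mu, log_std.exp())  # pi^H_0(z_t | z_{<t}, x_t)
    z_t = prior_t.rsample()
```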