2021 · Preprint
DOI: 10.48550/arxiv.2106.01404

Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning

Abstract: Learning to reach goal states and learning diverse skills through mutual information (MI) maximization have been proposed as principled frameworks for self-supervised reinforcement learning, allowing agents to acquire broadly applicable multitask policies with minimal reward engineering. Starting from a simple observation that the standard goal-conditioned RL (GCRL) is encapsulated by the optimization objective of variational empowerment, we discuss how GCRL and MI-based RL can be generalized into a single fami…
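The abstract's central observation, that GCRL is encapsulated by variational empowerment, rests on the variational lower bound on mutual information, I(S; Z) ≥ E[log q_φ(z|s)] + H(Z). Below is a minimal sketch of the resulting intrinsic reward; the network sizes, module names, and the discrete-skill setup are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Variational posterior q_phi(z | s) over discrete latent skills/goals."""
    def __init__(self, state_dim: int, n_skills: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_skills),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # unnormalized logits over skills


def intrinsic_reward(disc: Discriminator, state: torch.Tensor,
                     z: torch.Tensor, n_skills: int) -> torch.Tensor:
    """Variational empowerment reward r(s, z) = log q_phi(z | s) - log p(z),
    a lower bound on the mutual information I(S; Z), with p(z) uniform."""
    log_q = F.log_softmax(disc(state), dim=-1)
    log_q_z = log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)
    log_p_z = -torch.log(torch.tensor(float(n_skills)))
    return log_q_z - log_p_z
```

In training, z is sampled at the start of each episode, the policy is conditioned on z, and the discriminator is fit by cross-entropy to predict z from visited states. This reward recovers DIAYN-style skill discovery; fixing q_φ to a goal-reaching likelihood rather than a learned discriminator recovers standard GCRL, which is the encapsulation the abstract refers to.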

Cited by 3 publications (5 citation statements) | References 35 publications

“…This paper focuses on how to generate diverse subgoals and how to make the agent learn various subgoals. However, there are a lot of issues to be addressed in RL, as follows [69]:…”
Section: Future Research (mentioning)
confidence: 99%
“…Our work fits best into the unsupervised skill discovery literature [35,27,20,2,19,46,15]. Compared to DIAYN [19] and similar approaches, we do not set the number of skills to be learnt at the beginning of training.…”
Section: Related Work (mentioning)
confidence: 99%
“…Another approach to train multiple behaviours is goal-conditioned learning [29,44,4,43,36,54,58,39,15]. In automated curriculum learning [7,21,23,26,52,22,32,37,40,41,59], a sequence of goals is created such that each of them is neither too hard nor too easy for the current agent.…”
Section: Related Work (mentioning)
confidence: 99%
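The goal-selection rule quoted above, goals that are neither too hard nor too easy, can be sketched as a simple success-rate filter in the spirit of GoalGAN-style curricula. The thresholds and the success_rate callback below are hypothetical:

```python
import random

GOID_MIN, GOID_MAX = 0.1, 0.9  # "goals of intermediate difficulty" band

def select_curriculum_goals(candidate_goals, success_rate, k=16):
    """Keep goals whose recent success rate is neither ~0 (too hard)
    nor ~1 (too easy), then sample a training batch from them."""
    feasible = [g for g in candidate_goals
                if GOID_MIN <= success_rate(g) <= GOID_MAX]
    if not feasible:  # no goal in the band yet: fall back to uniform
        feasible = list(candidate_goals)
    return random.sample(feasible, min(k, len(feasible)))
```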
“…Divergent from these works, we are interested in improving the robustness of the policy learned by GAIL. Recently, [22] showed that the spectral clustering introduced in [19] improves the representation learning capabilities of generative models as it pertains to latent goal discovery in the context of goal-based RL. The Loss-Sensitive Generative Adversarial Network (LS-GAN) [23] induces a Lipschitz regularity condition on the density of real data, i.e., the space of distributions the GAN learns from, which leads to a regularized model that can generate more realistic samples than ordinary GANs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Essentially, inequality (17) and its equivalent form (21) are a version of (22) where the transition dynamics are stochastic rather than deterministic. In light of the insights from (22), inequalities (17) and (21) will hold if the stochastic dynamics of the MDP and the corresponding stochastic dynamics of the optimal Markov chain have a property that resembles Lipschitzness of a deterministic function.…”
Section: Lipschitzness Of Optimal Q-function For One-dimensional Stat… (mentioning)
confidence: 99%
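The inequalities (17), (21), and (22) cited in this statement belong to the citing paper and are not reproduced on this page. As background, Lipschitzness of an optimal Q-function over a one-dimensional state space is typically stated as follows (a standard definition, not the paper's exact inequality):

```latex
% Lipschitz continuity of Q* in the state argument: there exists a
% constant L_Q such that for all states s, s' and every action a,
\[
  \lvert Q^{*}(s, a) - Q^{*}(s', a) \rvert \;\le\; L_Q \, \lvert s - s' \rvert .
\]
```

The quoted argument transfers this idea from deterministic dynamics to stochastic ones: the bound is claimed to hold when the MDP's transition kernel varies with the state in a way that resembles Lipschitzness of a deterministic function.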