2018
DOI: 10.48550/arxiv.1805.04874
Preprint

GAN Q-learning

Abstract: Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs) and analyze its performance in simple tabular environments, as well as OpenAI Gym. We empiri…
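For readers unfamiliar with the setup, the sketch below illustrates the general idea behind a GAN-style distributional Q-learning update: a generator produces samples of the return distribution Z(s, a), and a discriminator is trained to distinguish them from one-step Bellman targets r + γZ(s′, a*). This is a minimal sketch under my own assumptions; the network sizes, losses, and the `update` helper are illustrative, not the paper's implementation.

```python
# Illustrative sketch of a GAN-style distributional Q-learning update.
# All sizes, losses, and the `update` helper are assumptions for
# exposition, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, NOISE_DIM, GAMMA, K = 4, 2, 8, 0.99, 16

class Generator(nn.Module):
    """Maps (state, noise) to one return sample per action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + NOISE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS))

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

class Discriminator(nn.Module):
    """Scores a (state, one-hot action, return sample) triple."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_ACTIONS + 1, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a_onehot, y):
        return self.net(torch.cat([s, a_onehot, y], dim=-1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def update(s, a, r, s_next, done):
    """One adversarial update on a batch of transitions."""
    b = s.shape[0]
    a_onehot = F.one_hot(a, N_ACTIONS).float()

    # "Real" data: one-step distributional Bellman targets
    # r + gamma * Z(s', a*), with a* greedy w.r.t. the mean return
    # estimated from K generator samples.
    with torch.no_grad():
        z_next = torch.randn(K, b, NOISE_DIM)
        samples = G(s_next.unsqueeze(0).expand(K, -1, -1), z_next)  # (K, b, A)
        a_star = samples.mean(0).argmax(-1, keepdim=True)           # (b, 1)
        y_real = r.unsqueeze(-1) + GAMMA * (1 - done.unsqueeze(-1)) \
            * samples[0].gather(-1, a_star)

    # "Fake" data: the generator's current sample of Z(s, a).
    y_fake = G(s, torch.randn(b, NOISE_DIM)).gather(-1, a.unsqueeze(-1))

    # Discriminator: Bellman targets -> 1, generated returns -> 0.
    loss_d = bce(D(s, a_onehot, y_real), torch.ones(b, 1)) + \
             bce(D(s, a_onehot, y_fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the discriminator.
    loss_g = bce(D(s, a_onehot, y_fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Usage on a random batch of transitions:
s = torch.randn(32, STATE_DIM)
a = torch.randint(0, N_ACTIONS, (32,))
r, done = torch.randn(32), torch.zeros(32)
update(s, a, r, torch.randn(32, STATE_DIM), done)
```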

Cited by 6 publications (7 citation statements)
References 8 publications (10 reference statements)
“…Concurrently, and independently from us, Doan et al (2018) showed a similar equivalence between the distributional Bellman equation and GANs, and used it to develop a GAN Q-learning algorithm. Compared to that work, which did not show any significant improvement of GAN Q-learning over conventional DiRL methods, we show that the GAN approach can be used to tackle multivariate rewards, and use it to develop a novel exploration strategy.…”
Section: Related Work
confidence: 91%
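For context, the distributional Bellman (optimality) equation referenced in this statement, written in the standard notation of the distributional RL literature (my own summary, not a quote from either paper), reads:

```latex
% Distributional Bellman optimality equation: the return Z(s,a) is
% equal in distribution to the reward plus the discounted return
% from the greedy next state-action pair.
Z(s, a) \stackrel{D}{=} R(s, a) + \gamma Z\!\left(S', a^{*}\right),
\qquad S' \sim P(\cdot \mid s, a),
\quad a^{*} = \operatorname*{arg\,max}_{a'} \mathbb{E}\!\left[Z(S', a')\right].
```

The GAN connection exploited by both works follows from this equality in distribution: a generator can be trained so that its samples match the law of the right-hand side.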
“…Traditional fusion of RL and GAN method [9] pays more attention to improving the efficiency of imitation rather than preserving the useful information as described in this paper. In brief, we propose a novel perspective for the application of GAN, which also yields a reliable and effective result.…”
Section: Introduction
confidence: 99%
“…Bellemare, Dabney, and Munos (2017) use a categorical distribution to keep track of the random returns to bolster exploratory actions. In a similar vein, Doan, Mazoure, and Lyle (2018) rely on a generative model to learn the distribution of state-action values. In that case, approximating the return density with a generator allows to…”
Section: Introduction
confidence: 99%
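As a point of reference for the categorical approach mentioned in this statement, Bellemare, Dabney, and Munos (2017) parameterize the return distribution over a fixed support of atoms; in standard notation (my own summary):

```latex
% Categorical (C51) parameterization: a discrete distribution over
% N fixed atoms z_1 < ... < z_N with learned probabilities p_i(s,a).
Z_{\theta}(s, a) = \sum_{i=1}^{N} p_i(s, a)\, \delta_{z_i},
\qquad z_i = V_{\min} + (i - 1)\,\frac{V_{\max} - V_{\min}}{N - 1}.
```

The generator-based alternative pursued by Doan, Mazoure, and Lyle (2018) avoids fixing this support in advance: return samples are produced directly by a network, as in the sketch after the abstract above.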
“…Osband et al (2016) rather rely on an ensemble of neural networks to estimate the uncertainty in the prediction of the value function, allowing to reduce learning times while improving performance. Finally, Doan, Mazoure, and Lyle (2018) consider generative adversarial networks (Goodfellow et al, 2014) to model the distribution of random state-value functions. The current work considers a different approach based on normalizing flows for density estimation.…”
Section: Introduction
confidence: 99%
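For completeness, the density-estimation principle behind the normalizing-flow approach mentioned in this last statement is the change-of-variables formula (standard material, stated in my own notation):

```latex
% Change of variables for an invertible flow f mapping data x to a
% base variable z with known density p_Z (e.g., a standard Gaussian).
\log p_X(x) = \log p_Z\!\big(f(x)\big)
  + \log \left| \det \frac{\partial f(x)}{\partial x} \right|.
```

Unlike a GAN generator, a flow gives an explicit, tractable density for the return, which is what the cited work exploits in place of the adversarial objective.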