2018
DOI: 10.48550/arxiv.1806.02315
Preprint

Randomized Value Functions via Multiplicative Normalizing Flows

Abstract: Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent's behavior policy from prematurely exploiting early estimates and falling into local optima. In this work, we leverage recent advances in variational Bayesian neural networks and combine th…
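As a rough illustration of the idea (not the paper's multiplicative normalizing flows), a toy Thompson-sampling agent can maintain an independent Gaussian posterior over each action's value and act greedily on a fresh sample at every step; the bandit setup, reward noise, and conjugate update below are all illustrative assumptions:

```python
import numpy as np

# Hypothetical toy setup: a single-state, 4-arm problem. Instead of a point
# estimate of Q, the agent keeps a Gaussian posterior per action and acts
# greedily with respect to a *sample* from it (Thompson-style exploration),
# so early over- or under-estimates do not lock in a bad policy.
rng = np.random.default_rng(0)
n_actions = 4
true_means = np.array([0.1, 0.5, 0.3, 0.9])  # unknown to the agent

mu = np.zeros(n_actions)   # posterior means of the action values
tau = np.ones(n_actions)   # posterior precisions (1 / variance)
obs_tau = 4.0              # assumed reward-noise precision (sd = 0.5)

for t in range(2000):
    q_sample = rng.normal(mu, 1.0 / np.sqrt(tau))  # one sampled value function
    a = int(np.argmax(q_sample))                   # greedy w.r.t. the sample
    r = rng.normal(true_means[a], 0.5)             # noisy observed reward
    # Conjugate Gaussian update of the chosen action's posterior.
    mu[a] = (tau[a] * mu[a] + obs_tau * r) / (tau[a] + obs_tau)
    tau[a] += obs_tau

best = int(np.argmax(mu))  # the greedy arm under the posterior mean
```

As the posterior concentrates, sampled value functions agree with the posterior mean and exploration fades out on its own, with no epsilon schedule to tune.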

Cited by 6 publications (7 citation statements)
References 19 publications
“…We briefly address prior results in the literature where BDQN is seen solving problems seemingly similar to our binary tree MDP with ease (as in, for example, figure 1 of Touati et al, 2018).…”
Section: On the Success of BDQN in Environments with Tied Actions
Citation type: mentioning; confidence: 99%
“…The discrepancy occurs because previous work often does not randomise the effects of actions (for example Plappert et al, 2018; Touati et al, 2018), i.e. if a_1 leads UP in any state s_k, then a_1 leads UP in all states.…”
Section: On the Success of BDQN in Environments with Tied Actions
Citation type: mentioning; confidence: 99%
“…With the recent developments in the NFs architectures [8][9][10], the qualitative and quantitative results as well as computational requirements are extremely competitive. The applications include but are not limited to image generation [8,11], noise modelling [12], video generation [13], audio generation [14][15][16], graph generation [17], reinforcement learning [18][19][20], computer graphics [21] and physics [22][23][24][25][26].…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…It is inspired by work on posterior sampling for reinforcement learning (a.k.a. Thompson sampling) [19,26], which could be interpreted as sampling a value function from a posterior distribution and following the optimal policy under that value function for some extended period of time before resampling. A number of papers have subsequently investigated approaches that generate randomized value functions in complex reinforcement learning problems [6,9,12,20,23,27,28]. Our theory will focus on a specific approach of [21,22], dubbed randomized least squares value iteration (RLSVI), as specialized to tabular MDPs.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
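The RLSVI recipe described in the citation above admits a compact tabular sketch: perturb the estimated rewards with Gaussian noise, solve the perturbed MDP, and act greedily, drawing one randomized value function per episode. Everything below (the function name, the toy MDP, the noise scale) is an illustrative assumption, not the cited authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_randomized_q(R_hat, P_hat, sigma=0.1, gamma=0.9, iters=100):
    """R_hat: (S, A) empirical rewards; P_hat: (S, A, S) empirical transitions.

    Returns one randomized Q-function: value iteration run on a
    reward-perturbed copy of the empirical MDP (RLSVI-style).
    """
    R_tilde = R_hat + rng.normal(0.0, sigma, size=R_hat.shape)  # perturbed rewards
    Q = np.zeros_like(R_hat)
    for _ in range(iters):          # value iteration on the perturbed MDP
        V = Q.max(axis=1)           # (S,)
        Q = R_tilde + gamma * P_hat @ V
    return Q

# Tiny 2-state, 2-action example: action 0 always moves to state 0,
# action 1 always moves to state 1.
R = np.array([[0.0, 1.0], [0.5, 0.0]])
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0

Q = sample_randomized_q(R, P)
policy = Q.argmax(axis=1)  # greedy policy under this sampled value function
```

Resampling the perturbation each episode plays the role of drawing a fresh value function from a posterior, which is what connects this scheme to the Thompson-sampling interpretation discussed in the quote.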
“…The issue is that posterior sampling based approaches are derived from a true Bayesian perspective in which one maintains beliefs over the underlying MDP. The approaches of [6,9,12,22,23,27,28] model only the value function, so Bayes' rule is not even well defined.¹ The work of [21,22] uses stochastic dominance arguments to relate the value function sampling distribution of RLSVI to a correct posterior in a Bayesian model where the true MDP is randomly drawn.…”
Section: Introduction
Citation type: mentioning; confidence: 99%