2019
DOI: 10.1609/aaai.v33i01.33014504

A Comparative Analysis of Expected and Distributional Reinforcement Learning

Abstract: Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation…
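To make the comparison in the abstract concrete, the sketch below contrasts a standard expected-value (tabular Q-learning) update with a categorical distributional update in the style of C51. It is a minimal illustration, not the paper's implementation; all names (expected_td_update, categorical_td_update, the fixed support z) are ours.

import numpy as np

GAMMA = 0.9

def expected_td_update(q, s, a, r, s_next, alpha=0.1):
    # Expected RL: move the scalar estimate Q(s, a) toward r + gamma * max_a' Q(s', a').
    target = r + GAMMA * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])
    return q

def categorical_td_update(p, z, s, a, r, s_next, alpha=0.1):
    # Distributional RL: p[s, a] is a probability vector over a fixed return support z.
    # Shift/shrink the support with the Bellman target, project it back onto z,
    # and mix the projection into the current estimate.
    n_atoms = len(z)
    v_min, v_max = z[0], z[-1]
    dz = z[1] - z[0]
    a_next = np.argmax(p[s_next] @ z)              # greedy next action under the mean
    tz = np.clip(r + GAMMA * z, v_min, v_max)      # distributional Bellman target
    b = (tz - v_min) / dz
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    target = np.zeros(n_atoms)
    for j in range(n_atoms):                       # split each atom's mass onto its neighbours
        if lo[j] == hi[j]:
            target[lo[j]] += p[s_next, a_next, j]
        else:
            target[lo[j]] += p[s_next, a_next, j] * (hi[j] - b[j])
            target[hi[j]] += p[s_next, a_next, j] * (b[j] - lo[j])
    p[s, a] = (1 - alpha) * p[s, a] + alpha * target
    return p

Roughly speaking, the mean of the categorical estimate (p[s, a] · z) follows the same recursion as the scalar estimate, up to the projection onto the fixed support, which is in line with the paper's point that the two approaches can behave identically in simple settings and only come apart under function approximation.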


Cited by 34 publications (32 citation statements)
References 2 publications
“…In distributional RL, the RPE is expanded to a vector, with different elements signaling RPE signals based on different a priori forecasts, ranging from highly optimistic to highly pessimistic predictions (Figure 4a,b). This modification had been observed in AI work to dramatically enhance both the pace and outcome of RL across a variety of tasks, something - importantly - which is observed in deep RL, but not simpler forms such as tabular or linear RL (due in part to the impact of distributional coding on representation learning (Lyle et al., 2019)). Carrying this finding into the domain of neuroscience, Dabney and colleagues studied electrophysiological data from mice to test whether the dopamine system might employ the kind of vector code involved in distributional RL.…”
Section: Vanguard Studies
confidence: 99%
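A minimal sketch of the vector-valued RPE idea described in the excerpt above, under the expectile-style reading used in that line of work: each channel applies different learning rates to positive and negative prediction errors, so the population of predictors fans out from pessimistic to optimistic. The names and toy reward distribution are illustrative, not taken from any of the cited papers.

import numpy as np

def vector_rpe_update(values, r, lr_pos, lr_neg):
    # One step of a vector-valued prediction-error update on a one-shot reward.
    # values:  predictors ordered from pessimistic to optimistic
    # lr_pos:  learning rates applied when the prediction error is positive
    # lr_neg:  learning rates applied when the prediction error is negative
    # Channels with lr_pos > lr_neg settle above the mean (optimistic);
    # channels with lr_pos < lr_neg settle below it (pessimistic).
    delta = r - values                              # one prediction error per channel
    return values + np.where(delta > 0, lr_pos, lr_neg) * delta

rng = np.random.default_rng(0)
values = np.zeros(5)
lr_pos = np.linspace(0.02, 0.18, 5)                 # increasingly optimistic channels
lr_neg = np.linspace(0.18, 0.02, 5)
for _ in range(5000):
    r = rng.choice([0.0, 1.0])                      # bimodal reward: 0 or 1
    values = vector_rpe_update(values, r, lr_pos, lr_neg)
print(values)  # channels fan out across the reward distribution instead of all sitting at 0.5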
“…Here, T^π is, for all n ≥ 1, a γ-contraction in D_n with a unique fixed point when D_n is endowed with the supremum n-th Wasserstein metric ([5], Lemma 3) (see [15] for more details on Wasserstein distances). Moreover, by Proposition 2 of [9], T^π is expectation preserving when we have an initial coupling with the T^π-iteration given in (2); that is, given an initial η_0 ∈ D and a function g such that g = Q_{η_0}. Then (T^π)^n g = Q_{(T^π)^n η_0} holds for all n ≥ 0.…”
Section: Discussion
confidence: 99%
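For readability, the two properties the excerpt invokes can be written out in display form (notation follows the excerpt; \bar{d}_p is the supremum p-Wasserstein metric, and the bracketed reference numbers are those of the citing paper, not this page):

% gamma-contraction of T^pi in the supremum p-Wasserstein metric ([5], Lemma 3)
\bar{d}_p\!\left(T^\pi \eta_1,\; T^\pi \eta_2\right) \;\le\; \gamma\, \bar{d}_p\!\left(\eta_1,\, \eta_2\right)

% expectation preservation (Proposition 2 of [9]): if the scalar estimate starts coupled to the
% distributional one, iterating T^pi keeps the two coupled
g = Q_{\eta_0} \;\Longrightarrow\; (T^\pi)^n g = Q_{(T^\pi)^n \eta_0} \quad \text{for all } n \ge 0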
“…This in turn paints a complex and more informationally dense picture, and there exists overwhelming empirical evidence that the distributional perspective is helpful in deep reinforcement learning. That is, apart from the possibility of overall stronger performance, algorithmic benefits may also involve the reduction of prediction variance, more robust learning with additional regularization effects, and a larger set of auxiliary goals such as learning risk-sensitive policies [5-9]. Moreover, there has recently been important theoretical work on understanding the observed improvements and providing theoretical results on convergence [5, 9-11].…”
Section: Introduction
confidence: 99%
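As one concrete instance of the "risk-sensitive policies" benefit listed above: once a distributional agent maintains quantile estimates of the return, a risk measure such as CVaR can be read off directly and used for action selection. The sketch below is illustrative and not tied to any specific paper in the excerpt; the function names and numbers are ours.

import numpy as np

def cvar_from_quantiles(quantiles, alpha=0.25):
    # Conditional value-at-risk estimated from equally weighted quantile estimates
    # (e.g. the output of a quantile-regression value head): the mean of the worst
    # alpha-fraction of quantiles.
    q = np.sort(np.asarray(quantiles))
    k = max(1, int(np.ceil(alpha * len(q))))
    return q[:k].mean()

def risk_sensitive_action(quantiles_per_action, alpha=0.25):
    # Pick the action maximising CVaR_alpha instead of the expected return.
    return int(np.argmax([cvar_from_quantiles(q, alpha) for q in quantiles_per_action]))

# Toy usage: action 0 has the higher mean but a heavy left tail; CVaR prefers action 1.
a0 = np.array([-5.0, 0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # mean 2.75, CVaR_0.25 = -2.5
a1 = np.array([1.0, 1.5, 2.0, 2.0, 2.5, 2.5, 3.0, 3.0])    # mean ~2.19, CVaR_0.25 = 1.25
print(risk_sensitive_action([a0, a1]))  # -> 1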
“…Here, the value and RPE signals from standard TD learning are replaced by richer, multi-dimensional representations capturing the full probability distribution over rewards and prediction-error signals (Dabney et al, 2018). Elaborating TD learning in this way enhances learning in deep RL systems by driving the emergence of richer internal representations, an effect not seen when the same approach is applied to RL systems that do not involve deep learning (Lyle, Bellemare & Castro, 2019). In recent experimental neuroscience work, evidence has been obtained that the mammalian dopamine system may use a distributional representation, as in distributional RL (Dabney et al, 2020).…”
Section: Deep RL: Implications for Psychology
confidence: 99%