2021
DOI: 10.1609/aaai.v35i8.16828

Deep Radial-Basis Value Functions for Continuous Control

Abstract: A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately. Moreover, deep RBVFs can r…
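The abstract only summarizes the construction, but the RBF output layer it describes (and which the citation statements below characterize as a weighted sum over learned, state-dependent centroids, with the maximum approximated by evaluating the function at those centroids) can be sketched as follows. This is an illustrative PyTorch sketch under those assumptions, not the authors' reference implementation; the layer sizes, the softmax normalization, and the temperature beta are assumptions.

import torch
import torch.nn as nn

class RBFValueHead(nn.Module):
    """Sketch of a deep radial-basis value function (RBVF).

    Assumed form (not taken verbatim from the paper): the network maps a
    state to N centroid actions c_i(s) and scalar values v_i(s);
    Q(s, a) is the sum of the v_i(s), weighted by a softmax over the
    negative scaled distances from a to each centroid.
    """
    def __init__(self, state_dim, action_dim, n_centroids=32, beta=1.0):
        super().__init__()
        self.beta = beta
        self.trunk = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        self.centroids = nn.Linear(256, n_centroids * action_dim)
        self.values = nn.Linear(256, n_centroids)
        self.n_centroids, self.action_dim = n_centroids, action_dim

    def forward(self, state, action):
        h = self.trunk(state)
        c = self.centroids(h).view(-1, self.n_centroids, self.action_dim)
        v = self.values(h)                                   # (B, N)
        dist = torch.norm(action.unsqueeze(1) - c, dim=-1)   # (B, N)
        w = torch.softmax(-self.beta * dist, dim=-1)         # RBF weights
        return (w * v).sum(-1)                               # Q(s, a)

    def approx_max(self, state):
        """Approximate max_a Q(s, a) by evaluating Q at each centroid."""
        h = self.trunk(state)
        c = self.centroids(h).view(-1, self.n_centroids, self.action_dim)
        v = self.values(h)
        dist = torch.cdist(c, c)                             # (B, N, N)
        w = torch.softmax(-self.beta * dist, dim=-1)
        q_at_centroids = (w * v.unsqueeze(1)).sum(-1)        # Q(s, c_i)
        return q_at_centroids.max(dim=-1).values

Evaluating Q only at the learned centroids is what makes greedy action selection cheap here even though the action space is continuous, which is the "approximated easily and accurately" claim in the abstract.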

Cited by 7 publications (2 citation statements) | References 22 publications
“…"Normalized Advantage Functions" analytically select maximum-valued actions by restricting their action-value function to quadratic polynomials (Gu et al 2016). Perhaps the most similar method to ours is RBF-DQN (Asadi et al 2021), which computes value as the weighted sum over learned centroids, and describes a method to select approximately maximumaction values. The primary difference in our approach compared with past value-function only methods is the focus on sampling, which allows for a variety of action-selection strategies besides simple maximization, as well as use for variance-reducing Monte Carlo techniques.…”
Section: Related Workmentioning
confidence: 99%
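The "Normalized Advantage Functions" approach the excerpt refers to (Gu et al. 2016) makes the greedy action available in closed form by constraining Q to be quadratic in the action. A minimal PyTorch sketch of that idea; the layer sizes and the particular way the positive-semidefinite matrix is parameterized here are illustrative assumptions, not details taken from the cited paper.

import torch
import torch.nn as nn

class NAFHead(nn.Module):
    """Quadratic action-value form in the spirit of Gu et al. 2016:
    Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)),
    so argmax_a Q(s, a) = mu(s) in closed form."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.v = nn.Linear(hidden, 1)
        self.mu = nn.Linear(hidden, action_dim)
        self.l = nn.Linear(hidden, action_dim * action_dim)  # -> L, P = L L^T
        self.action_dim = action_dim

    def forward(self, state, action):
        h = self.trunk(state)
        mu = self.mu(h)
        L = torch.tril(self.l(h).view(-1, self.action_dim, self.action_dim))
        P = L @ L.transpose(-1, -2)              # positive semi-definite
        d = (action - mu).unsqueeze(-1)          # (B, A, 1)
        adv = -0.5 * (d.transpose(-1, -2) @ P @ d).squeeze(-1).squeeze(-1)
        return self.v(h).squeeze(-1) + adv

    def greedy_action(self, state):
        # The quadratic restriction makes the maximizing action analytic.
        return self.mu(self.trunk(state))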
“…However, enumeration is impossible when there are large or infinite possible actions, for example when actions are drawn from a continuous vector space. This is the natural description of, for example, robotic locomotion and control; significant effort has therefore gone into alternative approaches for action-selection in these domains (Gu et al 2016; Asadi et al 2021; Lillicrap et al 2015). A common framework for so-called continuous control problems is to train a separate policy network that selects actions according to some criteria of the Q-values (Fujimoto, van Hoof, and Meger 2018; Haarnoja et al 2018a).…”
Section: Introduction
confidence: 99%
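The "separate policy network" framework this excerpt mentions (e.g., the deterministic actors of Lillicrap et al. 2015 and Fujimoto, van Hoof, and Meger 2018) trains an actor by gradient ascent on the critic's Q-values instead of solving an explicit argmax. A minimal sketch of that actor update; all names and the exact update are illustrative assumptions, not taken from the cited papers.

def actor_update(actor, critic, states, actor_optimizer):
    """One policy-improvement step: push the actor's output actions toward
    higher critic values. The critic can be any Q(s, a) network, e.g. the
    RBF value head sketched above."""
    actions = actor(states)                        # pi(s), continuous actions
    actor_loss = -critic(states, actions).mean()   # ascend Q(s, pi(s))
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()
    return actor_loss.item()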