2021
DOI: 10.1609/aaai.v35i8.16828

Deep Radial-Basis Value Functions for Continuous Control

Abstract: A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately. Moreover, deep RBVFs can r…
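The abstract only summarizes the construction, but the RBF output layer it describes (and which the citation statements below characterize as a weighted sum over learned, state-dependent centroids, with the maximum approximated by evaluating the function at those centroids) can be sketched as follows. This is an illustrative PyTorch sketch under those assumptions, not the authors' reference implementation; the layer sizes, the softmax normalization, and the temperature beta are assumptions.

import torch
import torch.nn as nn

class RBFValueHead(nn.Module):
    """Sketch of a deep radial-basis value function (RBVF).

    Assumed form (not taken verbatim from the paper): the network maps a
    state to N centroid actions c_i(s) and scalar values v_i(s);
    Q(s, a) is the sum of the v_i(s), weighted by a softmax over the
    negative scaled distances from a to each centroid.
    """
    def __init__(self, state_dim, action_dim, n_centroids=32, beta=1.0):
        super().__init__()
        self.beta = beta
        self.trunk = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        self.centroids = nn.Linear(256, n_centroids * action_dim)
        self.values = nn.Linear(256, n_centroids)
        self.n_centroids, self.action_dim = n_centroids, action_dim

    def forward(self, state, action):
        h = self.trunk(state)
        c = self.centroids(h).view(-1, self.n_centroids, self.action_dim)
        v = self.values(h)                                   # (B, N)
        dist = torch.norm(action.unsqueeze(1) - c, dim=-1)   # (B, N)
        w = torch.softmax(-self.beta * dist, dim=-1)         # RBF weights
        return (w * v).sum(-1)                               # Q(s, a)

    def approx_max(self, state):
        """Approximate max_a Q(s, a) by evaluating Q at each centroid."""
        h = self.trunk(state)
        c = self.centroids(h).view(-1, self.n_centroids, self.action_dim)
        v = self.values(h)
        dist = torch.cdist(c, c)                             # (B, N, N)
        w = torch.softmax(-self.beta * dist, dim=-1)
        q_at_centroids = (w * v.unsqueeze(1)).sum(-1)        # Q(s, c_i)
        return q_at_centroids.max(dim=-1).values

Evaluating Q only at the learned centroids is what makes greedy action selection cheap here even though the action space is continuous, which is the "approximated easily and accurately" claim in the abstract.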

Cited by 7 publications (2 citation statements) | References 22 publications
“…"Normalized Advantage Functions" analytically select maximum-valued actions by restricting their action-value function to quadratic polynomials (Gu et al 2016). Perhaps the most similar method to ours is RBF-DQN (Asadi et al 2021), which computes value as the weighted sum over learned centroids, and describes a method to select approximately maximumaction values. The primary difference in our approach compared with past value-function only methods is the focus on sampling, which allows for a variety of action-selection strategies besides simple maximization, as well as use for variance-reducing Monte Carlo techniques.…”
Section: Related Workmentioning
confidence: 99%
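The "Normalized Advantage Functions" approach the excerpt refers to (Gu et al. 2016) makes the greedy action available in closed form by constraining Q to be quadratic in the action. A minimal PyTorch sketch of that idea; the layer sizes and the particular way the positive-semidefinite matrix is parameterized here are illustrative assumptions, not details taken from the cited paper.

import torch
import torch.nn as nn

class NAFHead(nn.Module):
    """Quadratic action-value form in the spirit of Gu et al. 2016:
    Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)),
    so argmax_a Q(s, a) = mu(s) in closed form."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.v = nn.Linear(hidden, 1)
        self.mu = nn.Linear(hidden, action_dim)
        self.l = nn.Linear(hidden, action_dim * action_dim)  # -> L, P = L L^T
        self.action_dim = action_dim

    def forward(self, state, action):
        h = self.trunk(state)
        mu = self.mu(h)
        L = torch.tril(self.l(h).view(-1, self.action_dim, self.action_dim))
        P = L @ L.transpose(-1, -2)              # positive semi-definite
        d = (action - mu).unsqueeze(-1)          # (B, A, 1)
        adv = -0.5 * (d.transpose(-1, -2) @ P @ d).squeeze(-1).squeeze(-1)
        return self.v(h).squeeze(-1) + adv

    def greedy_action(self, state):
        # The quadratic restriction makes the maximizing action analytic.
        return self.mu(self.trunk(state))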
“…However, enumeration is impossible when there are large or infinite possible actions, for example when actions are drawn from a continuous vector space. This is the natural description of, for example, robotic locomotion and control; significant effort has therefore gone into alternative approaches for action-selection in these domains (Gu et al 2016; Asadi et al 2021; Lillicrap et al 2015). A common framework for so-called continuous control problems is to train a separate policy network that selects actions according to some criteria of the Q-values (Fujimoto, van Hoof, and Meger 2018; Haarnoja et al 2018a).…”
Section: Introduction
confidence: 99%
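The "separate policy network" framework this excerpt mentions (e.g., the deterministic actors of Lillicrap et al. 2015 and Fujimoto, van Hoof, and Meger 2018) trains an actor by gradient ascent on the critic's Q-values instead of solving an explicit argmax. A minimal sketch of that actor update; all names and the exact update are illustrative assumptions, not taken from the cited papers.

def actor_update(actor, critic, states, actor_optimizer):
    """One policy-improvement step: push the actor's output actions toward
    higher critic values. The critic can be any Q(s, a) network, e.g. the
    RBF value head sketched above."""
    actions = actor(states)                        # pi(s), continuous actions
    actor_loss = -critic(states, actions).mean()   # ascend Q(s, pi(s))
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()
    return actor_loss.item()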