2021
DOI: 10.48550/arxiv.2107.12416
Preprint

Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

Abstract: Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inh…
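For context, here is a minimal Python sketch (all names illustrative, not from the paper) of the standard two-point zeroth-order gradient estimator that the abstract contrasts against: the random perturbation has the same dimension as the full decision variable and every sample evaluates the global cost, which is the source of the estimation variance noted above.

```python
import numpy as np

def two_point_zo_gradient(f, x, mu=1e-2, num_samples=20, rng=None):
    """Standard two-point zeroth-order gradient estimator (sketch).

    Note: the perturbation u has the same dimension as the full
    decision variable x, and each sample requires two evaluations
    of the *global* cost f -- the scaling issue the paper addresses
    with its block-coordinate, network-structured estimator.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    g = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)  # full-dimensional random direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_samples

# Example: estimate the gradient of a quadratic cost at a test point.
Q = np.diag([1.0, 4.0, 9.0])
cost = lambda z: 0.5 * z @ Q @ z
x0 = np.ones(3)
print(two_point_zo_gradient(cost, x0))  # approx. Q @ x0 = [1, 4, 9]
```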

Cited by 1 publication (1 citation statement). References: 38 publications.
“…where the stochastic component F(x; ξ), indexed by the random variable ξ, is possibly nonconvex and nonsmooth. We focus on tackling the problem with a Lipschitz continuous objective, which arises in many popular applications, including simulation optimization [17,34], deep neural networks [4,15,33,48], statistical learning [11,31,49,50,52], reinforcement learning [5,21,30,41], financial risk minimization [40], and supply chain management [10]. The Clarke subdifferential [6] for Lipschitz continuous functions is a natural extension of the gradient for smooth functions and of the subdifferential for convex functions.…”
Section: Introduction (mentioning)
Confidence: 99%
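The excerpt begins mid-sentence; a plausible reconstruction of the problem it refers to, together with the standard definition of the Clarke subdifferential it mentions (the citing paper's exact formulation may differ), is:

```latex
% Reconstructed stochastic problem (the citing paper's exact form may differ):
\[
  \min_{x \in \mathbb{R}^d} \; f(x) := \mathbb{E}_{\xi}\big[ F(x;\xi) \big],
\]
% where F(.; \xi) is Lipschitz continuous, possibly nonconvex and nonsmooth.
% Standard Clarke subdifferential of a locally Lipschitz f: by Rademacher's
% theorem f is differentiable almost everywhere, and
\[
  \partial f(x) = \operatorname{conv}\Big\{ \lim_{k \to \infty} \nabla f(x_k)
    \;:\; x_k \to x,\ \nabla f(x_k) \text{ exists} \Big\}.
\]
```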