2021 IEEE International Symposium on Information Theory (ISIT) 2021
DOI: 10.1109/isit45174.2021.9518023
Actor-only Deterministic Policy Gradient via Zeroth-order Gradient Oracles in Action Space

Cited by 2 publications (2 citation statements) · References 16 publications
“…[2,7,22]), or such a computation might be prohibitively expensive or noisy (e.g. see [1,27,32]). Thus, several zeroth-order schemes have been developed for the solution of stochastic optimization problems similar to (P), which require only function evaluations of F(·, ξ), in the absence of gradient information.…”

Section: Introduction
Confidence: 99%
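The scheme the citing paper describes, estimating a gradient from function evaluations alone, can be illustrated with a standard two-point zeroth-order estimator averaged over random Gaussian directions. This is a minimal generic sketch, not the paper's specific algorithm; the function name `zo_gradient` and all parameter choices are illustrative assumptions.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=20, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.

    Averages directional finite differences along random Gaussian
    directions, using only function evaluations of f (no gradient
    oracle). Illustrative sketch, not the cited paper's method.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        # Symmetric difference along u approximates the directional
        # derivative; scaling by u recovers a gradient estimate in
        # expectation, since E[u u^T] = I.
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs
```

For a smooth objective the estimate is unbiased up to an O(mu^2) smoothing term, which is why such oracles can stand in for exact gradients when differentiating F(·, ξ) is expensive or impossible.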
“…Known gradient-based approaches such as SGD train and generalize effectively in reasonable time [1]. In contrast, emerging applications such as convex bandits [2–4], black-box learning [5], federated learning [6], reinforcement learning [7,8], learning linear quadratic regulators [9,10], and hyper-parameter tuning [11] stand in need of gradient-free learning algorithms [11–14] due to an unknown loss/model or impossible gradient evaluation.…”

Section: Introduction
Confidence: 99%