We consider minimizing f(x) = E[f(x, ω)] when f(x, ω) is possibly nonsmooth and either strongly convex or convex in x. (I) Strongly convex. When f(x, ω) is µ-strongly convex in x, traditional stochastic approximation (SA) schemes often display poor behavior, arising in part from noisy subgradients and diminishing steplengths. Instead, we propose a variable sample-size accelerated proximal scheme (VS-APM) and apply it to f_η(x), the (η-)Moreau-smoothed variant of E[f(x, ω)]; we term such a scheme (η-VS-APM). In contrast with SA schemes, (η-VS-APM) utilizes constant steplengths and increasingly exact gradients, achieving an optimal oracle complexity in stochastic subgradients of O(1/ε) with an iteration complexity of O(√((ηµ + 1)/(ηµ)) log(1/ε)) in inexact (outer) gradients of f_η(x), computed via an increasing number of inner stochastic subgradient steps. This approach is also beneficial for ill-conditioned L-smooth problems where L/µ is massive, resulting in better-conditioned outer problems and allowing for larger steps and better numerical behavior. This framework is characterized by an optimal oracle complexity of O(√(L/µ + 1/(ηµ)) log(1/ε)) and an overall iteration complexity of O(log^2(1/ε)) in gradient steps. (II) Convex. When f(x, ω) is merely convex but smoothable, by suitable choices of the smoothing, steplength, and batch-size sequences, smoothed (VS-APM) (or sVS-APM) produces sequences for which the expected sub-optimality diminishes at the rate of O(1/k) with an optimal oracle complexity of O(1/ε^2). Our results can be specialized to two important cases: (a) Smooth f. Since smoothing is no longer required, we observe that (VS-APM) admits the optimal rate and oracle complexity, matching prior findings; (b) Deterministic nonsmooth f. In the nonsmooth deterministic regime, (sVS-APM) reduces to a smoothed accelerated proximal method (s-APM) that is asymptotically convergent and admits a non-asymptotic rate of O(1/k), matching that obtained by [23] for producing approximate solutions. Finally, (sVS-APM) and (VS-APM) produce sequences that converge almost surely to a solution of the original problem.
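The core mechanism described above, outer accelerated steps on the Moreau envelope f_η with envelope gradients estimated by an increasingly long inner stochastic subgradient loop, can be illustrated with a short sketch. The following is a minimal illustration of that idea, not the authors' implementation; the toy objective f(x, ω) = |x − ω| + (µ/2)||x||^2 with ω ~ N(0, 1), the parameter values, and the geometric inner budget 2^k are assumptions made for the example.

```python
# Minimal sketch (assumed toy setup) of the (η-VS-APM) idea: accelerated outer
# steps on the Moreau envelope f_η, whose gradient ∇f_η(x) = (x - prox_{ηf}(x))/η
# is approximated by an increasing number of inner stochastic subgradient steps.
import numpy as np

rng = np.random.default_rng(0)
mu, eta = 0.5, 1.0               # strong convexity and smoothing parameters (assumed)

def stoch_subgrad(u, omega):
    """Stochastic subgradient of f(., ω) = |. - ω| + (µ/2)||.||^2 at u."""
    return np.sign(u - omega) + mu * u

def inexact_moreau_grad(x, n_inner):
    """Approximate ∇f_η(x) via n_inner SGD steps on the strongly convex prox
    subproblem  min_u f(u) + ||u - x||^2 / (2η)."""
    u = x.copy()
    for t in range(1, n_inner + 1):
        omega = rng.normal(size=x.shape)
        g = stoch_subgrad(u, omega) + (u - x) / eta
        u -= g / (t * (mu + 1.0 / eta))      # diminishing step for the inner SA loop
    return (x - u) / eta                     # ∇f_η(x) = (x - prox_{ηf}(x)) / η

# Outer loop: accelerated gradient on f_η with a constant steplength (≈ η, since
# f_η is (1/η)-smooth) and a geometrically growing inner budget.
x = y = np.ones(5)
q = (eta * mu) / (eta * mu + 1.0)            # inverse condition number of f_η
beta = (1 - np.sqrt(q)) / (1 + np.sqrt(q))   # Nesterov momentum parameter
for k in range(15):
    grad = inexact_moreau_grad(y, n_inner=2 ** k)   # increasingly exact gradients
    x_new = y - eta * grad                          # constant steplength
    y = x_new + beta * (x_new - x)
    x = x_new
print("approximate solution (true minimizer is 0):", x)
```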
Classical theory for quasi-Newton schemes has focused on smooth deterministic unconstrained optimization, while recent forays into stochastic convex optimization have largely resided in smooth, unconstrained, and strongly convex regimes. Naturally, there is a compelling need to address nonsmoothness, the lack of strong convexity, and the presence of constraints. Accordingly, this paper presents a quasi-Newton framework that can process merely convex and possibly nonsmooth (but smoothable) stochastic convex problems. We propose a framework that combines iterative smoothing and regularization with a variance-reduced scheme reliant on an increasing sample size of gradients. We make the following contributions. (i) We develop a regularized and smoothed variable sample-size BFGS update (rsL-BFGS) that generates a sequence of Hessian approximations and can accommodate nonsmooth convex objectives by utilizing iterative regularization and smoothing. (ii) In strongly convex regimes with state-dependent noise, the proposed variable sample-size stochastic quasi-Newton (VS-SQN) scheme admits a non-asymptotic linear rate of convergence, while the oracle complexity of computing an ε-solution is O(κ^(m+1)/ε), where κ denotes the condition number and m ≥ 1. In nonsmooth (but smoothable) regimes, using Moreau smoothing retains the linear convergence rate, while using more general smoothing leads to a deterioration of the rate to O(k^(-1/3)) for the resulting smoothed VS-SQN (or sVS-SQN) scheme. Notably, the nonsmooth regime allows for accommodating convex constraints. (iii) In merely convex but smooth settings, the regularized VS-SQN scheme (rVS-SQN) displays a rate of O(1/k^(1-ε)) with an oracle complexity of O(1/ε^3). When the smoothness requirements are weakened, the rate for the regularized and smoothed VS-SQN scheme (rsVS-SQN) worsens to O(k^(-1/3)). Such statements allow for a state-dependent noise assumption under a quadratic growth property on the objective. To the best of our knowledge, these rate results are amongst the first available in nonsmooth regimes. Preliminary numerical evidence suggests that the schemes compare well with accelerated gradient counterparts on selected problems in stochastic optimization and machine learning, with significant benefits in ill-conditioned regimes.
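To make the variable sample-size quasi-Newton idea concrete, the sketch below pairs a geometrically growing gradient batch with a standard BFGS inverse-Hessian update in which the curvature pair is formed from gradients computed on the same batch. It is a simplified smooth, unconstrained stand-in for the (VS-SQN) template, not the (rsL-BFGS) update itself; the ridge-regularized least-squares objective, the damping factor, and the batch schedule are assumptions made for illustration.

```python
# Minimal sketch (assumed toy setup) of a variable sample-size stochastic BFGS
# iteration: gradients are estimated on geometrically growing batches, and the
# curvature pair (s, y) uses gradients from the SAME batch so that y ≈ (Hessian) s.
import numpy as np

rng = np.random.default_rng(1)
d, mu = 10, 0.1
x_true = rng.normal(size=d)

def sample_batch(n):
    """Draw n synthetic (a, b) pairs with b = a·x_true + noise."""
    A = rng.normal(size=(n, d))
    b = A @ x_true + 0.1 * rng.normal(size=n)
    return A, b

def batch_grad(x, A, b):
    """Gradient of (1/2n)||Ax - b||^2 + (µ/2)||x||^2 on the batch."""
    return A.T @ (A @ x - b) / len(b) + mu * x

x = np.zeros(d)
H = np.eye(d)                                   # inverse-Hessian approximation
for k in range(15):
    A, b = sample_batch(4 * 2 ** k)             # increasing sample size (variance reduction)
    g = batch_grad(x, A, b)
    x_new = x - 0.5 * H @ g                     # damped quasi-Newton step
    y = batch_grad(x_new, A, b) - g             # gradient difference on the same batch
    s = x_new - x
    sy = s @ y
    if sy > 1e-10:                              # curvature condition (holds by strong convexity)
        rho = 1.0 / sy
        V = np.eye(d) - rho * np.outer(s, y)
        H = V @ H @ V.T + rho * np.outer(s, s)  # standard BFGS inverse update
    x = x_new
print("distance to generating x_true (small ridge/noise bias expected):",
      np.linalg.norm(x - x_true))
```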
We consider a stochastic variational inequality (SVI) problem with a continuous and monotone mapping over a closed and convex set. In strongly monotone regimes, we present a variable sample-size averaging scheme (VS-Ave) that achieves a linear rate with an optimal oracle complexity. In addition, the iteration complexity is shown to display a muted dependence on the condition number compared with standard variance-reduced projection schemes. To contend with merely monotone maps, we develop amongst the first proximal-point algorithms with variable sample sizes (PPAWSS), where increasingly accurate solutions of strongly monotone SVIs are obtained via (VS-Ave) at every step. This allows for achieving a sublinear convergence rate that matches that obtained for deterministic monotone VIs. Preliminary numerical evidence suggests that the schemes compare well with competing methods.
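A compact sketch of the proximal-point idea behind (PPAWSS) follows: each outer step adds a strongly monotone regularization (x − x_k)/λ to the map and solves the resulting subproblem inexactly with an averaged stochastic projection loop whose per-evaluation sample size grows. The affine monotone map, the box constraint set, the steplength, and the inner budget schedule are all assumed for the example; this is an illustrative stand-in, not the authors' (VS-Ave) scheme.

```python
# Minimal sketch (assumed toy setup) of a variable sample-size proximal-point
# scheme for a monotone SVI: F(x) = M x + q observed with additive noise,
# constraint set X = [-1, 1]^d, projection method with iterate averaging inside.
import numpy as np

rng = np.random.default_rng(2)
d = 6
S = rng.normal(size=(d, d))
M = 0.3 * (S - S.T) + 0.1 * np.eye(d)     # monotone map: skew part + small PSD part
q = rng.normal(size=d)

def F_sample(x, n):
    """Average of n noisy evaluations of F(x) = M x + q."""
    noise = rng.normal(size=(n, d)).mean(axis=0)
    return M @ x + q + noise

def project(x):
    return np.clip(x, -1.0, 1.0)          # projection onto the box [-1, 1]^d

lam = 1.0                                  # proximal parameter; regularized map is (1/λ)-strongly monotone
x = np.zeros(d)
for k in range(20):                        # outer proximal-point iterations
    center, z, z_avg = x.copy(), x.copy(), np.zeros(d)
    inner = 20 * (k + 1)                   # increasingly accurate inner solves
    for t in range(1, inner + 1):
        g = F_sample(z, n=t) + (z - center) / lam   # regularized, strongly monotone map
        z = project(z - 0.1 * g)                    # stochastic projection step
        z_avg += (z - z_avg) / t                    # running average of inner iterates
    x = z_avg
print("natural residual ||x - proj(x - F(x))||:",
      np.linalg.norm(x - project(x - (M @ x + q))))
```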