We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and double-logarithmic for median suboptimality. When there is uncertainty in both distance and gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch.
We lower bound the complexity of finding -stationary points (with gradient norm at most ) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least −4 queries to find an stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of −3 queries, establishing the optimality of recently proposed variance reduction techniques.
We prove lower bounds on the complexity of finding -stationary points (points x such that ∇f (x) ≤ ) of smooth, high-dimensional, and potentially non-convex functions f . We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of f at a query point x. We show that for any (potentially randomized) algorithm A, there exists a function f with Lipschitz pth order derivatives such that A requires at least −(p+1)/p queries to find an -stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized pth order regularization are worst-case optimal within their natural function classes.
We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for a MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing on the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.
We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the global minimum to within ε accuracy in O(ε −1 log(1/ε)) steps for large ε and O(log(1/ε)) steps for small ε (compared to a condition number we define), with at most logarithmic dependence on the problem dimension. When we use gradient descent to approximate the Nesterov-Polyak cubic-regularized Newton step, our result implies a rate of convergence to second-order stationary points of general smooth non-convex functions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.