We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new Zeroth-Order Regularized Optimization method, dubbed ZORO. When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function. We achieve this with an adaptive, randomized gradient estimator, followed by an inexact proximal-gradient scheme. Under a novel approximately sparse gradient assumption and in several convex settings, we show that the convergence rate of ZORO, both theoretical and empirical, depends only logarithmically on the problem dimension. Numerical experiments show that ZORO outperforms existing methods that make similar assumptions, on both synthetic and real datasets.
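The two ingredients named above, a randomized gradient estimator that exploits sparsity and a proximal-gradient update, can be illustrated with a minimal sketch. This is not ZORO's actual estimator: it assumes random-sign (Rademacher) query directions, stands in a single hard-thresholding step for a proper sparse-recovery solver, and assumes an l1 regularizer so that the proximal map is soft thresholding. All function and parameter names (`sparse_gradient_estimate`, `prox_gradient_step`, `s`, `m`, `delta`, `lam`) are hypothetical.

```python
import math
import random


def sparse_gradient_estimate(f, x, s, m, delta=1e-4, rng=None):
    """Estimate an s-sparse gradient of f at x from m + 1 evaluations (s >= 1)."""
    if rng is None:
        rng = random.Random()
    d = len(x)
    fx = f(x)
    g = [0.0] * d
    for _ in range(m):
        # Assumed query direction: independent random signs (Rademacher).
        z = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        # Finite difference: y approximates the inner product <z, grad f(x)>.
        y = (f([xi + delta * zi for xi, zi in zip(x, z)]) - fx) / delta
        # Average y * z; since E[z z^T] = I this approximates the gradient.
        for j in range(d):
            g[j] += y * z[j] / m
    # Hard thresholding: keep only the s largest-magnitude entries.
    keep = set(sorted(range(d), key=lambda j: abs(g[j]))[-s:])
    return [g[j] if j in keep else 0.0 for j in range(d)]


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1, applied entrywise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def prox_gradient_step(f, x, step, lam, s, m, rng=None):
    """One inexact proximal-gradient update for f(x) + lam * ||x||_1."""
    g = sparse_gradient_estimate(f, x, s, m, rng=rng)
    return soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
```

Each step uses m + 1 function evaluations, so in this sketch the per-iteration cost is set by the number of queries m rather than by the dimension of x.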
A promising trend in deep learning replaces traditional feedforward networks with implicit networks. Unlike traditional networks, implicit networks solve a fixed point equation to compute inferences. Solving for the fixed point varies in complexity, depending on provided data and an error tolerance. Importantly, implicit networks may be trained with fixed memory costs in stark contrast to feedforward networks, whose memory requirements scale linearly with depth. However, there is no free lunch: backpropagation through implicit networks often requires solving a costly Jacobian-based equation arising from the implicit function theorem. We propose Jacobian-Free Backpropagation (JFB), a fixed-memory approach that circumvents the need to solve Jacobian-based equations. JFB makes implicit networks faster to train and significantly easier to implement, without sacrificing test accuracy. Our experiments show implicit networks trained with JFB are competitive with feedforward networks and prior implicit networks given the same number of parameters.
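To make the contrast concrete, here is a minimal sketch on a toy scalar implicit layer T(x; w, data) = tanh(w*x + data), with |w| < 1 assumed so that fixed-point iteration converges. The gradient from the implicit function theorem divides by 1 - dT/dx at the fixed point. The Jacobian-free variant, as read here, omits that factor and differentiates only the final application of T. The layer, the squared loss, and all names below are illustrative assumptions, not the authors' implementation.

```python
import math


def layer(x, w, data):
    """One application of the toy layer T(x; w, data) = tanh(w * x + data)."""
    return math.tanh(w * x + data)


def fixed_point(w, data, tol=1e-12, max_iter=1000):
    """Iterate x <- T(x) until successive iterates differ by less than tol."""
    x = 0.0
    for _ in range(max_iter):
        x_next = layer(x, w, data)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def gradients(w, data, target):
    """Return (jfb, implicit) gradients of 0.5 * (x* - target)^2 with respect to w."""
    x_star = fixed_point(w, data)
    # Derivative of tanh at the pre-activation of the fixed point.
    slope = 1.0 - math.tanh(w * x_star + data) ** 2
    dloss_dx = x_star - target
    dT_dw = slope * x_star  # partial derivative of T with respect to w
    dT_dx = slope * w       # partial derivative of T with respect to x
    # Jacobian-free: backpropagate through the last application of T only.
    jfb = dloss_dx * dT_dw
    # Implicit function theorem: also solve the (here scalar) Jacobian equation.
    implicit = dloss_dx * dT_dw / (1.0 - dT_dx)
    return jfb, implicit
```

In this scalar case the omitted factor 1 / (1 - dT/dx) is positive whenever |w| < 1, so the two gradients share a sign and the Jacobian-free one is still a descent direction for this toy problem.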
We study the use of power weighted shortest path metrics (p-wspm's) for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances.
1. We prove that p-wspm's behave as expected for data satisfying the manifold hypothesis. That is, we show that the maximum distance between points in the same cluster is small with high probability, and tends to zero as the number of data points tends to infinity. On the other hand, the maximum distance between points in different clusters remains bounded away from zero.
2. We show how p-wspm's can be thought of as interpolants between the Euclidean metric and the longest leg path distance (defined in §2.3), which we shall abbreviate to LLPD.
3. We introduce a novel modified version of Dijkstra's algorithm that computes the k nearest neighbors, with respect to any p-wspm or the LLPD, of any x_α in X in O(k^2 T_Enn) time, where T_Enn is the cost of a Euclidean nearest-neighbor query. Hence one can construct a p-wspm k-NN graph in O(n k^2 T_Enn) time. As we typically have k ≪ n, i.e. k = O(log(n)) or even k = O(1), this means that constructing a p-wspm k-NN graph requires only marginally more time than constructing a Euclidean k-NN graph (which requires O(n k T_Enn) time).
4. We verify experimentally that using a p-wspm in lieu of the Euclidean metric results in an appreciable increase in clustering accuracy, at the cost of a small increase in run time, for a wide range of real and synthetic data sets.
After establishing notation and surveying the literature in §2, we prove our main results in §3 and §4. In §5 we present our algorithm for computing k nearest neighbors in any p-wspm, while in §6 we report the results of our numerical experiments.
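The idea of a Dijkstra search that stops once k neighbors are found can be sketched as follows. The sketch assumes the usual definition of the metric, d_p(x, y) = (min over paths through data points of the sum of ||x_{i+1} - x_i||^p)^(1/p). It is not the modified algorithm of §5: it relaxes edges to every point by brute force instead of issuing Euclidean nearest-neighbor queries, so it does not attain the O(k^2 T_Enn) cost. The function name `pwspm_knn` and its parameters are hypothetical.

```python
import heapq
import math


def pwspm_knn(points, source, k, p=2.0):
    """Return [(index, distance)] for the k nearest neighbors of points[source]
    under the power weighted shortest path metric with exponent p."""
    n = len(points)
    best = {source: 0.0}  # tentative path cost: sum of edge_length ** p
    settled = set()
    heap = [(0.0, source)]
    neighbors = []
    # Standard Dijkstra, stopped early once k neighbors have been settled.
    while heap and len(neighbors) < k:
        cost, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u != source:
            # Convert the accumulated power-weighted cost back to a distance.
            neighbors.append((u, cost ** (1.0 / p)))
        # Brute-force relaxation over the complete graph on the data points.
        for v in range(n):
            if v in settled:
                continue
            new_cost = cost + math.dist(points[u], points[v]) ** p
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return neighbors
```

For example, with collinear points (0, 0), (1, 0), (2, 0) and p = 2, the path from the first point to the third through the middle one has cost 1 + 1 = 2, which beats the direct edge cost of 4, so the returned distance is sqrt(2) rather than 2. With p = 1 the metric reduces to the Euclidean distance.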