Daniel McKenzie scite author profile

Zeroth-Order Regularized Optimization (ZORO): Approximately Sparse Gradients and Adaptive Sampling

We consider the problem of minimizing a high-dimensional objective function, which may include a regularization term, using (possibly noisy) evaluations of the function. Such optimization is also called derivative-free, zeroth-order, or black-box optimization. We propose a new Zeroth-Order Regularized Optimization method, dubbed ZORO. When the underlying gradient is approximately sparse at an iterate, ZORO needs very few objective function evaluations to obtain a new iterate that decreases the objective function. We achieve this with an adaptive, randomized gradient estimator, followed by an inexact proximal-gradient scheme. Under a novel approximately sparse gradient assumption and various convex settings, we show that the (theoretical and empirical) convergence rate of ZORO depends only logarithmically on the problem dimension. Numerical experiments show that ZORO outperforms existing methods that make similar assumptions, on both synthetic and real datasets.
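The abstract names two ingredients: a randomized gradient estimate built from a few function values when the gradient is sparse, and a proximal-gradient step. The NumPy sketch below illustrates that general pattern. It is not ZORO itself, and it rests on several assumptions:

- the regularizer is ℓ1, so the proximal step is soft thresholding;
- sampling directions are Rademacher and measurements are forward differences;
- the sparsity level s is fixed, and the adaptive sampling rule is omitted;
- sparse recovery uses hard thresholding pursuit (HTP), which may differ from the solver used in the paper.

All function names and the toy problem are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def htp(Z, y, s, max_iter=50):
    """Hard thresholding pursuit (generic solver, an assumption): s-sparse g with Z @ g ~= y."""
    d = Z.shape[1]
    g, support = np.zeros(d), None
    for _ in range(max_iter):
        # Gradient step on 0.5 * ||y - Z g||^2, then keep the s largest entries.
        proxy = g + Z.T @ (y - Z @ g)
        new_support = np.sort(np.argsort(np.abs(proxy))[-s:])
        if support is not None and np.array_equal(new_support, support):
            break
        support = new_support
        # Least-squares fit restricted to the chosen support.
        g = np.zeros(d)
        g[support] = np.linalg.lstsq(Z[:, support], y, rcond=None)[0]
    return g

def estimate_sparse_gradient(f, x, s, m, rng, delta=1e-6):
    """Estimate an (assumed) s-sparse gradient of f at x from m + 1 function values."""
    d = x.size
    # Rademacher directions scaled by 1/sqrt(m), so each column of Z has unit norm.
    Z = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)
    fx = f(x)
    # Forward differences: y_i is approximately z_i . grad f(x).
    y = np.array([(f(x + delta * z) - fx) / delta for z in Z])
    return htp(Z, y, s)

def prox_grad_step(f, x, s, m, step, lam, rng):
    """One inexact proximal-gradient step for f(x) + lam * ||x||_1 (hypothetical helper)."""
    g_hat = estimate_sparse_gradient(f, x, s, m, rng)
    return soft_threshold(x - step * g_hat, step * lam)

# Hypothetical toy problem: the smooth part depends on only 3 of 100 coordinates.
d, s, m, lam = 100, 3, 50, 0.1
rng = np.random.default_rng(0)
S = np.array([7, 42, 90])
c = np.zeros(d)
c[S] = [2.0, -2.0, 2.0]
f = lambda x: 0.5 * np.sum((x[S] - c[S]) ** 2)

x = np.zeros(d)
for _ in range(60):
    x = prox_grad_step(f, x, s, m, step=0.5, lam=lam, rng=rng)
# For this f, the minimizer of f(x) + lam * ||x||_1 is soft_threshold(c, lam).
print("max error:", np.max(np.abs(x - soft_threshold(c, lam))))
```

In this toy run each step uses m + 1 = 51 function evaluations instead of the d + 1 = 101 needed for a full forward-difference gradient. With s fixed, the gap widens as d grows, since compressed-sensing recovery needs roughly on the order of s log(d/s) measurements.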

Compressive Sensing for Cut Improvement and Local Clustering

Lai, McKenzie

2020

SIAM Journal on Mathematics of Data Science

JFB: Jacobian-Free Backpropagation for Implicit Networks

Fung, Heaton, et al.

2022

AAAI

A promising trend in deep learning replaces traditional feedforward networks with implicit networks. Unlike traditional networks, implicit networks solve a fixed point equation to compute inferences. The cost of solving for the fixed point varies with the provided data and the error tolerance. Importantly, implicit networks may be trained with fixed memory costs, in stark contrast to feedforward networks, whose memory requirements scale linearly with depth. However, there is no free lunch: backpropagation through implicit networks often requires solving a costly Jacobian-based equation arising from the implicit function theorem. We propose Jacobian-Free Backpropagation (JFB), a fixed-memory approach that circumvents the need to solve Jacobian-based equations. JFB makes implicit networks faster to train and significantly easier to implement, without sacrificing test accuracy. Our experiments show that implicit networks trained with JFB are competitive with feedforward networks and prior implicit networks given the same number of parameters.
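The following NumPy sketch makes the contrast concrete. It assumes a hypothetical toy implicit layer T(z; x) = tanh(Wz + Ux), a squared-error loss, and hand-derived gradients with respect to W. The paper itself works with general implicit networks and automatic differentiation.

- `implicit_grad_W` computes the exact implicit-function-theorem gradient. It requires a linear solve with (I - J)^T, where J is the Jacobian of T at the fixed point.
- `jfb_grad_W` drops that solve, which amounts to backpropagating through a single application of T at the fixed point.

All names, sizes, and scales are illustrative assumptions.

```python
import numpy as np

def layer(z, x, W, U):
    """Hypothetical toy implicit layer T(z; x) = tanh(W z + U x)."""
    return np.tanh(W @ z + U @ x)

def solve_fixed_point(x, W, U, tol=1e-14, max_iter=1000):
    """Forward pass: iterate z <- T(z; x); contracts when ||W||_2 < 1 (tanh is 1-Lipschitz)."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_new = layer(z, x, W, U)
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def loss(z, target):
    return 0.5 * np.sum((z - target) ** 2)

def implicit_grad_W(z, W, target):
    """Exact dloss/dW via the implicit function theorem: needs a solve with (I - J)^T."""
    D = 1.0 - z ** 2              # tanh' at the fixed point, since z = tanh(W z + U x)
    J = D[:, None] * W            # J = dT/dz at the fixed point
    w = np.linalg.solve(np.eye(z.size) - J.T, z - target)
    return np.outer(D * w, z)

def jfb_grad_W(z, target):
    """JFB-style gradient: skip the (I - J)^T solve, i.e. replace (I - J)^{-1} by I."""
    D = 1.0 - z ** 2
    return np.outer(D * (z - target), z)

# Usage on random toy data (hypothetical sizes and scales).
rng = np.random.default_rng(0)
n, k = 5, 3
W = rng.normal(size=(n, n))
W *= 0.1 / np.linalg.norm(W, 2)   # rescale so the spectral norm of W is 0.1
U = 0.2 * rng.normal(size=(n, k))
x = rng.normal(size=k)
target = rng.normal(size=n)

z = solve_fixed_point(x, W, U)
g_exact = implicit_grad_W(z, W, target)
g_jfb = jfb_grad_W(z, target)
cos = np.sum(g_exact * g_jfb) / (np.linalg.norm(g_exact) * np.linalg.norm(g_jfb))
print(f"cosine(exact, JFB) = {cos:.4f}")
```

Here T is strongly contractive (the spectral norm of W is set to 0.1), so J is small and the JFB direction stays close to the exact gradient. With weaker contraction the two directions can differ more.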

Power weighted shortest paths for clustering Euclidean data

McKenzie, Damelin

2019

We study the use of power weighted shortest path metrics (p-wspms) for clustering high-dimensional Euclidean data, under the assumption that the data are drawn from a collection of disjoint low-dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances. Specifically:

1. We prove that p-wspms behave as expected for data satisfying the manifold hypothesis. That is, we show that the maximum distance between points in the same cluster is small with high probability, and tends to zero as the number of data points tends to infinity. On the other hand, the maximum distance between points in different clusters remains bounded away from zero.
2. We show how p-wspms can be thought of as interpolants between the Euclidean metric and the longest leg path distance (defined in §2.3), which we abbreviate to LLPD.
3. We introduce a novel modified version of Dijkstra's algorithm that computes the $k$ nearest neighbors, with respect to any p-wspm or the LLPD, of any $x_\alpha \in X$ in $O(k^2 T_{\mathrm{Enn}})$ time, where $T_{\mathrm{Enn}}$ is the cost of a Euclidean nearest-neighbor query. Hence one can construct a p-wspm $k$-NN graph in $O(nk^2 T_{\mathrm{Enn}})$ time. As we typically have $k \ll n$, e.g. $k = O(\log n)$ or even $k = O(1)$, constructing a p-wspm $k$-NN graph requires only marginally more time than constructing a Euclidean $k$-NN graph (which requires $O(nk T_{\mathrm{Enn}})$).
4. We verify experimentally that using a p-wspm in lieu of the Euclidean metric results in an appreciable increase in clustering accuracy, at the cost of a small increase in run time, for a wide range of real and synthetic data sets.

After establishing notation and surveying the literature in §2, we prove our main results in §3 and §4. In §5 we present our algorithm for computing $k$ nearest neighbors in any p-wspm, while in §6 we report the results of our numerical experiments.
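The Python sketch below is a brute-force computation of p-wspms and the LLPD. It assumes the standard definition: $d_p(x, y) = \left(\min_{x = x_0, \dots, x_k = y} \sum_i \|x_{i+1} - x_i\|^p\right)^{1/p}$ over paths through points of $X$. The LLPD takes the longest leg of the path in place of the sum.

The code runs plain Dijkstra on the complete graph, at O(n² log n) per source. It is not the paper's modified Dijkstra k-NN algorithm. The two-segment data set and all function names are hypothetical.

```python
import heapq
import numpy as np

def path_distances(X, src, leg_cost, combine):
    """Dijkstra from X[src] on the complete graph over the rows of X.

    leg_cost maps Euclidean leg lengths to edge costs; combine(path_cost, leg)
    extends a path by one leg. Brute force: O(n^2 log n) per source.
    """
    n = len(X)
    dist = np.full(n, np.inf)
    dist[src] = 0.0
    done = np.zeros(n, dtype=bool)
    heap = [(0.0, src)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if done[u]:
            continue              # stale heap entry
        done[u] = True
        legs = leg_cost(np.linalg.norm(X - X[u], axis=1))
        for v in np.flatnonzero(~done):
            cand = combine(d_u, legs[v])
            if cand < dist[v]:
                dist[v] = cand
                heapq.heappush(heap, (cand, v))
    return dist

def pwspm(X, src, p):
    """p-wspm from X[src]: p-th root of the minimal sum of leg lengths raised to p."""
    return path_distances(X, src, lambda r: r ** p, lambda a, b: a + b) ** (1.0 / p)

def llpd(X, src):
    """Longest-leg path distance: minimize the longest leg along the path."""
    return path_distances(X, src, lambda r: r, max)

# Hypothetical toy data: two parallel segments 0.5 apart, 11 evenly spaced points each.
t = np.linspace(0.0, 1.0, 11)
X = np.vstack([np.column_stack([t, np.zeros_like(t)]),
               np.column_stack([t, np.full_like(t, 0.5)])])
# X[0] = (0, 0) and X[10] = (1, 0) lie on one segment; X[11] = (0, 0.5) lies on the other.
for p in (1, 2, 8):
    d = pwspm(X, 0, p)
    print(f"p={p}: same segment {d[10]:.3f}, other segment {d[11]:.3f}")
ll = llpd(X, 0)
print(f"LLPD: same segment {ll[10]:.3f}, other segment {ll[11]:.3f}")
```

For p = 1 the distance between the endpoints of one segment is the Euclidean value 1. For p = 2 it drops to √0.1 ≈ 0.316, below the 0.5 gap between the segments. As p grows it approaches the LLPD value 0.1, while the cross-segment distance stays at 0.5.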

A one-bit, comparison-based gradient estimator

Cai, McKenzie, Yin, et al.

2022

Applied and Computational Harmonic Analysis

