In the last twenty-five years, algorithmic advances in integer optimization, combined with hardware improvements, have resulted in an astonishing 200-billion-factor speedup in solving Mixed Integer Optimization (MIO) problems. We present an MIO approach for solving the classical best subset selection problem of choosing k out of p features in linear regression, given n observations. We develop a discrete extension of modern first-order continuous optimization methods to find high-quality feasible solutions, which we use as warm starts for an MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression, and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with n in the 1000s and p in the 100s in minutes to provable optimality, and finds near-optimal solutions for n in the 100s and p in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than the Lasso and other popular sparse learning procedures in terms of achieving sparse solutions with good predictive power.
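To make the warm-start phase concrete: the discrete first-order method referenced above amounts to projected gradient descent onto the set of k-sparse vectors (iterative hard thresholding). Below is a minimal sketch, assuming a least-squares loss and a constant step size 1/L with L an upper bound on the largest eigenvalue of X^T X; the function names are ours, and this illustrates the idea rather than reproducing the paper's implementation.

```python
import numpy as np

def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta; zero out the rest."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-k:]
    out[keep] = beta[keep]
    return out

def discrete_first_order(X, y, k, n_iter=500, tol=1e-8):
    """Heuristic for min 0.5*||y - X @ beta||^2  s.t.  ||beta||_0 <= k,
    via projected gradient steps onto the set of k-sparse vectors."""
    L = np.linalg.norm(X, 2) ** 2          # largest eigenvalue of X^T X
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)        # gradient of the least-squares loss
        beta_next = hard_threshold(beta - grad / L, k)
        if np.linalg.norm(beta_next - beta) < tol:
            break
        beta = beta_next
    return beta
```

A k-sparse solution produced this way can be passed to the MIO solver as a warm start, giving the solver a good initial upper bound to prune against.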
We address the problem of sparse selection in linear models. A number of nonconvex penalties have been proposed in the literature for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this article we pursue a coordinate-descent approach for optimization, and study its convergence properties. We characterize the properties of penalties suitable for this approach, study their corresponding threshold functions, and describe a df-standardizing reparametrization that assists our pathwise algorithm. The MC+ penalty is ideally suited to this task, and we use it to demonstrate the performance of our algorithm. Certain technical derivations and experiments related to this article are included in the Supplementary Materials section.
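For intuition on the threshold functions mentioned above: in the orthogonal one-coordinate sub-problem, the MC+ penalty induces a closed-form threshold operator that interpolates between soft thresholding (γ → ∞, the lasso limit) and hard thresholding (γ → 1, the best-subset limit). A sketch, with names and vectorization our own:

```python
import numpy as np

def mcplus_threshold(b, lam, gamma):
    """Solve min_beta 0.5*(beta - b)**2 + MC+(beta; lam, gamma) for gamma > 1.
    Returns 0 for |b| <= lam, b itself for |b| > gamma*lam, and a
    rescaled soft-thresholded value in between."""
    a = np.abs(b)
    shrunk = np.sign(b) * (a - lam) / (1.0 - 1.0 / gamma)
    return np.where(a <= lam, 0.0, np.where(a <= gamma * lam, shrunk, b))
```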
The graphical lasso [5] is an algorithm for learning the structure of an undirected Gaussian graphical model, using ℓ1 regularization to control the number of zeros in the precision matrix Θ = Σ⁻¹ [2, 11]. The R package GLASSO [5] is popular, fast, and allows one to efficiently build a path of models for different values of the tuning parameter. Convergence of GLASSO can be tricky: the converged precision matrix might not be the inverse of the estimated covariance, and the algorithm occasionally fails to converge with warm starts. In this paper we explain this behavior and propose new algorithms that appear to outperform GLASSO. By studying the “normal equations” we see that GLASSO solves the dual of the graphical lasso penalized likelihood by block coordinate ascent, a result that can also be found in [2]. In this dual, the target of estimation is Σ, the covariance matrix, rather than the precision matrix Θ. We propose similar primal algorithms, P-GLASSO and DP-GLASSO, that also operate by block coordinate descent, but where Θ is the optimization target. We study all of these algorithms, and in particular different approaches to solving their coordinate sub-problems. We conclude that DP-GLASSO is superior from several points of view.
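For readers who want to experiment with the underlying estimation problem, scikit-learn provides a graphical lasso solver; the snippet below fits a sparse precision matrix on synthetic data. This is scikit-learn's solver rather than GLASSO, P-GLASSO, or DP-GLASSO, but it targets the same ℓ1-penalized Gaussian likelihood:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))   # n = 200 observations of p = 10 variables

# Larger alpha (the l1 penalty weight) yields more zeros in the precision matrix.
model = GraphicalLasso(alpha=0.1).fit(X)
Theta = model.precision_             # sparse estimate of the inverse covariance
Sigma = model.covariance_            # the corresponding covariance estimate
print(int((np.abs(Theta) < 1e-8).sum()), "entries of Theta estimated as zero")
```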
In several scientific and industrial applications, it is desirable to build compact, interpretable learning models where the output depends on a small number of input features. Recent work has shown that such best-subset selection problems can be solved with modern mixed integer optimization solvers. Despite their promise, such solvers often come at a steep computational price compared with open-source, efficient specialized solvers based on convex optimization and greedy heuristics. In “Fast Best-Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms,” Hussein Hazimeh and Rahul Mazumder push the frontiers of computation for best-subset-type problems. Their algorithms deliver near-optimal solutions for problems with up to a million features, in times comparable to those of fast convex solvers. Their work suggests that principled optimization methods play a key role in devising tools central to interpretable machine learning, and can help in gaining a deeper understanding of the statistical properties of these methods.
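As a rough illustration of the coordinate-descent core of such methods (omitting the local combinatorial moves, such as swapping features in and out of the support, that give the paper's algorithms their edge), cyclic coordinate descent for the ℓ0-penalized least-squares problem reduces to per-coordinate hard thresholding when the columns of X have unit ℓ2 norm. This sketch is ours, not the authors' implementation:

```python
import numpy as np

def l0_coordinate_descent(X, y, lam0, n_sweeps=100):
    """Cyclic coordinate descent for 0.5*||y - X @ beta||^2 + lam0*||beta||_0,
    assuming each column of X is normalized to unit l2 norm."""
    beta = np.zeros(X.shape[1])
    r = y.astype(float).copy()            # running residual y - X @ beta
    thresh = np.sqrt(2.0 * lam0)          # keep coordinate j iff |b| > sqrt(2*lam0)
    for _ in range(n_sweeps):
        changed = False
        for j in range(X.shape[1]):
            b = X[:, j] @ r + beta[j]     # unpenalized minimizer for coordinate j
            new = b if abs(b) > thresh else 0.0
            if abs(new - beta[j]) > 1e-12:
                r += X[:, j] * (beta[j] - new)   # keep the residual in sync
                beta[j] = new
                changed = True
        if not changed:
            break
    return beta
```

The full algorithms, including the local combinatorial search, are available in the authors' open-source L0Learn package.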
Motivated principally by the low-rank matrix completion problem, we present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. This is accomplished by a new approach to generating "in-face" directions at each iteration, as well as by new choice rules for selecting between in-face and "regular" Frank-Wolfe steps. Our framework for generating in-face directions generalizes the notion of away steps introduced by Wolfe. In particular, the in-face directions always keep the next iterate within the minimal face containing the current iterate. We present computational guarantees for the new method that trade off efficiency in computing near-optimal solutions against upper bounds on the dimension of the minimal faces of the iterates. We apply the new method to the matrix completion problem, where low-dimensional faces correspond to low-rank matrices. We present computational results that demonstrate the effectiveness of our methodological approach at producing near-optimal solutions of very low rank. On both artificial and real datasets, we demonstrate significant speedups in computing very low-rank near-optimal solutions compared to either the Frank-Wolfe method or its traditional away-step variant.
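As a point of reference, the vanilla Frank-Wolfe baseline for matrix completion over a nuclear-norm ball is simple to state: the linear minimization oracle at each step returns a rank-one matrix built from the top singular pair of the gradient. The sketch below (with delta denoting the nuclear-norm bound) shows this baseline, not the paper's in-face variant:

```python
import numpy as np
from scipy.sparse.linalg import svds

def frank_wolfe_completion(M_obs, mask, delta, n_iter=200):
    """Vanilla Frank-Wolfe for min 0.5*||P_Omega(Z - M)||_F^2 s.t. ||Z||_* <= delta.
    mask is a boolean array marking the observed entries of M_obs."""
    Z = np.zeros_like(M_obs, dtype=float)
    for t in range(n_iter):
        G = np.where(mask, Z - M_obs, 0.0)      # gradient, supported on Omega
        u, s, vt = svds(G, k=1)                 # top singular pair of the gradient
        S = -delta * np.outer(u[:, 0], vt[0])   # solution of the linear oracle
        step = 2.0 / (t + 2.0)                  # standard Frank-Wolfe step size
        Z = (1.0 - step) * Z + step * S
    return Z
```

Since each oracle step contributes at most one unit of rank, the iterate after t steps has rank at most t; the in-face directions described above are designed to keep the rank of the iterates far lower in practice.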