2015
DOI: 10.1007/s10044-015-0485-z

Proximal gradient method for huberized support vector machine

Abstract: The Support Vector Machine (SVM) has been used in a wide variety of classification problems. The original SVM uses the hinge loss function, which is non-differentiable and makes the problem difficult to solve, in particular for regularized SVMs such as those with $\ell_1$-regularization. This paper considers the Huberized SVM (HSVM), which uses a differentiable approximation of the hinge loss function. We first explore the use of the Proximal Gradient (PG) method for solving the binary-class HSVM (B-HSVM) and then genera…
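For orientation, here is a minimal sketch (not the paper's exact algorithm) of the two ingredients the abstract refers to: a huberized (smoothed) hinge loss with a Lipschitz-continuous derivative, and one proximal gradient step for an ℓ1-regularized B-HSVM-type objective. The smoothing parameter delta, the step size, and the plain ℓ1 penalty are illustrative choices.

```python
import numpy as np

def huberized_hinge(t, delta=0.5):
    """One standard smoothed (huberized) hinge loss: piecewise quadratic,
    differentiable everywhere, with a (1/delta)-Lipschitz derivative."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 1.0, 0.0,
           np.where(t > 1.0 - delta, (1.0 - t) ** 2 / (2.0 * delta),
                    1.0 - t - delta / 2.0))

def huberized_hinge_grad(t, delta=0.5):
    """Derivative of the loss above with respect to the margin argument t."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 1.0, 0.0,
           np.where(t > 1.0 - delta, -(1.0 - t) / delta, -1.0))

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def pg_step(w, X, y, lam, step, delta=0.5):
    """One proximal gradient step for an l1-regularized huberized SVM:
    minimize (1/n) * sum_i phi_delta(y_i * x_i^T w) + lam * ||w||_1."""
    margins = y * (X @ w)
    grad = X.T @ (y * huberized_hinge_grad(margins, delta)) / X.shape[0]
    return soft_threshold(w - step * grad, step * lam)
```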

Cited by 27 publications (11 citation statements)
References 50 publications
“…We will show that a range of nontrivial ω_k > 0 always exists to satisfy Condition 2.1 under a mild assumption, and thus one can backtrack ω_k to ensure F(x^k) ≤ F(x^{k-1}), ∀k. Maintaining the monotonicity of F(x^k) can significantly improve the numerical performance of the algorithm, as shown in our numerical results below and also in [44,55]. Note that subsequence convergence does not require this condition.…”
(mentioning)
confidence: 62%
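As a reading aid, the backtracking-on-ω_k idea described in this statement can be sketched as follows; the helper names (F, prox_grad_step) and the shrink factor are hypothetical, and the fallback assumes a sufficiently small proximal-gradient step size so that a plain, non-extrapolated step does not increase F.

```python
def monotone_extrapolated_step(F, prox_grad_step, x_prev, x_curr,
                               omega=1.0, shrink=0.5, max_tries=20):
    """Try an extrapolated proximal-gradient step and backtrack the
    extrapolation weight omega until the objective does not increase,
    i.e. F(x_next) <= F(x_curr). prox_grad_step(y) is assumed to return
    one proximal gradient update taken from the extrapolated point y."""
    for _ in range(max_tries):
        y = [c + omega * (c - p) for c, p in zip(x_curr, x_prev)]  # extrapolate
        x_next = prox_grad_step(y)
        if F(x_next) <= F(x_curr):        # monotonicity check
            return x_next, omega
        omega *= shrink                   # backtrack the extrapolation weight
    # fall back to a non-extrapolated step, which (for a sufficiently small
    # step size) does not increase F
    return prox_grad_step(x_curr), 0.0
```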
“…Hence, the condition in (2.4) can be slightly stronger than that in (2.6). The condition in (2.4) holds if dom(h) is bounded, and it holds naturally if the {f_j} are linear, logistic-loss, or huberized hinge loss functions Xu et al (2016).…”
Section: Assumptions (mentioning)
confidence: 99%
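The exact form of condition (2.4) is not quoted above, but the example losses named in this statement share a globally bounded, Lipschitz-continuous derivative, which is the kind of property such conditions typically exploit. For reference (illustrative only, with the standard definitions):

```latex
\[
  f_{\mathrm{logistic}}(t) = \log\!\bigl(1 + e^{-t}\bigr), \qquad
  f'_{\mathrm{logistic}}(t) = \frac{-1}{1 + e^{t}} \in (-1, 0),
\]
\[
  f_{\mathrm{huber}}(t) =
  \begin{cases}
    0, & t > 1,\\
    \dfrac{(1-t)^2}{2\delta}, & 1-\delta < t \le 1,\\
    1 - t - \dfrac{\delta}{2}, & t \le 1-\delta,
  \end{cases}
  \qquad f'_{\mathrm{huber}}(t) \in [-1, 0].
\]
```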
“…After this, a projection step is performed to project x^{t+1/2} onto X. In particular, two scenarios may happen: (i) x^{t+1/2} ∈ X; in this case, the projection step directly returns x^{t+1/2}; (ii) x^{t+1/2} ∉ X; we need to project x^{t+1/2} to the closest point in X, which is very time-consuming.…”
Section: Fully Projection-free Proximal Stochastic Gradient (mentioning)
confidence: 99%
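The two cases described in this statement can be made concrete with a small sketch; here the feasible set X is taken to be a Euclidean ball purely for illustration (where the projection happens to be cheap), whereas the quoted text is concerned with sets for which case (ii) is expensive.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto {x : ||x||_2 <= radius}; stands in for the
    (possibly expensive) projection onto a general feasible set X."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def projected_half_step(x_t, grad, step, radius=1.0):
    """Illustrative two-phase update: take a gradient step to x^{t+1/2},
    then project back onto X only if the half-step leaves X."""
    x_half = x_t - step * grad                # x^{t+1/2}
    if np.linalg.norm(x_half) <= radius:      # case (i): already feasible
        return x_half
    return project_l2_ball(x_half, radius)    # case (ii): project onto X
```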
“…Proximal stochastic gradient method is widely used to solve large-scale machine learning problems such as support vector machines [1,2] and logistic regression [3]. Generally, it iteratively finds a descent direction, and then updates the model within a feasible set by following the direction until convergence.…”
Section: Introduction (mentioning)
confidence: 99%
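A minimal example of the pattern this statement describes (a stochastic descent direction followed by a proximal update), here for ℓ1-regularized logistic regression rather than the cited SVM formulations; all names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(z, tau):
    """Prox of tau*||.||_1, applied after the stochastic descent step."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_sgd_logreg(X, y, lam=1e-3, step=0.1, epochs=5, batch=32, seed=0):
    """Minimal proximal stochastic gradient loop for l1-regularized logistic
    regression with labels y in {-1, +1}; a sketch, not the cited algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for _ in range(max(1, n // batch)):
            idx = rng.choice(n, size=min(batch, n), replace=False)
            margins = y[idx] * (X[idx] @ w)
            # stochastic gradient of the smooth logistic part
            g = -(X[idx].T @ (y[idx] / (1.0 + np.exp(margins)))) / len(idx)
            # descent step followed by the proximal (soft-threshold) update
            w = soft_threshold(w - step * g, step * lam)
    return w
```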