SVM [12,201] is one of the most popular nonparametric classification algorithms. It has a strong theoretical foundation in computational learning theory [200,202]. The goal of SVM is to keep the VC dimension low by finding the optimal separating hyperplane between classes, namely the one with the maximal margin, where the margin is defined as the distance from the closest point in each class to the separating hyperplane. SVM combines a general-purpose linear learning algorithm with a problem-specific kernel that computes the inner product of input data points in a feature space. The key idea is to map the training set from the input space into a higher-dimensional feature space by means of a set of nonlinear kernel functions, chosen so that the projections of the training examples become linearly separable in the feature space. The hippocampus, a brain region critical for learning and memory processes, has been reported to perform a pattern separation function similar to that of SVM [6].

SVM can be viewed as a three-layer feedforward network. It implements the structural risk minimization (SRM) principle, which minimizes an upper bound on the generalization error. This induction principle is based on the fact that the generalization error is bounded by the sum of the training error and a confidence-interval term that depends on the VC dimension. The generalization error of an SVM is related not to the input dimensionality but to the margin with which it separates the data. Thus, instead of minimizing the training error, SVM minimizes an upper bound on the generalization error by maximizing the margin between the separating hyperplane and the training data.

SVM is a universal approximator for various kernels [70]. It is popular for classification, regression, and clustering. One of the main features of SVM is the absence of local minima: training amounts to solving a convex quadratic program, so any local optimum is also the global one. The SVM solution is defined in terms of a subset of the learning data, called the support vectors. It is thus a sparse representation of the training data and allows the extraction of a condensed dataset based on the support vectors.
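To make the margin maximization concrete, consider the standard hard-margin formulation (a textbook sketch, not specific to any of the cited works): for training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1,+1\}$, SVM solves

\[
\min_{\mathbf{w},\,b}\ \frac{1}{2}\|\mathbf{w}\|^2
\quad \text{subject to} \quad
y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1,\dots,n,
\]

whose optimum yields a geometric margin of $2/\|\mathbf{w}\|$; minimizing $\|\mathbf{w}\|$ therefore maximizes the margin.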
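The confidence-interval term mentioned above can be written explicitly. In one standard form of Vapnik's bound (stated here for the 0-1 loss), with probability at least $1-\eta$ the generalization error $R$ satisfies

\[
R \le R_{\mathrm{emp}} + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}},
\]

where $R_{\mathrm{emp}}$ is the training error, $h$ is the VC dimension, and $n$ is the number of training examples; the square-root term is the confidence interval that SRM seeks to control.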
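The sparseness of the solution is easy to observe in practice. The following minimal sketch (assuming scikit-learn is available; SVC and its attributes are that library's API, and the dataset and parameters are illustrative only) trains an RBF-kernel SVM and shows that only a condensed subset of the training data, the support vectors, defines the classifier:

    # Minimal sketch (assumes scikit-learn); illustrative parameters only.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Toy two-class problem.
    X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                               n_informative=2, random_state=0)

    # The RBF kernel computes inner products implicitly in a
    # higher-dimensional feature space, as described above.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X, y)

    # Only the support vectors (with their dual coefficients) are needed to
    # evaluate the decision function; the remaining training points can be
    # discarded.
    print("training points:", X.shape[0])
    print("support vectors:", clf.support_vectors_.shape[0])
    print("per class:", clf.n_support_)

On typical runs only a fraction of the training points are retained as support vectors; this retained subset is the condensed dataset referred to above.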