Abstract. The chapter introduces the latest developments and results of the Iterative Single Data Algorithm (ISDA) for solving large-scale support vector machine (SVM) problems. First, it shows the equivalence of the Kernel AdaTron (KA) method and the Sequential Minimal Optimization (SMO) learning algorithm in designing SVMs with positive definite kernels, for both nonlinear classification and nonlinear regression tasks. KA originates from a gradient ascent learning approach. SMO is here based on an analytic quadratic programming step for a model without the bias term b. The chapter also introduces the classic Gauss-Seidel (GS) procedure and its derivative, the successive over-relaxation (SOR) algorithm, as viable (and usually faster) training algorithms. The convergence theorem for these related iterative algorithms is proven. The second part of the chapter presents the effects of incorporating an explicit bias term b into the ISDA and the methods for doing so. The algorithms shown here implement an iteration routine based on a single training data point at a time (a.k.a. per-pattern learning), which makes the proposed ISDAs remarkably quick. The final solution in the dual domain is not an approximate one. It is the optimal set of dual variables that any existing, proven QP solver would have obtained if it could deal with huge data sets.
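The per-pattern iteration described above can be illustrated with a minimal sketch. It assumes the classification case without the bias term b and a Gaussian (RBF) kernel. Each step maximizes the dual exactly along a single coordinate alpha_i, using the learning rate 1/K(x_i, x_i), and clips the result to [0, C]. This is the projected Gauss-Seidel step, and a relaxation factor omega in (0, 2) turns it into SOR. The function names `rbf_kernel`, `isda_train` and `decision`, and the parameters `gamma`, `omega`, `tol` and `max_epochs`, are hypothetical illustrations rather than the chapter's implementation. Regression and the explicit-bias variants are not covered.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; positive definite, with K(x, x) = 1."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def isda_train(X, y, C=1.0, gamma=1.0, omega=1.0, tol=1e-6, max_epochs=1000):
    """Per-pattern coordinate ascent on the bias-free SVM classification dual.

    Maximizes  sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij
    subject to 0 <= alpha_i <= C (no equality constraint, since there is no b).
    omega = 1.0: projected Gauss-Seidel (KA step with eta_i = 1 / K_ii).
    0 < omega < 2: projected SOR.
    """
    n = len(X)
    # Full kernel matrix: fine for a toy example, infeasible for huge n.
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(max_epochs):
        max_change = 0.0
        for i in range(n):
            # Output f(x_i) = sum_j alpha_j y_j K(x_j, x_i), without bias b.
            f_i = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            grad_i = 1.0 - y[i] * f_i  # partial derivative of the dual w.r.t. alpha_i
            # Exact (Newton) step along coordinate i, scaled by omega, clipped to [0, C].
            new_alpha = min(max(alpha[i] + omega * grad_i / K[i][i], 0.0), C)
            max_change = max(max_change, abs(new_alpha - alpha[i]))
            alpha[i] = new_alpha
        if max_change < tol:  # stop once no alpha moved noticeably in a full sweep
            break
    return alpha


def decision(X, y, alpha, x, gamma=1.0):
    """SVM output f(x) for the bias-free model."""
    return sum(a * yi * rbf_kernel(xi, x, gamma) for a, yi, xi in zip(alpha, y, X))


# Usage: the XOR problem, which no linear classifier separates.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [-1, -1, 1, 1]
alpha = isda_train(X, y, C=10.0)
predictions = [1 if decision(X, y, alpha, x) > 0 else -1 for x in X]
print(alpha, predictions)
```

For clarity, the sketch recomputes f(x_i) from scratch at every step and precomputes the full kernel matrix. The second choice defeats the memory argument for huge data sets. A practical version would instead maintain the outputs incrementally and compute or cache kernel rows on demand.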
Introduction
One of the mainstream research topics in learning from empirical data with support vector machines (SVMs), for both classification and regression problems, is the implementation of incremental learning schemes when the training data set is huge. The challenge of applying SVMs to huge data sets comes from the memory requirements of standard quadratic programming (QP) solvers. The amount of computer memory such a solver needs grows quadratically with the number of training data, because it stores the full kernel matrix. Among the several candidates that avoid standard QP solvers, two learning approaches have recently drawn attention: the Iterative Single Data Algorithms (ISDAs) and sequential minimal optimization (SMO) (Platt 1998, 1999; Vogt 2002; Kecman, Vogt, Huang 2003; Huang and Kecman 2004).
The ISDAs work toward the optimal solution one data point at a time (per-pattern learning). The Kernel AdaTron (KA) is the earliest ISDA for SVMs. It uses kernel functions to map data into the SVM's high-dimensional feature space (Frieß et al. 1998) and performs AdaTron learning (Anlauf and Biehl 1989) in that feature space. Platt's SMO algorithm is an extreme case of the decomposition methods developed in (Osuna, Freund, Girosi 1997; Joachims 1999): it works on a working set of two data points at a time. Because the subproblem for a working set of two can be solved analytically, the SMO algorithm does not invoke a standard QP solver. Owing to this analytical foundation, SMO is particularly popular. At the moment it is the most widely used and most analyzed algorithm, and it is still being actively developed. At the same time, the KA, although providing similar results in solving classific...