We study a class of algorithms that speed up the training of support vector machines (SVMs) by returning an approximate SVM. We focus on algorithms that reduce the size of the optimization problem by extracting a small number of representatives from the original training dataset and using these representatives to train an approximate SVM. The main contribution of this paper is a PAC-style generalization bound for the resulting approximate SVM, which provides a learning-theoretic justification for using the approximate SVM in practice. The proven bound also generalizes, and includes as a special case, the generalization bound for the exact SVM, which in this paper denotes the SVM obtained from the original training dataset.
Keywords: Support Vector Machines, Approximate Solutions, Generalization Bounds, Algorithmic Stability

1 Introduction

One challenge in using support vector machines (SVMs) [8,28] for problems with large training datasets, which are common in data mining applications, is the prohibitive computational cost of training, which involves solving a convex optimization problem. To address this issue, many efficient training algorithms have been proposed. The first class of algorithms attacks the optimization problem directly. A commonly used strategy is to solve a series of small optimization problems, using ideas such as chunking and decomposition [4,17,22]; one noteworthy example is the sequential minimal optimization (SMO) algorithm [24]. Special algorithms have also been developed for SVMs with particular kernels, such as the linear kernel [18] and the Gaussian kernel [30]. We refer to the SVM given by these algorithms, and by other algorithms that use the original training dataset directly, as the exact SVM.
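To make the contrast between the exact SVM and the representative-based approximate SVM concrete, the following is a minimal sketch in Python using scikit-learn. It uses per-class k-means centroids as one hypothetical representative-extraction scheme; this is only an illustration of the general strategy, not the specific algorithms analyzed in this paper.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    # Toy data standing in for a large training set.
    rng = np.random.RandomState(0)
    X = rng.randn(5000, 10)
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # Exact SVM: trained directly on the full dataset.
    exact_svm = SVC(kernel="rbf").fit(X, y)

    # Approximate SVM: extract a small set of representatives
    # (here, k-means centroids per class -- an illustrative choice,
    # not the extraction rule prescribed by the paper) and train on those.
    reps, rep_labels = [], []
    for label in np.unique(y):
        centers = KMeans(n_clusters=50, n_init=10, random_state=0).fit(
            X[y == label]).cluster_centers_
        reps.append(centers)
        rep_labels.append(np.full(len(centers), label))
    approx_svm = SVC(kernel="rbf").fit(np.vstack(reps), np.concatenate(rep_labels))

    print("exact SVM training accuracy: ", exact_svm.score(X, y))
    print("approx SVM training accuracy:", approx_svm.score(X, y))

The approximate SVM is trained on 100 representatives rather than 5000 points, so the optimization problem is much smaller; the question addressed by the generalization bound in this paper is how much predictive performance such an approximation can be expected to retain.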