2015
DOI: 10.1007/s10479-015-1956-8

Nearest neighbors methods for support vector machines

Abstract: A key issue in the practical applicability of the support vector machine methodology is the identification of the support vectors in very large data sets, a problem to which a great deal of attention has been given in the literature. In the present article we propose methods based on sampling and nearest neighbors that allow for an efficient implementation of an approximate solution to the classification problem and, at least in some problems, will help in identifying a significant fraction of the support vec…
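The full algorithm is developed in the paper itself; as a rough illustration of the general idea (train an SVM on a small random subsample, then enrich the training set with nearest neighbors of the preliminary support vectors), a minimal sketch in Python with scikit-learn might look as follows. The function name and the parameters `n_subsample` and `k_neighbors` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def nn_subsample_svm(X, y, n_subsample=500, k_neighbors=10, seed=0):
    """Sketch: approximate SVM training via subsampling + nearest neighbors.

    1. Train an SVM on a small random subsample.
    2. Look up, in the full data set, the k nearest neighbors of each
       support vector found in step 1; these are likely support-vector
       candidates.
    3. Retrain the SVM on the enriched candidate set only.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(n_subsample, n), replace=False)

    # Preliminary SVM on the small subsample.
    svm_small = SVC(kernel="rbf").fit(X[idx], y[idx])

    # Neighbors of the preliminary support vectors in the full data set.
    nn = NearestNeighbors(n_neighbors=k_neighbors).fit(X)
    _, nbr_idx = nn.kneighbors(svm_small.support_vectors_)
    candidates = np.unique(np.concatenate([idx, nbr_idx.ravel()]))

    # Final SVM trained only on the candidate set.
    return SVC(kernel="rbf").fit(X[candidates], y[candidates])
```

The point of such a scheme is that the expensive quadratic program is solved only on the candidate set, which is typically much smaller than the full data set.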

Cited by 6 publications (15 citation statements)
References 25 publications
“…A different approach is presented in [14], where the reduction of the data includes the assignment of large weights to important samples and the reduction of the features, using graph and self-paced learning. The methods in [6] and [5] use nearest neighbors with sub-sampling in order to select a subset of significant instances. They start by training an SVM on a very small sub-sample of the data set.…”
Section: Related Work (mentioning, confidence: 99%)
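The weighting idea attributed to [14] can be mimicked, very roughly, by passing per-sample weights to a standard SVM solver; scikit-learn's `sample_weight` argument supports exactly this. The sketch below is only a placeholder: how the importance scores are actually computed in [14] (graph-based and self-paced learning) is not shown.

```python
from sklearn.svm import SVC

def weighted_svm(X, y, importance):
    """Sketch: emphasize 'important' samples by giving them larger weights.

    `importance` is a length-n array of nonnegative weights; deriving it
    (graph-based, self-paced, ...) is outside the scope of this sketch.
    """
    svm = SVC(kernel="rbf")
    svm.fit(X, y, sample_weight=importance)
    return svm
```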
“…And, if λ is chosen smaller than 1, the contribution of λ^n A_×^n is small for large n. Then, in practice it is enough to sum over a finite number N_k of terms. However, even with these considerations, the running time for the computation of the kernel using (6) is O(k^6) with k = |V||V′|, so it can be very high for large graphs. A cheaper way to compute the random walk kernel is to use the following equivalence…”
Section: Basic Facts and Notation (mentioning, confidence: 99%)
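For context, the truncated sum the quoted passage refers to can be written out directly: build the adjacency matrix of the direct-product graph (size |V||V′|) and sum its first few powers weighted by λ^n. The sketch below only makes those matrices explicit; λ and the number of terms are placeholder values, and the "cheaper equivalence" mentioned at the end of the quote (solving a linear system instead of summing powers) is not shown.

```python
import numpy as np

def truncated_random_walk_kernel(A1, A2, lam=0.1, n_terms=10):
    """Sketch: truncated random-walk graph kernel.

    A1 and A2 are adjacency matrices of two graphs. A_x = kron(A1, A2) is the
    adjacency matrix of the direct-product graph, and the kernel is
    approximated by the finite sum over n of lam**n * A_x**n, which is
    justified for lam < 1 because later terms become negligible.
    """
    A_x = np.kron(A1, A2)
    k = A_x.shape[0]
    total = np.zeros((k, k))
    power = np.eye(k)                      # lam**0 * A_x**0
    for _ in range(n_terms + 1):
        total += power
        power = lam * (power @ A_x)        # next term lam**(n+1) * A_x**(n+1)
    return total.sum()                     # sum over all start/end vertex pairs
```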
“…Another contribution of the present work is to propose a new subsampling algorithm that improves on the results of Camelo et al. (2015) [4], at least in a significant number of cases, by enriching the subsample with more support-vector candidates using bagging and importance sampling. This is achieved by looking simultaneously at different samples and searching for neighbors according to the candidates' intensity.…”
Section: Our Contribution (mentioning, confidence: 99%)
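A hedged sketch of what such an enrichment step could look like: train SVMs on several independent bags, accumulate an "intensity" score for points near the support vectors of each bag, and then draw additional candidates in proportion to that intensity. The bag sizes, neighbor counts, and the final sampling rule below are assumptions for illustration, not the algorithm of the cited work.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def bagged_sv_candidates(X, y, n_bags=10, bag_size=300, k_neighbors=5, seed=0):
    """Sketch: enrich the support-vector candidate pool via bagging.

    Each of `n_bags` independent subsamples trains an SVM; every point near
    a support vector of some bag accumulates 'intensity'. Extra candidates
    are then drawn with probability proportional to that intensity (a crude
    form of importance sampling).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    intensity = np.zeros(n)
    nn = NearestNeighbors(n_neighbors=k_neighbors).fit(X)

    for _ in range(n_bags):
        idx = rng.choice(n, size=min(bag_size, n), replace=False)
        svm = SVC(kernel="rbf").fit(X[idx], y[idx])
        sv_global = idx[svm.support_]               # bag-local SV indices -> global
        _, nbrs = nn.kneighbors(X[sv_global])
        np.add.at(intensity, nbrs.ravel(), 1.0)     # vote for neighbors of the SVs

    probs = (intensity + 1e-9) / (intensity + 1e-9).sum()
    n_candidates = min(5 * bag_size, n)
    return rng.choice(n, size=n_candidates, replace=False, p=probs)
```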
“…By testing on benchmark examples and comparing with state-of-the-art methodologies (such as the ones proposed in [4], LibSVM [5], SVMlight [6], and decision trees [7]), we show that our proposed method achieves a fast solution to the SVM training problem without a significant loss in accuracy. It is important to highlight that one goal of this paper is to compare algorithms within the same working framework in order to draw conclusions about efficiency and effectiveness.…”
Section: Our Contribution (mentioning, confidence: 99%)
“…For example, specialized algorithms for solving the quadratic programming problem have been suggested, including sequential minimal optimization (Platt, 1998) and the various decomposition methods used in the LibLinear software library (Hsieh et al., 2008). Other fast computation methods based on low-rank approximation (Williams and Seeger, 2000), gradient descent (Bordes et al., 2005; Shalev-Shwartz et al., 2011; Wang et al., 2012), core sets (Tsang et al., 2005), and nearest neighbors (Camelo et al., 2015) have also been developed. However, it is worth noting that most of these methods still incur a computational cost of at least O(N^2) or lack optimal statistical guarantees.…”
Section: Introduction (mentioning, confidence: 99%)
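Of the alternatives listed in that passage, the low-rank approximation route (Williams and Seeger, 2000) is particularly easy to illustrate with off-the-shelf tools: approximate the kernel feature map with a Nyström transformer built from m ≪ N landmark points and train a linear SVM on the result. The component count and kernel width below are placeholder values, not recommendations from the cited works.

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def nystroem_svm(X, y, n_components=200, gamma=0.1):
    """Sketch: low-rank kernel approximation followed by a linear SVM.

    The Nystroem transformer builds an approximate RBF feature map from
    n_components landmark points, so the subsequent linear SVM never forms
    the full N x N kernel matrix.
    """
    model = make_pipeline(
        Nystroem(kernel="rbf", gamma=gamma, n_components=n_components),
        LinearSVC(),
    )
    return model.fit(X, y)
```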