2003
DOI: 10.1007/3-540-44862-4_83

Empirical Evaluation of the Difficulty of Finding a Good Value of k for the Nearest Neighbor

Abstract: As an analysis of the classification accuracy bound for the Nearest Neighbor technique, in this work we study whether it is possible to find a good value of the parameter k for each example according to its attribute values, or at least whether there is a pattern for the parameter k in the original search space. We carried out different approaches based on the Nearest Neighbor technique and calculated the prediction accuracy for a group of databases from the UCI repository. Based on the experimen…

Cited by 9 publications (6 citation statements) | References 9 publications
“…Tang [39] proposes a traffic prediction method for scaling resources in NFV environments based on traffic modeling with an Autoregressive Moving Average (ARMA) model; the predicted traffic values are obtained by minimizing the MSE. Among the solutions based on predicting the resources to be allocated, Farahnakian [40] proposes regression algorithms for estimating memory and processing consumption in cloud datacenters; the proposed solutions are based on Linear Regression [41] and K-Nearest Neighbor Regression (K-NNR) [42], methods that, as is well known, determine the prediction by minimizing symmetric error functions. A VNF migration algorithm is proposed and investigated in [43]; it is based on a deep belief network framework that predicts future resource requirements; the authors show that the proposed solution obtains better estimates of CPU resources, in terms of MSE, than a solution based on a Back Propagation Neural Network [44].…”
Section: Related Work and Research Motivation (mentioning)
confidence: 99%
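A minimal sketch of the K-NNR idea mentioned above, assuming plain Euclidean distance and a toy 1-D feature: the prediction is the mean target of the k nearest training points, and the mean is exactly the constant that minimizes squared error over the neighborhood, which is why K-NNR is described as minimizing a symmetric (MSE-style) loss. The function name and data are illustrative, not from [40].

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict a resource value for x_query as the mean target of its
    k nearest training points. The mean is the constant minimizing the
    squared-error loss over the neighborhood."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return y_train[nearest].mean()

# Toy example: past CPU load (feature) -> next-interval load (target).
X = np.array([[0.2], [0.3], [0.5], [0.7], [0.9]])
y = np.array([0.25, 0.35, 0.55, 0.75, 0.95])
print(knn_regress(X, y, np.array([0.6]), k=3))  # ~0.55
```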
“…From an applied perspective, Ferrer-Troyano et al. [6] present a comparison of k-NN over various UCI datasets, showing that finding a 'best' k can be difficult. They observe that larger values of k produce smaller errors on some datasets, while low values of k perform comparably on others.…”
Section: Selecting k for k-NN (mentioning)
confidence: 99%
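The per-dataset comparison described in [6] can be imitated with a small grid search over k. The sketch below, using scikit-learn and the Iris data as stand-ins for the UCI databases in the study, estimates cross-validated accuracy for a few candidate values of k; the candidate grid and CV settings are our assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Estimate accuracy for several candidate values of k on one dataset
# and check whether any single value clearly dominates.
X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 11, 21):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  mean CV accuracy={acc:.3f}")
```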
“…However, larger values of k tend to produce smoother models and are less sensitive to label noise. Ferrer-Troyano et al. [6] show that for some datasets the prediction error varies greatly depending on the value selected for k. Thus, the choice of k must be made carefully for the task at hand.…”
Section: Introduction (mentioning)
confidence: 99%
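To make the smoothness/noise trade-off concrete, here is an illustrative experiment (our construction, not one from [6]): flip a fraction of the training labels and compare a small and a large k. With larger k, isolated flipped labels are usually outvoted, so test accuracy degrades less.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.15           # flip ~15% of training labels
y_noisy = np.where(flip, 1 - y_tr, y_tr)

for k in (1, 15):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_noisy).score(X_te, y_te)
    print(f"k={k:2d}  test accuracy={acc:.3f}")   # larger k resists label noise
```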
“…In an earlier work [15], we showed that the performance of the kNN classifier can be increased significantly by improving the distance measure and the similarity function. However, a proper choice of k is also crucial for better performance of the kNN classifier [3,16,19]. In this work, we propose a novel test-point-specific k estimation strategy aimed solely at improving the classification accuracy of the kNN classifier.…”
Section: Introduction (mentioning)
confidence: 99%
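The cited paper's estimation strategy is not reproduced here. As a purely hypothetical stand-in, the sketch below grows k per query until the neighborhood vote reaches a confidence threshold, which conveys what a "test point-specific k" can mean in practice; all names and thresholds are our choices.

```python
import numpy as np
from collections import Counter

def predict_adaptive_k(X_tr, y_tr, x, k_min=3, k_max=15, conf=0.8):
    """Toy per-query k selection (our heuristic, NOT the strategy of the
    cited paper): grow k until the majority vote among the k nearest
    training labels reaches the confidence threshold conf."""
    order = np.argsort(np.linalg.norm(X_tr - x, axis=1))  # neighbors by distance
    for k in range(k_min, k_max + 1, 2):                  # odd k avoids ties
        label, votes = Counter(y_tr[order[:k]]).most_common(1)[0]
        if votes / k >= conf:
            return label           # confident enough at this neighborhood size
    return label                   # fall back to the largest k considered
```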
“…The performance of the kNN algorithm depends on several key factors, including i) a suitable distance measure, ii) a similarity measure for voting, and iii) an appropriate choice of the parameter k [14][15][16][17][18]. In an earlier work [15], we showed that the performance of the kNN classifier can be increased significantly by improving the distance measure and the similarity function.…”
Section: Introduction (mentioning)
confidence: 99%
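The three knobs listed in i)-iii) can be seen in one place in the following sketch, which takes a pluggable distance function, weights votes by inverse distance as a simple similarity measure, and fixes k as a parameter. All names are illustrative, not from the cited works.

```python
import numpy as np

def weighted_knn_predict(X_tr, y_tr, x, k=5,
                         dist=lambda a, b: np.linalg.norm(a - b)):
    """i) `dist` is a pluggable distance measure,
    ii) votes are weighted by inverse distance (the similarity measure),
    iii) `k` fixes the neighborhood size."""
    d = np.array([dist(row, x) for row in X_tr])   # distance to every training point
    idx = np.argsort(d)[:k]                        # k nearest neighbors
    votes = {}
    for i in idx:
        votes[y_tr[i]] = votes.get(y_tr[i], 0.0) + 1.0 / (d[i] + 1e-12)
    return max(votes, key=votes.get)               # label with the heaviest vote
```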