2012
DOI: 10.1007/978-3-642-25704-9_6

Efficient Model Selection for Large-Scale Nearest-Neighbor Data Mining

Abstract: One of the most widely used models for large-scale data mining is the k-nearest neighbor (k-nn) algorithm. It can be used for classification, regression, density estimation, and information retrieval. To use k-nn, a practitioner must first choose k, usually selecting the k with the minimal loss estimated by cross-validation. In this work, we begin with an existing but little-studied method that greatly accelerates the cross-validation process for selecting k from a range of user-provided possibilities…
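The baseline procedure the abstract starts from (choosing k by the loss estimated with cross-validation over a user-provided range) can be sketched as follows. This is a minimal illustration of that standard approach, not the paper's accelerated method; the dataset, candidate range, and fold count are illustrative assumptions using scikit-learn.

# Minimal sketch: select k for k-NN by cross-validated loss (illustrative, not the paper's method).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # assumed example dataset

# Candidate values of k supplied by the practitioner (assumed range).
param_grid = {"n_neighbors": list(range(1, 32, 2))}

# 5-fold cross-validation: each candidate k is scored on held-out folds,
# and the k with the best mean accuracy (minimal loss) is selected.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print("selected k:", search.best_params_["n_neighbors"])
print("cross-validated accuracy:", search.best_score_)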

Cited by 5 publications (2 citation statements)
References 28 publications
“…We now mention some important results on kNN with variable k. In [21], Hand and Vinciotti showed how k can be chosen only for a two-class classifier where the classes are unbalanced. In [22], Hamerly and Speegle demonstrated an efficient method for k selection based on cross-validation strategies. Song et al. in [23] proposed two simple extensions of the kNN method using the concept of informativeness, but these require prior cross-validation of the datasets for choosing the optimal value of k. Moreover, the authors in [23] do not explicitly take into account the variations in local densities of training points surrounding a test point.…”
Section: Related Work
confidence: 99%
“…Assessment of a model's prediction error, tuning of a model's parameters, and model selection, e.g., [237] and [244]. In particular, it is often used to determine the number of neighbors in k-Nearest Neighbors models, e.g., [121] and [199]. In the next section we illustrate the use of Hold-out validation to determine a suitable matrix rank when solving least squares problems.…”
Section: The Training Set, the Probe Set, and Cross-validation
confidence: 99%
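The hold-out (probe-set) procedure described in the passage above, when applied to choosing the number of neighbors, uses a single train/validation split rather than full cross-validation. Below is a minimal sketch under assumed choices (scikit-learn, an illustrative dataset, and an arbitrary candidate range for k); it is not taken from the cited works.

# Minimal hold-out validation sketch: pick k on a probe (validation) set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # assumed example dataset
X_train, X_probe, y_train, y_probe = train_test_split(
    X, y, test_size=0.3, random_state=0
)

best_k, best_acc = None, -1.0
for k in range(1, 26):  # assumed candidate range for k
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_probe, y_probe)  # accuracy on the probe set
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"selected k = {best_k} with probe accuracy {best_acc:.3f}")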