2012
DOI: 10.1007/978-3-642-25704-9_6

Efficient Model Selection for Large-Scale Nearest-Neighbor Data Mining

Abstract: One of the most widely used models for large-scale data mining is the k-nearest neighbor (k-nn) algorithm. It can be used for classification, regression, density estimation, and information retrieval. To use k-nn, a practitioner must first choose k, usually selecting the k with the minimal loss estimated by cross-validation. In this work, we begin with an existing but little-studied method that greatly accelerates the cross-validation process for selecting k from a range of user-provided possibilities…
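The baseline procedure the abstract starts from (choosing k by the loss estimated with cross-validation over a user-provided range) can be sketched as follows. This is a minimal illustration of that standard approach, not the paper's accelerated method; the dataset, candidate range, and fold count are illustrative assumptions using scikit-learn.

# Minimal sketch: select k for k-NN by cross-validated loss (illustrative, not the paper's method).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # assumed example dataset

# Candidate values of k supplied by the practitioner (assumed range).
param_grid = {"n_neighbors": list(range(1, 32, 2))}

# 5-fold cross-validation: each candidate k is scored on held-out folds,
# and the k with the best mean accuracy (minimal loss) is selected.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print("selected k:", search.best_params_["n_neighbors"])
print("cross-validated accuracy:", search.best_score_)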

Cited by 5 publications (2 citation statements)
References 28 publications
“…We now mention some important results on kNN with variable k. In [21], Hand and Vinciotti showed how k can be chosen only for a two-class classifier where the classes are unbalanced. In [22], Hamerly and Speegle demonstrated an efficient method for k selection based on cross-validation strategies. Song et al. in [23] proposed two simple extensions of the kNN method using the concept of informativeness, but these require prior cross-validation of the datasets for choosing the optimal value of k. Moreover, the authors in [23] do not explicitly take into account the variations in local densities of training points surrounding a test point.…”
Section: Related Work
confidence: 99%
“…Assessment of a model's prediction error, tuning of a model's parameters, and model selection, e.g., [237] and [244]. In particular, it is often used to determine the number of neighbors in k-Nearest Neighbors models, e.g., [121] and [199]. In the next section we illustrate the use of Hold-out validation to determine a suitable matrix rank when solving least squares problems.…”
Section: The Training Set, the Probe Set, and Cross-validation
confidence: 99%
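The hold-out (probe-set) procedure described in the passage above, when applied to choosing the number of neighbors, uses a single train/validation split rather than full cross-validation. Below is a minimal sketch under assumed choices (scikit-learn, an illustrative dataset, and an arbitrary candidate range for k); it is not taken from the cited works.

# Minimal hold-out validation sketch: pick k on a probe (validation) set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # assumed example dataset
X_train, X_probe, y_train, y_probe = train_test_split(
    X, y, test_size=0.3, random_state=0
)

best_k, best_acc = None, -1.0
for k in range(1, 26):  # assumed candidate range for k
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_probe, y_probe)  # accuracy on the probe set
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"selected k = {best_k} with probe accuracy {best_acc:.3f}")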