2021
DOI: 10.1007/s00500-021-06178-2

Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification

Abstract: The k-nearest neighbor (kNN) rule is one of the best-known distance-based classifiers, and is usually associated with high performance and versatility as it requires only the definition of a dissimilarity measure. Nevertheless, kNN is also coupled with low-efficiency levels since, for each new query, the algorithm must carry out an exhaustive search of the training data, and this drawback is much more relevant when considering complex structural representations, such as graphs, trees or strings, owing to the c…
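To make the efficiency concern in the abstract concrete, here is a minimal sketch of exhaustive kNN classification over strings. It is an illustration only, not the paper's method: the choice of Levenshtein edit distance as the dissimilarity measure, the function names `edit_distance` and `knn_classify`, and the toy training set are all assumptions introduced for this example.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def knn_classify(query, training, k=1):
    """Exhaustive kNN: the query is compared against every training string."""
    ranked = sorted(training, key=lambda item: edit_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (string, label) pairs.
training = [("kitten", "animal"), ("mitten", "clothing"), ("sitting", "action")]
print(knn_classify("mittens", training, k=1))
```

Each query costs one edit-distance computation per training string, and each of those is quadratic in string length, which is why reducing the training set to a few prototypes pays off for structural data.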

Cited by 10 publications (4 citation statements)
References 34 publications

“…Not only was the RHC algorithm found to be much faster than RSP3, but it was also one of the fastest approaches that took part in this experimental study [10]. A modified version of the RHC algorithm has recently been applied on string data spaces [11,12].…”
Section: Related Work (mentioning)
confidence: 95%
“…The considered multiclass PG strategies (the Chen method as well as the different RSP versions) constitute representative examples of the so-called space splitting policy [29], which typically follows a two-step approach: a first stage, space partitioning, divides the feature space of the multiclass set T_mc into different regions using certain heuristics; after that, the prototype merging stage computes new prototypes from each region attending to different criteria, producing the reduced set R_mc. The existing PG strategies under this framework, therefore, essentially differ in the particular splitting and prototype generation heuristics considered.…”
Section: Reference Multiclass PG (mentioning)
confidence: 99%
“…In [21], randomized trees, support vector machines and random forests were used for string similarity evaluation, and higher accuracy was obtained on a large dataset than with classic approaches such as Jaro-Winkler and Damerau-Levenshtein. Moreover, in [27] an unsupervised machine learning approach is used for data reduction in string space.…”
Section: String Similarities In Large Datasets (mentioning)
confidence: 99%