2019
DOI: 10.1016/j.jpdc.2017.11.016

Parallel cosine nearest neighbor graph construction

Abstract: The nearest neighbor graph is an important structure in many data mining methods for clustering, advertising, recommender systems, and outlier detection. Constructing the graph requires computing up to n² similarities for a set of n objects. This high complexity has led researchers to seek approximate methods, which find many but not all of the nearest neighbors. In contrast, we leverage shared memory parallelism and recent advances in similarity joins to solve the problem exactly. Our method considers all pa…
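
To make the n² cost mentioned in the abstract concrete, the following is a minimal brute-force sketch of exact cosine k-nearest-neighbor graph construction. It is not the paper's method (which the abstract describes as relying on shared memory parallelism and similarity joins); the function name `cosine_knn_graph` and the dense list-of-lists input are assumptions made purely for illustration.

```python
import math


def cosine_knn_graph(vectors, k):
    """Brute-force exact k-NN graph under cosine similarity.

    Hypothetical baseline for illustration only: it compares every
    ordered pair of objects, so it performs on the order of n^2
    similarity computations for n input vectors.
    """
    # Scale each vector to unit length so that a dot product
    # equals the cosine similarity. Zero vectors are left as-is.
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v] if norm > 0 else list(v))

    graph = {}
    n = len(unit)
    for i in range(n):
        sims = []
        for j in range(n):
            if i == j:
                continue  # an object is not its own neighbor
            sim = sum(a * b for a, b in zip(unit[i], unit[j]))
            sims.append((sim, j))
        # Highest similarity first; ties broken by smaller index.
        sims.sort(key=lambda t: (-t[0], t[1]))
        graph[i] = [(j, s) for s, j in sims[:k]]
    return graph


if __name__ == "__main__":
    data = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
    for node, neighbors in cosine_knn_graph(data, k=1).items():
        print(node, neighbors)
```

Because every pair is scored, the running time of this sketch grows quadratically with the number of objects, which is the cost that the approximate methods mentioned in the abstract try to avoid.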

citations
Cited by 3 publications
(2 citation statements)
references
References 32 publications
“…We remark, however, that they typically involve onerous and rather complex preprocessing steps, which may not be suitable for a large-scale data. Approximate NN (ANN) search algorithms (Indyk & Motwani, 1998;Slaney & Casey, 2008;Har-Peled et al, 2012) are yet another practical solution to reduce the query complexity, but ANN-search-based rules such as (Alabduljalil et al, 2013;Anastasiu & Karypis, 2019) hardly have any statistical guarantee (Dasgupta & Kpotufe, 2019) with few exception (Gottlieb et al, 2014;Efremenko et al, 2020). Gottlieb et al (2014) proposed an ANN-based classifier for general doubling spaces with generalization bounds.…”
Section: Related Work (mentioning)
confidence: 99%
“…As the definition of "large-scale" evolves, it would be good to know if the divide-and-conquer scheme described above may be used in conjunction to these methods. There are related methods such as those that rely on efficient kNNs or approximate nearest neighbor (ANN) methods [2,1,26,39]. However, little theoretical understanding in terms of the classification accuracy of these approximate methods has been obtained (with rare exceptions like [21].)…”
Section: Introduction (mentioning)
confidence: 99%