2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps.2016.57
PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

Abstract: Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining, and scientific computing applications. Although kd-tree based O(log n) algorithms have been proposed for computing KNN, due to their inherent sequentiality, linear algorithms are used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and h…

Cited by 34 publications (15 citation statements); references 12 publications.
“…These edges may be considered (by taking d𝒜 to always be the distance between aggregates), but doing so is computationally infeasible for large graphs. One potential solution to this problem would be to use efficient k-nearest neighbor methods (e.g., Reference 15) to locate geometrically close unconnected aggregates and use such information to ensure that their balls will not overlap.…”
Section: The Two-level Coarse-to-fine Procedures
Confidence: 99%
“…A typical implementation of nearest traversal uses a priority queue based on distances, using the closest node in each iteration. An alternative and better performing approach, first derived for k-d trees in Patwary et al. [2016], is to use a stack. As a stack is a Last-In-First-Out data structure, it is possible to get behavior similar to that of a priority queue by adding the child with the shorter distance second (so that it sits on top of the stack).…”
Section: Traversal for Nearest
Confidence: 99%
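The stack-based traversal described in that citation statement can be sketched in a few lines. This is an illustrative reconstruction, not the cited authors' code: a minimal k-d tree whose nearest-neighbor search replaces the distance-ordered priority queue with a plain list used as a LIFO stack, pushing the closer child last so it is popped first. All class and function names are assumptions for this sketch.

```python
import math

class Node:
    """One k-d tree node: a point, its splitting axis, and two children."""
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build_kdtree(points, depth=0):
    """Build a k-d tree by recursively splitting on the median point."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))

def nearest(root, query):
    """Stack-based nearest-neighbor search (no priority queue)."""
    best, best_d2 = None, math.inf
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if d2 < best_d2:
            best, best_d2 = node.point, d2
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        # Push the far child only if the splitting plane could hide a
        # closer point; push the near child last so the LIFO order
        # mimics a priority queue's closest-first behavior.
        if diff * diff < best_d2:
            stack.append(far)
        stack.append(near)
    return best
```

The LIFO ordering keeps the descent depth-first toward the query, which tightens `best_d2` early and lets the plane-distance check prune most far subtrees before they are ever visited.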
“…The source code for the recommendation system is available on GitHub. 2 The proposed algorithm relies only on minimum Euclidean distances, which can be computed efficiently using distributed tree structures [16,17] or gpu-based implementations [18]. The proposed system is thus horizontally scalable and suitable for distributed applications.…”
Section: Recommendation Algorithm
Confidence: 99%
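The "minimum Euclidean distance" primitive that citation statement refers to reduces to a 1-nearest-neighbor query: find the catalogue point closest to a query point. A brute-force sketch is shown below for clarity; the function name and data layout are assumptions, and at scale this linear scan is exactly what the cited distributed-tree [16,17] and GPU-based [18] methods replace.

```python
import math

def min_euclidean(query, catalogue):
    """Return (index, distance) of the catalogue point nearest to query.

    Brute-force O(n) scan; serves as a reference implementation for the
    tree- or GPU-accelerated searches a production system would use.
    """
    best_i, best_d = -1, math.inf
    for i, point in enumerate(catalogue):
        d = math.dist(query, point)  # Euclidean distance (Python 3.8+)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

Because each query is independent, the scan parallelizes trivially by partitioning the catalogue across workers and taking the minimum of the per-partition results, which is the sense in which the cited system is "horizontally scalable."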
“…Users explore different paths in the space by "liking" or "skipping" tracks. As both the mapping and the search processes are amenable to distributed tree [16,17] and gpu-based [18] parallel searches, the proposed system could be scaled up to accommodate increasing music collections and user bases.…”
Section: Introduction
Confidence: 99%