1998
DOI: 10.1145/293347.293348

An optimal algorithm for approximate nearest neighbor searching in fixed dimensions

S. Arya, et al.

Abstract: Consider a set S of n data points in real d-dimensional space, R^d, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess S into a data structure, so that given any query point q ∈ R^d, the closest point of S to q can be reported quickly. Given any positive real ε, a data point p is a (1 + ε)-approximate nearest neighbor of q if its distance from q is within a factor of (1 + ε) of the distance to the true nearest neighbor.
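The (1 + ε)-approximate nearest neighbor definition from the abstract can be checked directly by brute force. This is a minimal sketch (not the paper's algorithm, which avoids the linear scan) using a generic Minkowski metric:

```python
def minkowski(a, b, p=2):
    """Minkowski (L_p) distance between two points given as sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def is_approx_nn(S, q, cand, eps, p=2):
    """True if `cand` is a (1 + eps)-approximate nearest neighbor of q among S,
    i.e. d(cand, q) <= (1 + eps) * d(q, true nearest neighbor)."""
    true_nn_dist = min(minkowski(s, q, p) for s in S)
    return minkowski(cand, q, p) <= (1 + eps) * true_nn_dist

# Tiny example in R^2: the exact nearest neighbor of q is (1, 0).
S = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
q = (0.9, 0.1)
print(is_approx_nn(S, q, (1.0, 0.0), eps=0.1))  # True: it is the exact NN
print(is_approx_nn(S, q, (0.0, 0.0), eps=0.1))  # False: more than 1.1x farther
```

The paper's contribution is reporting such a point in logarithmic query time after preprocessing, rather than by this O(n) scan.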

Cited by 2,094 publications (1,738 citation statements) · References 58 publications
“…The algorithm incrementally selects the pivot p that maximizes the sum of the lower bounds on distances between data objects [16], as shown in Eqs. (3) and (4). The distance lower bound is derived by applying the triangle inequality to the triangle formed by the pivot and the two objects.…”
Section: Pivot Generation and Data Partitioning Algorithms
confidence: 99%
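The selection criterion described in that statement can be sketched as follows. This is an illustrative reconstruction, not the cited paper's code: the triangle inequality gives |d(p, x) − d(p, y)| ≤ d(x, y) for any pivot p, and pivots are picked greedily to maximize the sum of these lower bounds over all object pairs:

```python
import itertools
import math

def dist(a, b):
    """Euclidean distance; stands in for any metric obeying the triangle inequality."""
    return math.dist(a, b)

def pivot_lower_bound(pivots, x, y):
    """Best triangle-inequality lower bound on d(x, y) obtainable from the pivots:
    |d(p, x) - d(p, y)| <= d(x, y) holds for every pivot p."""
    return max((abs(dist(p, x) - dist(p, y)) for p in pivots), default=0.0)

def select_pivots(data, candidates, k):
    """Greedily pick k pivots, each maximizing the summed lower bounds over
    all object pairs (the incremental criterion described in the quote)."""
    pivots = []
    for _ in range(k):
        best = max(
            (c for c in candidates if c not in pivots),
            key=lambda c: sum(
                pivot_lower_bound(pivots + [c], x, y)
                for x, y in itertools.combinations(data, 2)
            ),
        )
        pivots.append(best)
    return pivots
```

Tighter lower bounds let an index prune more candidate objects without computing their true distances, which is the point of pivot-based partitioning.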
“…From the perspective of search accuracy, similarity search methods fall into three main categories: exact, approximate, and heuristic search. Recently, to solve similarity search problems for large-scale data sets, approximate search methods that guarantee some accuracy have been studied intensively; these include methods based on tree-type indexes [4] and on the locality-sensitive hashing (LSH) family [5]–[7]. In contrast, exact methods have long attracted interest in application domains where the data set has relatively low intrinsic dimensionality [3].…”
Section: Introduction
confidence: 99%
“…Each of the extracted feature vectors is then classified by the k-NN algorithm, which produces a hypothesis label for each pixel using fast approximate nearest neighbor search based on kd-trees [34]. The result of the classification stage is a binary map representing tissue types.…”
Section: Fully-automated Classification
confidence: 99%
“…First, nearest neighbor search algorithms that are not computationally exhaustive degrade as a function of the dimension of the data. For example, the popular Approximate Nearest Neighbor algorithm [1] computes a (1 + ε)-approximate nearest neighbor of a point in O(H⌈1 + 6H/ε⌉^H log N) time, where H is the dimension. This approach is far too expensive for high-dimensional data, where H = 28,374 as in one of our test datasets in section V. Second, measurement noise can destabilize the topology of such graphs, making the results sensitive to the parameters of the algorithm used to construct the graph.…”
Section: Graph Layout Algorithms
confidence: 99%