Proceedings 17th International Conference on Data Engineering
DOI: 10.1109/icde.2001.914854

A cost model and index architecture for the similarity join

Abstract: The similarity join is an important database

Cited by 39 publications (64 citation statements)
References 36 publications
“…For similarity joins on high-dimensional point datasets, the most representative papers are [18,14,10,4]. In [18] an index structure (ε-kdB tree) and an algorithm for similarity self-join on high-dimensional points was presented.…”
Section: Related Work and Motivation (mentioning)
confidence: 99%
“…In [10] a new algorithm (Generic External Space Sweep, GESS), which introduces a rate of data replication to reduce the number of distance computations as an enhancement of MSJ, was proposed. In [4], a complex and interesting index architecture (Multipage Index, MuX) and join algorithm (MuXjoin), which allows a separate optimization CPU time and I/O time, were presented. On the other hand, the K-CPQ has not been studied in-depth for high-dimensionality data.…”
Section: Related Work and Motivation (mentioning)
confidence: 99%
“…When a pair of directory pages (P_R, P_S) is under consideration, the algorithm forms all pairs of the child pages of P_R and P_S having distances Recently, index based similarity join methods have been analyzed from a theoretical point of view. [7] proposes a cost model based on the concept of the Minkowski sum [5] which can be used for optimizations such as page size optimization. The analysis reveals a serious optimization conflict between CPU and I/O time.…”
Section: Distance Range Based Similarity Join (mentioning)
confidence: 99%
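
For orientation, the distance range similarity join discussed in the statement above can be sketched as a brute-force pairing of two point sets. This is only an illustrative baseline, not the index-based method of the cited paper: the function name `similarity_join`, the parameter `eps`, and the choice of Euclidean distance are assumptions made for the example.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def similarity_join(r, s, eps):
    """Return all pairs (p, q), p in r and q in s, with distance <= eps.

    Hypothetical brute-force baseline: it compares every point of r
    with every point of s, so it performs len(r) * len(s) distance
    computations and uses no index at all.
    """
    return [(p, q) for p, q in product(r, s) if dist(p, q) <= eps]


# Two small 2-D point sets, made up for illustration.
R = [(0.0, 0.0), (1.0, 1.0)]
S = [(0.0, 0.5), (3.0, 3.0)]

pairs = similarity_join(R, S, eps=1.0)
print(pairs)
```

An index-based join of the kind the statement describes would avoid most of these comparisons by pairing only those pages whose regions lie within `eps` of each other.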
“…Our k-NN similarity join algorithm used the Multipage Index [7] which allows a separate optimization of CPU and I/O performance. The competitive technique, the evaluation on top of single similarity queries, was also supported by the same index structure which is traversed using a variation of the nearest neighbor algorithm by Hjaltason and Samet [14] which has been shown to yield an optimal number of page accesses.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%