Proceedings of the 16th International Conference on World Wide Web 2007
DOI: 10.1145/1242572.1242591

Scaling up all pairs similarity search

citations
Cited by 568 publications
(710 citation statements)
references
References 20 publications
“…4) and token sim (Eq. 5) that use different string similarity measures; we also compare to AllPairs [2], PP-Join(+) [20] and Ed-Join [19]; lastly, we compare to Naive [15] that detects owl:sameAs links without candidate selection. Since Ed-Join is not compatible with our Sun machine, we run it on a Linux machine (dual-core 2GHz processor and 4GB memory), and estimate its runtime on the Sun machine by examining runtime difference of bigram on the two machines.…”
Section: Evaluation Results On RDF Datasets (mentioning)
confidence: 99%
“…All-Pairs [2], PP-Join(+) [20] and Ed-Join [19] are all inverted index based approaches. All-Pairs is a simple index based algorithm with certain optimization strategies.…”
Section: Related Work (mentioning)
confidence: 99%
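
To illustrate what "inverted index based" means in the statement above, here is a minimal sketch of an index-driven all-pairs search. It is not the All-Pairs algorithm of the cited paper and omits that paper's optimization strategies; the function name `all_pairs_jaccard`, the choice of Jaccard similarity over token sets, and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict


def all_pairs_jaccard(records, threshold):
    """Return {(i, j): similarity} for all i < j with Jaccard >= threshold.

    Hypothetical helper: a plain inverted-index join without any
    pruning or prefix filtering.
    """
    index = defaultdict(list)          # token -> ids of records indexed so far
    sets = [set(r) for r in records]   # treat each record as a set of tokens
    results = {}
    for j, tokens in enumerate(sets):
        # Count shared tokens with every earlier record that has at least one
        # token in common; records sharing nothing are never touched.
        overlap = defaultdict(int)
        for tok in tokens:
            for i in index[tok]:
                overlap[i] += 1
        # Verify each candidate against the threshold.
        for i, shared in overlap.items():
            union = len(sets[i]) + len(tokens) - shared
            sim = shared / union
            if sim >= threshold:
                results[(i, j)] = sim
        # Index the current record only after probing, so each pair is
        # considered exactly once.
        for tok in tokens:
            index[tok].append(j)
    return results


docs = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y"], ["a", "b", "c", "d"]]
pairs = all_pairs_jaccard(docs, 0.5)
```

The index restricts comparisons to records that share at least one token, which is the basic saving over comparing every pair directly.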
“…FastJoin [22] adopts fuzzy matching techniques that consider both token and character level similarity. Similar algorithms also include AllPairs [2] and IndexChunk [14]. Although our proposed candidate selection algorithm also adopts indexing techniques, a secondary filtering on the looked-up candidates from the index significantly reduces the size of the final candidate set.…”
Section: Related Work (mentioning)
confidence: 99%
“…The indexing scheme we developed is inspired by redundant indexing methods such as LSH [4], RBV [7], OMEDRANK [3] or PvS [5], and by proposals addressing similarity joins, like [1] and [2]. We divide the database of keyframe signatures into segments (or buckets) such that, in each segment, the similarity between any two signatures is above a threshold; the search for similar keyframes is then only performed within each bucket.…”
Section: Keyframe Indexing For Off-line or Online Mining (mentioning)
confidence: 99%