Proceedings of the 3rd Workshop on Social Network Mining and Analysis 2009
DOI: 10.1145/1731011.1731012

Incremental all pairs similarity search for varying similarity thresholds

Abstract: All Pairs Similarity Search (APSS) is a ubiquitous problem in many data mining applications and involves finding all pairs of records with similarity scores above a specified threshold. In this paper, we introduce the problem of Incremental All Pairs Similarity Search (IAPSS), where APSS is performed multiple times over the same dataset by varying the similarity threshold. To the best of our knowledge, this is the first work that addresses the IAPSS problem. All existing solutions for APSS perform redunda…
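To make the problem statement concrete, here is a minimal brute-force sketch of APSS in Python. It is not the method proposed in the paper: the cosine measure, the dict-based sparse vectors, the `>=` comparison, and the names `cosine`, `all_pairs`, and `records` are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def all_pairs(records, threshold):
    """Brute-force APSS: every (i, j, score) with i < j and score >= threshold."""
    result = []
    for i, j in combinations(range(len(records)), 2):
        score = cosine(records[i], records[j])
        if score >= threshold:
            result.append((i, j, score))
    return result


# Hypothetical toy dataset; each record is a sparse feature vector.
records = [
    {"a": 1.0, "b": 1.0},
    {"a": 1.0},
    {"c": 1.0},
    {"a": 1.0, "b": 1.0},
]

# The incremental setting: the same dataset queried at several thresholds.
# In this naive sketch every call recomputes all pair scores from scratch.
for t in (0.9, 0.7, 0.5):
    print(t, [(i, j) for i, j, _ in all_pairs(records, t)])
```

The loop at the end mimics the IAPSS setting described in the abstract, in which one dataset is searched repeatedly while only the threshold changes.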

Cited by 5 publications (5 citation statements)
References: 13 publications
“…The above studies focus on finding binary or non-binary pairs with some specific similarity measures above some given thresholds. Recently, Awekar et al [4] studied the problem of searching candidate pairs incrementally for varying similarity thresholds. Xiao et al [35] studied the top-K set similarity joins problem for near duplicate detection, which enumerated all the "necessary" similarity thresholds in the decreasing order until the top-K set had been found.…”
Section: Mining Interesting Patterns
Citation type: mentioning
confidence: 99%
“…In other words, in the initial stage, we push P[1,2] and P[2,3] (P[i, j] is the pair of item[i] and item[j], given i ≤ j) into the top-2 list, and compute their cosine values. Then, in the updating stage, we traverse along the diagonals (denoted by the dash-dotted line) in the sorted item-matrix to check in sequence whether P[3,4], P[4,5], P[5,6], P[4,6], P[3,5], …, P[1,6] can enter the top-2 list, as shown in Fig. 1.…”
Section: 222
Citation type: mentioning
confidence: 99%