Proceedings of the 24th International Conference on World Wide Web 2015
DOI: 10.1145/2736277.2741285

Asymmetric Minwise Hashing for Indexing Binary Inner Products and Set Containment

Abstract: Minwise hashing (Minhash) is a widely popular indexing scheme in practice. Minhash is designed for estimating set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., the inner product between binary vectors) or set containment. Minhash has an inherent bias towards smaller sets, which adversely affects its performance in applications where such a penalization is not desirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH) to provide a solution…
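The abstract's core claim, that plain minhash is biased towards smaller sets while an asymmetric transformation ranks by overlap, can be illustrated with a short sketch. The padding scheme below follows the spirit of the paper's preprocessing (pad every indexed set to a common cardinality with dummy elements, leave the query untouched), but the function names and the dummy-element encoding are illustrative assumptions, not the paper's exact construction.

```python
import random

def minhash_signature(s, num_hashes=128, seed=0):
    # Simulate num_hashes random permutations with salted hashes and keep
    # the minimum per salt. Python's built-in hash() is consistent within
    # one process, which is enough for a demo; swap in a fixed hash such
    # as hashlib.blake2b for cross-run reproducibility.
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(num_hashes)]
    return [min(hash((salt, e)) for e in s) for salt in salts]

def estimate_jaccard(sig_a, sig_b):
    # The fraction of matching signature positions estimates resemblance.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def pad_indexed_set(s, target_size, tag):
    # Asymmetric preprocessing (in the spirit of MH-ALSH): pad each indexed
    # set with dummy elements, unique to that set, until all indexed sets
    # share the same cardinality M. Queries stay unpadded, so dummies never
    # collide with query elements and the minhash collision probability
    # becomes a / (M + |q| - a), monotone in the overlap a = |x ∩ q|.
    dummies = {f"__dummy_{tag}_{i}__" for i in range(target_size - len(s))}
    return s | dummies

# Two indexed sets with identical resemblance to q but very different
# overlap: plain minhash cannot separate them; padded minhash can.
q = {"a", "b", "c"}
x_small = {"a"}                                          # overlap 1, J = 1/3
x_large = {"a", "b", "c", "d", "e", "f", "g", "h", "i"}  # overlap 3, J = 1/3

M = max(len(x_small), len(x_large))
sig_q = minhash_signature(q)
for name, x in [("x_small", x_small), ("x_large", x_large)]:
    plain = estimate_jaccard(minhash_signature(x), sig_q)
    padded = estimate_jaccard(minhash_signature(pad_indexed_set(x, M, name)), sig_q)
    print(f"{name}: plain ≈ {plain:.2f}, padded ≈ {padded:.2f}")
# Expected (up to estimation noise): both plain ≈ 0.33, while padding gives
# ≈ 1/11 for x_small and ≈ 1/3 for x_large, i.e., ranking by overlap.
```

With only 128 hashes the estimates are noisy, but the ordering is stable: after padding, the larger-overlap set wins, which is exactly the removal of the small-set penalization that the abstract describes.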

Cited by 69 publications (86 citation statements); references 25 publications.
“…As an example, weighted minwise hashing was successfully applied to train generalized min-max kernel support vector machines [18,22]. Furthermore, asymmetric locality-sensitive hashing [33], which can be realized using weighted minwise hashing, was used for efficient deep learning [34]. Finally, it could also be applied to random forests that are constructed using the weighted Jaccard index as similarity measure [30].…”
Section: Applications
confidence: 99%
“…WHIMP gets a precision and recall above 0.7 for at least 75% of the sample. We stress the low values of cosine similarity here: a similarity of 0.2 is well below the values studied in recent LSH-based results [37,39,38]. It is well known that low similarity values are harder to detect, yet WHIMP gets accurate results for an overwhelming majority of the vertices/users.…”
Section: Results
confidence: 67%
“…Compared with symmetric similarity measures such as Jaccard similarity, containment similarity gives special consideration to the query size, which makes it more suitable in some applications. As shown in [35], containment similarity is useful in record-matching applications. Consider two text descriptions of restaurants X and Y, represented by the two "set of words" records {five, guys, burgers, and, fries, downtown, brooklyn, new, york} and {five, kitchen, berkeley}, respectively.…”
Section: Introduction
confidence: 99%
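The restaurant example above reduces to a two-line computation. The sketch below (with a hypothetical query, since the citing paper does not give one) shows how the containment denominator |Q| changes the ranking relative to Jaccard.

```python
def jaccard(x, q):
    # Symmetric resemblance: |X ∩ Q| / |X ∪ Q|.
    return len(x & q) / len(x | q)

def containment(x, q):
    # Containment of the query in the record: |X ∩ Q| / |Q|.
    # The denominator depends only on the query, so long records
    # are not penalized for their extra elements.
    return len(x & q) / len(q)

x = {"five", "guys", "burgers", "and", "fries",
     "downtown", "brooklyn", "new", "york"}
y = {"five", "kitchen", "berkeley"}
q = {"five", "guys", "burgers"}  # hypothetical query for illustration

print(jaccard(x, q), containment(x, q))  # 3/9 ≈ 0.33 vs 3/3 = 1.00
print(jaccard(y, q), containment(y, q))  # 1/5 = 0.20 vs 1/3 ≈ 0.33
```

Jaccard penalizes record x for its nine elements even though it fully covers the query; containment rates it 1.0 and keeps the query size as the only denominator, which is the property the citing paper highlights.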
“…Challenges. The problem of containment similarity search has been intensively studied in the literature in recent years (e.g., [5], [35], [44]). The key challenges of this problem come from the following three aspects: (i) The number of elements (i.e., vocabulary size) may be very large.…”
Section: Introduction
confidence: 99%