Proceedings of the 5th Asia-Pacific Symposium on Internetware 2013
DOI: 10.1145/2532443.2532446

b-bit minwise hashing in practice

Abstract: Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26,32] demonstrated a potential use of b-bit minwise hashing [23,24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications…

Cited by 5 publications (5 citation statements)
References 32 publications
“…SimHash generates a single bit output (only the signs) whereas MinHash generates an integer value. Recently proposed b-bit minwise hashing [22] provides simple strategy to generate an informative single bit output from MinHash, by using the parity of MinHash values:…”
Section: -Bit Minwise Hashing (mentioning)
confidence: 99%
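
The parity idea in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function names, the modulus `PRIME`, and the use of a random linear hash `(a*x + c) mod PRIME` in place of a true random permutation are all assumptions made here. Keeping the lowest b bits of each MinHash value (b = 1 gives the parity) means two sets agree by chance with probability of about 1/2^b when the sets are small relative to the hash range, so the estimator subtracts that chance rate.

```python
import random

PRIME = (1 << 61) - 1  # modulus of the illustrative hash family (an assumption)

def make_hashes(k, seed=0):
    """Draw k coefficient pairs (a, c) for hashes of the form (a*x + c) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def minhash(items, hashes):
    """Return one MinHash value per hash function for a non-empty set of ints."""
    return [min((a * x + c) % PRIME for x in items) for a, c in hashes]

def lowest_bits(signature, b=1):
    """Keep only the lowest b bits of each MinHash value (b=1 is the parity)."""
    mask = (1 << b) - 1
    return [v & mask for v in signature]

def estimate_resemblance(bits1, bits2, b=1):
    """Estimate Jaccard resemblance from the fraction of matching b-bit values.

    Assumes both sets are tiny relative to the hash range, so that unrelated
    values collide with probability close to 1 / 2**b.
    """
    match = sum(u == v for u, v in zip(bits1, bits2)) / len(bits1)
    chance = 1.0 / (1 << b)
    return (match - chance) / (1.0 - chance)
```

With signatures built from the same `hashes`, `estimate_resemblance(lowest_bits(s1), lowest_bits(s2))` gives an estimate of the resemblance whose accuracy improves as the number of hash functions grows.
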
“…For example, the paper on Conditional Random Sampling (CRS) [19] showed that random projections can be very inaccurate especially in binary data, for the task of inner product estimation (which is not the same as near neighbor search). A more recent paper [26] empirically demonstrated that b-bit minwise hashing [22] outperformed SimHash and spectral hashing [30].…”
Section: Introduction (mentioning)
confidence: 99%
“…In practice, one would not store the entire matrix of signs nor all the random permutations. In an implementation, hash functions [Carter and Wegman, 1979] would be used to create the matrix S deterministically, though it is beyond the scope of this paper to go into the details; see Li et al [2013] for more information and further computational improvements. With this approach, S would be created row-by-row, and only a single observation from X would need to be kept in memory at any one time.…”
Section: Construction Of S With B-bit Min-wise Hashing and Binary Var... (mentioning)
confidence: 99%
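
The row-by-row idea in the statement above can be sketched roughly as below. This is not the construction of S used in the quoted paper: the generator `hashed_rows`, its parameters, the linear hash family, and the one-hot feature layout (block j holds the 2^b possible values of the j-th hash) are assumptions chosen for illustration. What it shows is that only k pairs of hash coefficients are stored rather than k full permutations, and that each observation can be processed and discarded before the next one is read.

```python
import random
from typing import Iterable, Iterator, List

PRIME = (1 << 61) - 1  # modulus of the illustrative hash family (an assumption)

def hashed_rows(rows: Iterable[Iterable[int]], k: int, b: int,
                seed: int = 0) -> Iterator[List[int]]:
    """Yield k feature indices for each row, one row at a time.

    Each row is the non-empty collection of column indices where a binary
    observation is nonzero.
    """
    rng = random.Random(seed)
    # Only k coefficient pairs are kept; no permutation is ever materialised.
    coeffs = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]
    mask = (1 << b) - 1
    for row in rows:  # a single observation is held in memory at a time
        nonzeros = list(row)
        features = []
        for j, (a, c) in enumerate(coeffs):
            low = min((a * x + c) % PRIME for x in nonzeros) & mask
            features.append(j * (1 << b) + low)  # slot inside block j
        yield features
```

Because the coefficients are derived from `seed`, running the generator again with the same seed reproduces the same features without storing anything between passes.
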
“…The empirical performance of regression and classification procedures following b-bit min-wise hashing [Li et al, …, 2013] is particularly impressive. Existing theory on b-bit min-wise hashing has focused on the variance and bias in the approximation of the kernel.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many methods of document representation based on TF-IDF can construct Vector Space Model (VSM) of text corpus. Similarly, many methods of document representation exploit statistical term measures, such as BoS (Bag-of-Words) [3] and Minwise hashing [4]. For document representation, these methods are perceived as statistical methods of feature extraction.…”
Section: Introduction (mentioning)
confidence: 99%