2018
DOI: 10.48550/arxiv.1805.08539
Preprint

Fully Understanding the Hashing Trick

Casper Benjamin Freksen,
Lior Kamma,
Kasper Green Larsen

Abstract: Feature hashing, also known as the hashing trick, introduced by Weinberger et al. (2009), is one of the key techniques used in scaling up machine learning algorithms. Loosely speaking, feature hashing uses a random sparse projection matrix $A : \mathbb{R}^n \to \mathbb{R}^m$ (where $m \ll n$) in order to reduce the dimension of the data from $n$ to $m$ while approximately preserving the Euclidean norm. Every column of $A$ contains exactly one non-zero entry, equal to either $-1$ or $1$. Weinberger et al. showed tail bounds on $\|Ax\|_2^2$. Specifically…
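The projection described in the abstract can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's construction. The "random" hash functions $h: [n] \to [m]$ and $\sigma: [n] \to \{-1, 1\}$ are simulated by drawing them once from Python's `random` module. The helper names `make_feature_hashing`, `apply_hashing` and `sq_norm` are hypothetical. Because $A$ has one non-zero per column, $Ax$ can be computed in $O(n)$ time without ever building $A$:

```python
import random


def make_feature_hashing(n, m, seed=0):
    """Draw h: [n] -> [m] and sigma: [n] -> {-1, +1}.

    Assumption: a seeded pseudo-random generator stands in for truly random
    hash functions. Together h and sigma define the sparse matrix A with
    A[h[j], j] = sigma[j] and all other entries zero.
    """
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in range(n)]
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    return h, sigma


def apply_hashing(x, h, sigma, m):
    """Compute y = A x without materialising A: coordinate j is added,
    with sign sigma[j], into bucket h[j]."""
    y = [0.0] * m
    for j, xj in enumerate(x):
        y[h[j]] += sigma[j] * xj
    return y


def sq_norm(v):
    """Squared Euclidean norm ||v||_2^2."""
    return sum(t * t for t in v)


if __name__ == "__main__":
    n, m = 10_000, 1_000
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h, sigma = make_feature_hashing(n, m, seed=42)
    y = apply_hashing(x, h, sigma, m)
    # Ratio ||Ax||^2 / ||x||^2: close to 1 but not exactly 1.
    print(sq_norm(y) / sq_norm(x))
```

When two coordinates $i < j$ land in the same bucket, they contribute a cross term $2\sigma_i\sigma_j x_i x_j$ to $\|Ax\|_2^2$. These terms have zero mean, so $\mathbb{E}\|Ax\|_2^2 = \|x\|_2^2$. Tail bounds on $\|Ax\|_2^2$ quantify how tightly the printed ratio concentrates around 1.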

Cited by 2 publications (2 citation statements). References: 18 publications.
“…As hash function h the 32-bit version of the MurmurHash3 algorithm [1], a popular noncryptographic hash function, is used. It can be proven that under moderate assumptions feature hashing approximately conserves the Euclidean norm [10], and hence, the cosine similarity between hashed vectors can be used to approximate the similarity between the original, high-dimensional vectors and spectra.…”
Section: Feature Hashing To Convert High-resolution Spectra To Low-di... (mentioning)
confidence: 99%
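The quoted pipeline hashes features with the 32-bit MurmurHash3 and compares the hashed vectors by cosine similarity. Below is a minimal pure-Python sketch of that idea, assuming the reference MurmurHash3_x86_32 algorithm; third-party packages such as `mmh3` wrap the C version, and this sketch avoids the dependency. Several details are illustrative assumptions, not details from the citing paper:

* taking the bucket index from a seed-0 hash;
* taking the sign from the low bit of a seed-1 hash;
* feature names such as `mz_100.1` and the helpers `hash_features`, `cosine` and `dict_cosine`, all of which are hypothetical.

```python
import math

MASK32 = 0xFFFFFFFF


def _rotl32(v, r):
    return ((v << r) | (v >> (32 - r))) & MASK32


def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32 (reference algorithm), returned as unsigned 32-bit."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    h ^= len(data)
    # Finalisation mix (fmix32).
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def hash_features(features: dict, m: int) -> list:
    """Map a sparse {feature name: value} dict to a dense length-m vector."""
    y = [0.0] * m
    for name, value in features.items():
        key = name.encode("utf-8")
        idx = murmurhash3_x86_32(key, seed=0) % m
        # Assumption: sign taken from the low bit of an independent seed-1 hash.
        sign = 1.0 if murmurhash3_x86_32(key, seed=1) & 1 else -1.0
        y[idx] += sign * value
    return y


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def dict_cosine(a, b):
    """Cosine similarity computed directly on the original sparse dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical sparse "spectra": feature name -> intensity.
    s1 = {"mz_100.1": 1.0, "mz_150.2": 0.8, "mz_300.5": 0.3}
    s2 = {"mz_100.1": 0.9, "mz_150.2": 0.7, "mz_420.0": 0.4}
    m = 1024
    print(dict_cosine(s1, s2), cosine(hash_features(s1, m), hash_features(s2, m)))
```

The two printed values agree whenever the features of `s1` and `s2` land in distinct buckets. They differ only through the signed cross terms that collisions introduce, which is why the norm-preservation result carries over to cosine similarity.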
“…It can be proven that under moderate assumptions feature hashing approximately conserves the Euclidean norm [19], and hence, the similarity between hashed vectors can be used to approximate the similarity between the original, high-dimensional vectors. An important consideration in choosing hash function h is that it must be unbiased in order to minimize the number of hash collisions.…”
Section: Feature Hashing To Vectorize High-resolution Mass Spectra (mentioning)
confidence: 99%
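The quoted statement asks for a well-distributed ("unbiased") hash. One way to see why is to count colliding feature pairs. For a uniform hash into $m$ buckets, each of the $n(n-1)/2$ pairs collides with probability $1/m$, so the expected number of colliding pairs is $n(n-1)/(2m)$. The sketch below compares this against a deliberately biased hash. BLAKE2b from `hashlib` stands in for MurmurHash3 here only to keep the example dependency-free. The key names and the helpers `uniform_bucket`, `biased_bucket` and `colliding_pairs` are hypothetical.

```python
import hashlib
from collections import Counter


def uniform_bucket(key: str, m: int) -> int:
    """Well-mixed bucket index (BLAKE2b as a stand-in, not MurmurHash3)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


def biased_bucket(key: str, m: int) -> int:
    """Deliberately poor hash: depends only on the key's first character."""
    return ord(key[0]) % m


def colliding_pairs(keys, m, bucket) -> int:
    """Number of unordered key pairs that share a bucket."""
    loads = Counter(bucket(k, m) for k in keys)
    return sum(c * (c - 1) // 2 for c in loads.values())


def expected_pairs_uniform(n: int, m: int) -> float:
    """Expected colliding pairs for a uniform hash: n(n-1)/(2m)."""
    return n * (n - 1) / (2 * m)


if __name__ == "__main__":
    keys = [f"feature_{i}" for i in range(1000)]
    m = 1000
    print("expected (uniform):", expected_pairs_uniform(len(keys), m))
    print("well-mixed hash:   ", colliding_pairs(keys, m, uniform_bucket))
    print("biased hash:       ", colliding_pairs(keys, m, biased_bucket))
```

Every key in this example starts with `f`, so the biased hash sends all 1000 keys to one bucket and produces all 499,500 possible collisions. The well-mixed hash should land near the uniform expectation of 499.5.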