Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015
DOI: 10.3115/v1/p15-2002

On metric embedding for boosting semantic similarity computations

Abstract: Computing pairwise word semantic similarity is widely used and serves as a building block in many NLP tasks. In this paper, we explore the embedding of the shortest-path metric from a knowledge base (WordNet) into the Hamming hypercube, in order to improve computation performance. We show that, although an isometric embedding is intractable, it is possible to achieve good non-isometric embeddings. We report a speedup of three orders of magnitude for the task of computing Leacock and Chodorow (LCH) similarity…
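To make the trade-off in the abstract concrete, the sketch below contrasts the exact graph-based route (NLTK's WordNet interface and its built-in lch_similarity) with a lookup on fixed-width binary codes compared by Hamming distance. This is a minimal illustration only: the binary codes shown are toy placeholders, not the embedding produced in the paper.

    # Minimal sketch: exact LCH similarity over the WordNet graph vs. a
    # Hamming-distance lookup on precomputed binary codes.
    # Assumes NLTK with the WordNet corpus installed (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    dog, cat = wn.synset("dog.n.01"), wn.synset("cat.n.01")

    # Exact route: LCH = -log(shortest_path_length / (2 * taxonomy_depth)),
    # which requires a shortest-path search in the hypernym graph per query.
    print(dog.lch_similarity(cat))

    # Approximate route: once every synset carries a fixed-width binary code,
    # one XOR plus one popcount replaces the graph traversal.
    codes = {
        "dog.n.01": 0b1011_0110,  # toy 8-bit codes, purely for illustration;
        "cat.n.01": 0b1011_0011,  # the paper uses much wider (e.g. 128-bit) codes
    }
    hamming = bin(codes["dog.n.01"] ^ codes["cat.n.01"]).count("1")
    print(hamming)  # smaller Hamming distance ~ shorter path ~ higher similarity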

Cited by 5 publications (9 citation statements) · References 21 publications

“…To take advantage of fast CPU-optimized bitwise operations, the size of binary vectors has to match register sizes (64, 128 or 256 bits). When this criterion is met, the computations are much faster (Norouzi, Punjani, and Fleet 2012; Subercaze, Gravier, and Laforest 2015). Nevertheless, mapping words to binary codes is not enough, as the vectors are then used in NLP applications.…”
Section: Introduction
confidence: 99%
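As a rough illustration of the register-size point above (an assumption-laden sketch, not code from either cited work): a 128-bit code can be held in two 64-bit lanes, so the whole distance computation is one XOR and one popcount per lane, mirroring what an optimized C implementation would issue per machine word.

    # Sketch: Hamming distance between two 128-bit codes stored as (hi, lo)
    # pairs of 64-bit integers -- one XOR + one popcount per 64-bit lane,
    # which is what register-sized C code (e.g. using POPCNT) does per word.
    MASK64 = (1 << 64) - 1

    def hamming128(a, b):
        return sum(bin((x ^ y) & MASK64).count("1") for x, y in zip(a, b))

    u = (0x0123456789ABCDEF, 0xFEDCBA9876543210)  # toy codes, not real embeddings
    v = (0x0123456789ABCDEE, 0xFEDCBA9876543210)
    print(hamming128(u, v))  # -> 1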
“…Baselines path2vec is compared against five baselines (more on them in Section 2): raw WordNet similarities by respective measures; Deepwalk (Perozzi et al., 2014); node2vec (Grover and Leskovec, 2016); FSE (Subercaze et al., 2015); and TransR (Lin et al., 2015).…”
Section: Experiments 1: Intrinsic Evaluation Based on Semantic Similarity
confidence: 99%
“…Existing approaches to graph embeddings use either factorization of the graph adjacency matrix (Cao et al., 2015; Ou et al., 2016) or random walks over the graph, as in Deepwalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016). A different approach is taken by Subercaze et al. (2015), who directly embed the WordNet tree graph into Hamming hypercube binary representations. Their ‘Fast similarity embedding’ (FSE) model provides a quick way of calculating semantic similarities based on WordNet.…”
Section: Related Work
confidence: 99%
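A toy way to see what "embedding a tree into the Hamming hypercube" means (an illustrative construction only, not the FSE algorithm): give every tree edge its own bit and let a node's code be the set of edges on its path from the root. Hamming distance then reproduces tree path length exactly, but costs one bit per edge, which is impractical for WordNet's roughly 82,000 noun synsets; compact lossy codes such as FSE's 128-bit ones are the practical alternative.

    # Toy tree-to-hypercube embedding (illustration only, not FSE):
    # one bit per edge; a node's code = bits of the edges on its root path.
    tree = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a"}  # child -> parent

    edge_bit = {child: i for i, child in enumerate(n for n in tree if tree[n])}

    def code(node):
        bits = 0
        while tree[node] is not None:
            bits |= 1 << edge_bit[node]  # the edge (node, parent) sets one bit
            node = tree[node]
        return bits

    def hamming(x, y):
        return bin(x ^ y).count("1")

    print(hamming(code("c"), code("d")))  # 2 == path length c-a-d
    print(hamming(code("c"), code("b")))  # 3 == path length c-a-root-b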
“…Discussion of Results: Figure 1 presents computation times for pairwise similarities between one synset and all other 82,115 WordNet noun synsets. We compare running times of calculating two original graph-based metrics to Hamming distance between 128D FSE binary embeddings (Subercaze et al., 2015) and to dot product between their dense vectorized 300D counterparts (using CPU). Using float vectors (path2vec) is 4 orders of magnitude faster than operating directly on graphs, and 2 orders faster than Hamming distance.…”
Section: Computational Efficiency
confidence: 99%
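The one-against-all setting in this statement is easy to mock up with random stand-ins for the real embeddings (shapes taken from the quote; this is not a reproduction of the paper's benchmark): 128-bit codes against 82,115 synsets via XOR-and-popcount, versus a dense 300-dimensional dot product.

    # Rough timing sketch of the two vectorized routes, with random stand-ins
    # for the real embeddings.
    import timeit
    import numpy as np

    N = 82_115
    codes = np.random.randint(0, 256, size=(N, 16), dtype=np.uint8)  # 16 bytes = 128 bits
    dense = np.random.rand(N, 300).astype(np.float32)                # 300D float vectors
    q_code, q_dense = codes[0], dense[0]

    def hamming_all():
        # XOR against the query, then count set bits per row.
        return np.unpackbits(codes ^ q_code, axis=1).sum(axis=1)

    def dot_all():
        return dense @ q_dense

    print("hamming:", timeit.timeit(hamming_all, number=100))
    print("dot    :", timeit.timeit(dot_all, number=100))

Note that pure NumPy emulates popcount with unpackbits, so the relative gap will not match an optimized C implementation that uses native POPCNT instructions; the sketch only shows the shape of the two computations.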