Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015
DOI: 10.3115/v1/p15-2002

On metric embedding for boosting semantic similarity computations

Abstract: Computing pairwise word semantic similarity is widely used and serves as a building block in many NLP tasks. In this paper, we explore the embedding of the shortest-path metric from a knowledge base (WordNet) into the Hamming hypercube, in order to improve computation performance. We show that, although an isometric embedding is intractable, it is possible to achieve good non-isometric embeddings. We report a speedup of three orders of magnitude for the task of computing Leacock and Chodorow (LCH) similarity…
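To make the trade-off in the abstract concrete, the sketch below contrasts the exact graph-based route (NLTK's WordNet interface and its built-in lch_similarity) with a lookup on fixed-width binary codes compared by Hamming distance. This is a minimal illustration only: the binary codes shown are toy placeholders, not the embedding produced in the paper.

    # Minimal sketch: exact LCH similarity over the WordNet graph vs. a
    # Hamming-distance lookup on precomputed binary codes.
    # Assumes NLTK with the WordNet corpus installed (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    dog, cat = wn.synset("dog.n.01"), wn.synset("cat.n.01")

    # Exact route: LCH = -log(shortest_path_length / (2 * taxonomy_depth)),
    # which requires a shortest-path search in the hypernym graph per query.
    print(dog.lch_similarity(cat))

    # Approximate route: once every synset carries a fixed-width binary code,
    # one XOR plus one popcount replaces the graph traversal.
    codes = {
        "dog.n.01": 0b1011_0110,  # toy 8-bit codes, purely for illustration;
        "cat.n.01": 0b1011_0011,  # the paper uses much wider (e.g. 128-bit) codes
    }
    hamming = bin(codes["dog.n.01"] ^ codes["cat.n.01"]).count("1")
    print(hamming)  # smaller Hamming distance ~ shorter path ~ higher similarity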

Cited by 5 publications (9 citation statements) · References 21 publications

“…To take advantage of fast CPU-optimized bitwise operations, the size of binary vectors has to match register sizes (64, 128 or 256 bits). When this criterion is met, the computations are much faster (Norouzi, Punjani, and Fleet 2012; Subercaze, Gravier, and Laforest 2015). Nevertheless, mapping words to binary codes is not enough, as the vectors are then used in NLP applications.…”
Section: Introduction
confidence: 99%
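As a rough illustration of the register-size point above (an assumption-laden sketch, not code from either cited work): a 128-bit code can be held in two 64-bit lanes, so the whole distance computation is one XOR and one popcount per lane, mirroring what an optimized C implementation would issue per machine word.

    # Sketch: Hamming distance between two 128-bit codes stored as (hi, lo)
    # pairs of 64-bit integers -- one XOR + one popcount per 64-bit lane,
    # which is what register-sized C code (e.g. using POPCNT) does per word.
    MASK64 = (1 << 64) - 1

    def hamming128(a, b):
        return sum(bin((x ^ y) & MASK64).count("1") for x, y in zip(a, b))

    u = (0x0123456789ABCDEF, 0xFEDCBA9876543210)  # toy codes, not real embeddings
    v = (0x0123456789ABCDEE, 0xFEDCBA9876543210)
    print(hamming128(u, v))  # -> 1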
“…Baselines path2vec is compared against five baselines (more on them in Section 2): raw WordNet similarities by respective measures; Deepwalk (Perozzi et al., 2014); node2vec (Grover and Leskovec, 2016); FSE (Subercaze et al., 2015); and TransR (Lin et al., 2015).…”
Section: Experiments 1: Intrinsic Evaluation Based on Semantic Similarity
confidence: 99%
“…Existing approaches to graph embeddings use either factorization of the graph adjacency matrix (Cao et al., 2015; Ou et al., 2016) or random walks over the graph, as in Deepwalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016). A different approach is taken by Subercaze et al. (2015), who directly embed the WordNet tree graph into Hamming hypercube binary representations. Their ‘Fast similarity embedding’ (FSE) model provides a quick way of calculating semantic similarities based on WordNet.…”
Section: Related Work
confidence: 99%
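A toy way to see what "embedding a tree into the Hamming hypercube" means (an illustrative construction only, not the FSE algorithm): give every tree edge its own bit and let a node's code be the set of edges on its path from the root. Hamming distance then reproduces tree path length exactly, but costs one bit per edge, which is impractical for WordNet's roughly 82,000 noun synsets; compact lossy codes such as FSE's 128-bit ones are the practical alternative.

    # Toy tree-to-hypercube embedding (illustration only, not FSE):
    # one bit per edge; a node's code = bits of the edges on its root path.
    tree = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a"}  # child -> parent

    edge_bit = {child: i for i, child in enumerate(n for n in tree if tree[n])}

    def code(node):
        bits = 0
        while tree[node] is not None:
            bits |= 1 << edge_bit[node]  # the edge (node, parent) sets one bit
            node = tree[node]
        return bits

    def hamming(x, y):
        return bin(x ^ y).count("1")

    print(hamming(code("c"), code("d")))  # 2 == path length c-a-d
    print(hamming(code("c"), code("b")))  # 3 == path length c-a-root-b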
“…Discussion of Results: Figure 1 presents computation times for pairwise similarities between one synset and all other 82,115 WordNet noun synsets. We compare running times of calculating two original graph-based metrics to Hamming distance between 128D FSE binary embeddings (Subercaze et al., 2015) and to dot product between their dense vectorized 300D counterparts (using CPU). Using float vectors (path2vec) is 4 orders of magnitude faster than operating directly on graphs, and 2 orders faster than Hamming distance.…”
Section: Computational Efficiency
confidence: 99%
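The one-against-all setting in this statement is easy to mock up with random stand-ins for the real embeddings (shapes taken from the quote; this is not a reproduction of the paper's benchmark): 128-bit codes against 82,115 synsets via XOR-and-popcount, versus a dense 300-dimensional dot product.

    # Rough timing sketch of the two vectorized routes, with random stand-ins
    # for the real embeddings.
    import timeit
    import numpy as np

    N = 82_115
    codes = np.random.randint(0, 256, size=(N, 16), dtype=np.uint8)  # 16 bytes = 128 bits
    dense = np.random.rand(N, 300).astype(np.float32)                # 300D float vectors
    q_code, q_dense = codes[0], dense[0]

    def hamming_all():
        # XOR against the query, then count set bits per row.
        return np.unpackbits(codes ^ q_code, axis=1).sum(axis=1)

    def dot_all():
        return dense @ q_dense

    print("hamming:", timeit.timeit(hamming_all, number=100))
    print("dot    :", timeit.timeit(dot_all, number=100))

Note that pure NumPy emulates popcount with unpackbits, so the relative gap will not match an optimized C implementation that uses native POPCNT instructions; the sketch only shows the shape of the two computations.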