Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)
DOI: 10.18653/v1/d17-1308

The strange geometry of skip-gram with negative sampling

Abstract: Despite their ubiquity, word embeddings trained with skip-gram negative sampling (SGNS) remain poorly understood. We find that vector positions are not simply determined by semantic similarity, but rather occupy a narrow cone, diametrically opposed to the context vectors. We show that this geometric concentration depends on the ratio of positive to negative examples, and that it is neither theoretically nor empirically inherent in related embedding algorithms.
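As a rough illustration of the claim, the sketch below (not code from the paper; the corpus file, hyperparameters, and helper names are assumptions for illustration) trains a gensim skip-gram model with negative sampling and then measures (a) how tightly the word vectors cluster around a common direction and (b) the cosine between the mean word direction and the mean context direction, which the abstract says should be strongly negative.

```python
# Minimal sketch, assuming gensim 4.x and a plain-text file "corpus.txt".
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Toy corpus; use a real corpus so the geometric effect is measurable.
sentences = [simple_preprocess(line) for line in open("corpus.txt", encoding="utf8")]

model = Word2Vec(
    sentences,
    vector_size=100,
    sg=1,          # skip-gram
    negative=5,    # number of negative samples per positive example
    window=5,
    min_count=5,
    epochs=5,
)

W = model.wv.vectors   # input ("word") vectors
C = model.syn1neg      # output ("context") vectors used by negative sampling

def unit(rows):
    # Normalize rows to unit length, dropping any all-zero (untrained) rows.
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    keep = norms[:, 0] > 0
    return rows[keep] / norms[keep]

Wn, Cn = unit(W), unit(C)
mean_w = Wn.mean(axis=0)
mean_c = Cn.mean(axis=0)

# If word vectors occupy a narrow cone, the mean direction has a large norm
# and the average cosine of each word vector to that direction is high.
mean_w_dir = mean_w / np.linalg.norm(mean_w)
print("norm of mean word direction:", np.linalg.norm(mean_w))
print("avg cosine of words to mean word direction:", (Wn @ mean_w_dir).mean())

# "Diametrically opposed": the mean word direction and the mean context
# direction should have a strongly negative cosine similarity.
cos_wc = mean_w @ mean_c / (np.linalg.norm(mean_w) * np.linalg.norm(mean_c))
print("cosine(mean word direction, mean context direction):", cos_wc)
```

Varying the `negative` parameter changes the ratio of positive to negative examples, which the abstract identifies as the driver of this concentration.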

Cited by 88 publications (91 citation statements) · References 5 publications
“…This is different than GloVe, where stability remains reasonably constant across word frequencies, as shown in Figure 2. The behavior we see here agrees with the conclusion of (Mimno and Thompson, 2017), who find that GloVe exhibits more well-behaved geometry than word2vec.…”
Section: Lessons Learned: What Contributes To the Stability Of An Embedding (supporting)
confidence: 92%
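For context, here is a minimal sketch of one common way to quantify embedding stability of the kind this excerpt discusses: train the same model twice and report the overlap of each word's ten nearest neighbours across the two runs. This is an assumption about the measure being referenced, not code from either cited paper; the corpus file and hyperparameters are illustrative.

```python
# Minimal sketch, assuming gensim 4.x and a plain-text file "corpus.txt".
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

sentences = [simple_preprocess(line) for line in open("corpus.txt", encoding="utf8")]

def train(seed):
    # Two independent runs differ only in their random seed.
    return Word2Vec(sentences, vector_size=100, sg=1, negative=5,
                    window=5, min_count=5, epochs=5, seed=seed, workers=1)

m1, m2 = train(1), train(2)

K = 10
def neighbours(model, word):
    return {w for w, _ in model.wv.most_similar(word, topn=K)}

# Restrict to the most frequent shared words to keep the sketch fast.
shared = [w for w in m1.wv.index_to_key[:500] if w in m2.wv.key_to_index]
overlaps = [len(neighbours(m1, w) & neighbours(m2, w)) / K for w in shared]
print("mean 10-NN overlap (stability):", float(np.mean(overlaps)))
```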
“…Our work in this paper is thus markedly different from most dissections of contextualized representations. It is more similar to Mimno and Thompson (2017), which studied the geometry of static word embedding spaces.…”
Section: Related Work (mentioning)
confidence: 99%
“…We take the absolute value of each term because the embedding model may make a word more gendered, but in the direction opposite of what is implied in the corpus. λ ← 1 because we expect λ ≈ 1 in practice (Ethayarajh et al., 2018; Mimno and Thompson, 2017). Similarly, α ← −1 because it minimizes the difference between x − y and its information theoretic interpretation over the gender-defining word pairs in S, though this is an estimate and may differ from the true value of α.…”
Section: Breaking Down Gender Association (mentioning)
confidence: 99%