Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)
DOI: 10.18653/v1/d17-1308

The strange geometry of skip-gram with negative sampling

Abstract: Despite their ubiquity, word embeddings trained with skip-gram negative sampling (SGNS) remain poorly understood. We find that vector positions are not simply determined by semantic similarity, but rather occupy a narrow cone, diametrically opposed to the context vectors. We show that this geometric concentration depends on the ratio of positive to negative examples, and that it is neither theoretically nor empirically inherent in related embedding algorithms.
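As a rough illustration of the claim, the sketch below (not code from the paper; the corpus file, hyperparameters, and helper names are assumptions for illustration) trains a gensim skip-gram model with negative sampling and then measures (a) how tightly the word vectors cluster around a common direction and (b) the cosine between the mean word direction and the mean context direction, which the abstract says should be strongly negative.

```python
# Minimal sketch, assuming gensim 4.x and a plain-text file "corpus.txt".
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Toy corpus; use a real corpus so the geometric effect is measurable.
sentences = [simple_preprocess(line) for line in open("corpus.txt", encoding="utf8")]

model = Word2Vec(
    sentences,
    vector_size=100,
    sg=1,          # skip-gram
    negative=5,    # number of negative samples per positive example
    window=5,
    min_count=5,
    epochs=5,
)

W = model.wv.vectors   # input ("word") vectors
C = model.syn1neg      # output ("context") vectors used by negative sampling

def unit(rows):
    # Normalize rows to unit length, dropping any all-zero (untrained) rows.
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    keep = norms[:, 0] > 0
    return rows[keep] / norms[keep]

Wn, Cn = unit(W), unit(C)
mean_w = Wn.mean(axis=0)
mean_c = Cn.mean(axis=0)

# If word vectors occupy a narrow cone, the mean direction has a large norm
# and the average cosine of each word vector to that direction is high.
mean_w_dir = mean_w / np.linalg.norm(mean_w)
print("norm of mean word direction:", np.linalg.norm(mean_w))
print("avg cosine of words to mean word direction:", (Wn @ mean_w_dir).mean())

# "Diametrically opposed": the mean word direction and the mean context
# direction should have a strongly negative cosine similarity.
cos_wc = mean_w @ mean_c / (np.linalg.norm(mean_w) * np.linalg.norm(mean_c))
print("cosine(mean word direction, mean context direction):", cos_wc)
```

Varying the `negative` parameter changes the ratio of positive to negative examples, which the abstract identifies as the driver of this concentration.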

Cited by 88 publications (91 citation statements) · References 5 publications
“…This is different than GloVe, where stability remains reasonably constant across word frequencies, as shown in Figure 2. The behavior we see here agrees with the conclusion of (Mimno and Thompson, 2017), who find that GloVe exhibits more well-behaved geometry than word2vec.…”
Section: Lessons Learned: What Contributes To the Stability Of An Embedding (supporting)
confidence: 92%
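For context, here is a minimal sketch of one common way to quantify embedding stability of the kind this excerpt discusses: train the same model twice and report the overlap of each word's ten nearest neighbours across the two runs. This is an assumption about the measure being referenced, not code from either cited paper; the corpus file and hyperparameters are illustrative.

```python
# Minimal sketch, assuming gensim 4.x and a plain-text file "corpus.txt".
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

sentences = [simple_preprocess(line) for line in open("corpus.txt", encoding="utf8")]

def train(seed):
    # Two independent runs differ only in their random seed.
    return Word2Vec(sentences, vector_size=100, sg=1, negative=5,
                    window=5, min_count=5, epochs=5, seed=seed, workers=1)

m1, m2 = train(1), train(2)

K = 10
def neighbours(model, word):
    return {w for w, _ in model.wv.most_similar(word, topn=K)}

# Restrict to the most frequent shared words to keep the sketch fast.
shared = [w for w in m1.wv.index_to_key[:500] if w in m2.wv.key_to_index]
overlaps = [len(neighbours(m1, w) & neighbours(m2, w)) / K for w in shared]
print("mean 10-NN overlap (stability):", float(np.mean(overlaps)))
```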
“…Our work in this paper is thus markedly different from most dissections of contextualized representations. It is more similar to Mimno and Thompson (2017), which studied the geometry of static word embedding spaces.…”
Section: Related Work (mentioning)
confidence: 99%
“…We take the absolute value of each term because the embedding model may make a word more gendered, but in the direction opposite of what is implied in the corpus. λ ← 1 because we expect λ ≈ 1 in practice (Ethayarajh et al., 2018; Mimno and Thompson, 2017). Similarly, α ← −1 because it minimizes the difference between x − y and its information theoretic interpretation over the gender-defining word pairs in S, though this is an estimate and may differ from the true value of α.…”
Section: Breaking Down Gender Association (mentioning)
confidence: 99%