Proceedings of the 5th Workshop on Representation Learning for NLP 2020
DOI: 10.18653/v1/2020.repl4nlp-1.19

On Dimensional Linguistic Properties of the Word Embedding Space

Abstract: Word embeddings have become a staple of several natural language processing tasks, yet much remains to be understood about their properties. In this work, we analyze word embeddings in terms of their principal components and arrive at a number of novel and counterintuitive observations. In particular, we characterize the utility of variance explained by the principal components as a proxy for downstream performance. Furthermore, through syntactic probing of the principal embedding space, we show that the synta…

Cited by 6 publications (7 citation statements)
References 18 publications
“…However, the frequency of a word is an important piece of information for tasks that require differentiating stop words and content words, such as in information retrieval. Raunak et al (2020) demonstrated that removing the top principal components does not necessarily lead to performance improvement. Moreover, contextualised word embeddings such as BERT (Devlin et al, 2019) and ELMo (Peters et al, 2018) have been shown to be anisotropic despite their superior performance in a wide range of NLP tasks (Ethayarajh, 2019).…”
Section: Autoencoding as Centering and PCA Projection (mentioning)
confidence: 99%
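For concreteness, the post-processing operation this citation statement refers to, removing the top principal components from a word embedding matrix, can be sketched as follows. This is an illustrative NumPy sketch, not the authors' released code; the function name and the default of k = 2 removed components are assumptions made for the example.

```python
# Minimal sketch (assumed implementation, not the cited authors' code):
# center the embedding matrix and project out its top-k principal components,
# the operation whose downstream benefit Raunak et al (2020) call into question.
import numpy as np

def remove_top_components(embeddings: np.ndarray, k: int = 2) -> np.ndarray:
    """embeddings: (vocab_size, dim) matrix of word vectors.
    k: number of leading principal components to remove (hypothetical default)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Principal directions are the right singular vectors of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_directions = vt[:k]                                  # shape (k, dim)
    # Subtract each vector's projection onto the top-k directions.
    projection = centered @ top_directions.T @ top_directions
    return centered - projection
```

As the statement above notes, applying this kind of component removal does not necessarily improve downstream performance, and it can discard frequency-related information that some tasks need.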
“…Alternatively, using self-supervised trained checkpoints of these models and their pre-trained embeddings as a starting point to be later fine-tuned for supervised downstream tasks is widely used. Unlike previous works in the literature that have only focused on reduced pre-trained embeddings [25,26,27], in this work we are interested in evaluating the impact of dimensionality reduction on both types of embeddings: pre-trained and downstream fine-tuned embeddings.…”
Section: Dimensional Reduction of Embeddings (mentioning)
confidence: 99%
“…Recently, the study of Raunak et al [25,26] has shed more light on the importance of reducing the size of embeddings produced by Machine Learning and Deep Learning models. More specifically, these authors draw attention to reducing the size of classical GloVe [32] and FastText [33] pre-trained word embeddings using PCA-based post-processing algorithms, achieving similar or even better performance than the original embeddings.…”
Section: Dimensional Reduction of Embeddings (mentioning)
confidence: 99%
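To illustrate the kind of PCA-based reduction of pre-trained vectors referred to in this statement, the following is a minimal sketch assuming scikit-learn and a plain-text GloVe-style vector file. It is not the specific post-processing algorithm of Raunak et al; the file name, loader, and target dimension are hypothetical choices for the example.

```python
# Illustrative sketch (assumptions: scikit-learn is available and the vectors
# are stored one word per line as "word v1 v2 ..."). This shows generic PCA
# reduction of pre-trained embeddings, not the cited authors' full pipeline.
import numpy as np
from sklearn.decomposition import PCA

def load_vectors(path: str) -> tuple[list[str], np.ndarray]:
    """Read a GloVe/FastText-style text file into a vocabulary and a matrix."""
    words, rows = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            rows.append([float(x) for x in parts[1:]])
    return words, np.asarray(rows, dtype=np.float32)

def reduce_embeddings(vectors: np.ndarray, target_dim: int = 150) -> np.ndarray:
    """Project the embedding matrix onto its top principal components."""
    pca = PCA(n_components=target_dim)
    return pca.fit_transform(vectors)

# Usage (hypothetical file name):
# words, vecs = load_vectors("glove.6B.300d.txt")
# reduced = reduce_embeddings(vecs, target_dim=150)   # e.g. 300 -> 150 dims
```

The reduced vectors keep the directions of highest variance, which is why, per the citing work, such reductions can match or even exceed the performance of the original embeddings at a fraction of the size.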