Proceedings of the 22nd Conference on Computational Natural Language Learning 2018
DOI: 10.18653/v1/k18-1028

Uncovering Divergent Linguistic Information in Word Embeddings with Lessons for Intrinsic and Extrinsic Evaluation

Abstract: Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness. In this paper, we show that each embedding model captures more information than directly apparent. A linear transformation that adjusts the similarity order of the model without any external resource can tailor it to achieve better results in th…

Cited by 31 publications (35 citation statements)
References 24 publications

“…Based on the distributional hypothesis (i.e., "a word is characterized by the company it keeps" (Harris, 1954)), word embedding methods aim to learn the distributed representations by analyzing their contexts (Mikolov et al., 2013). Recent work shows that word embedding could uncover textual information of various levels (Artetxe et al., 2018). Hence, we leverage word embedding as a part of the word representation.…”
Section: Word Embedding
confidence: 99%
“…Based on the distributional hypothesis (i.e., "a word is characterized by the company it keeps" (Harris, 1954)), embedding methods represent each word as a dense vector, while preserving their syntactic and semantic information in a context-agnostic manner (Mikolov et al., 2013; Pennington et al., 2014). Recent work shows that word embeddings can cover textual information of various levels (Artetxe et al., 2018) and improve name tagging performance significantly (Cherry and Guo, 2015). Still, due to the long-tail distribution of word frequency, embedding vectors usually have inconsistent reliability, and such inconsistency has been long overlooked.…”
Section: Word Representation Models
confidence: 99%
“…In the next two sections, we present our extensions to these results. Artetxe et al (2018) propose a post-processing vector transformation technique based on eigendecomposition that corresponds to calculating first, second, nth-order similarities. The basic intuition is that, for example, a second-order similarity is a similarity matrix of similarities.…”
Section: The Human Experiments
confidence: 99%
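The transformation this statement refers to can be made concrete. Below is a minimal NumPy sketch, assuming the post-processing takes the form X → X·W_α, where W_α = Q Λ^((α−1)/2) Qᵀ comes from the eigendecomposition Q Λ Qᵀ of XᵀX, so that the dot-product similarity matrix of the transformed embeddings equals (XXᵀ)^α. The function name and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def similarity_order_transform(X, alpha=1.0, eps=1e-12):
    """Linearly transform embeddings X (n_words x dim) so that their
    dot-product similarity matrix is raised to the power alpha.

    alpha = 1 returns X unchanged (up to numerical noise); alpha = 2
    yields embeddings whose similarities are second-order similarities
    (similarities between similarity vectors), and so on.
    """
    gram = X.T @ X                         # dim x dim, symmetric PSD
    eigvals, eigvecs = np.linalg.eigh(gram)
    eigvals = np.clip(eigvals, eps, None)  # guard against tiny negative eigenvalues
    # W_alpha = Q diag(lambda^((alpha - 1) / 2)) Q^T
    W = eigvecs @ np.diag(eigvals ** ((alpha - 1.0) / 2.0)) @ eigvecs.T
    return X @ W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))         # toy "embeddings"
    X2 = similarity_order_transform(X, alpha=2.0)
    M = X @ X.T
    # First-order similarities of the transformed embeddings should match
    # second-order similarities (M squared) of the original ones.
    print(np.allclose(X2 @ X2.T, M @ M))
```

Under this reading, non-integer or negative values of α interpolate between, or invert, similarity orders with the same formula, which is how a single embedding model could be tuned toward similarity versus relatedness, or semantics versus syntax, without any external resource.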
“…Specifically, word embeddings are a merger of the many levels of representation that we find in human languages: lexical, morphological, syntactic, semantic. It has been argued that post-processing transformations can tease apart syntactic aspects of distributed representations from semantic aspects (Artetxe et al, 2018).…”
Section: Introduction
confidence: 99%