Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)
DOI: 10.18653/v1/s17-1001
What Analogies Reveal about Word Vectors and their Compositionality

Abstract: Analogy completion via vector arithmetic has become a common means of demonstrating the compositionality of word embeddings. Previous work has shown that this strategy works more reliably for certain types of analogical word relationships than for others, but these studies have not offered a convincing account of why this is the case. We arrive at such an account through an experiment that targets a wide variety of analogy questions and defines a baseline condition to more accurately measure the efficacy of …

Cited by 18 publications (21 citation statements) | References 18 publications
“…country:capital, male:female). We obtained these sets from the distribution described in Finley et al. (2017), in which only the first correct answer is retained for BATS questions with multiple correct answers, and used a parallelized implementation of the widely used vector offset method, in which, for a given proportional analogy a:b::c:d, all word vectors in the space are rank-ordered by their cosine similarity to the vector c + b − a. We report average accuracy, where a result is considered accurate if d is the top-ranked result aside from a, b and c. To evaluate the effects of encoding word order on the relative distance between terms, we used a series of widely used reference sets that mediate comparison between human and machine estimates of pairwise similarity and relatedness between term pairs.…”
Section: Discussion
confidence: 99%
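For concreteness, a minimal numpy sketch of the vector offset evaluation quoted above. The function name, the questions list of (a, b, c, d) tuples, and the vectors dict of unit-normalized arrays are illustrative assumptions, not the cited (parallelized) implementation:

    import numpy as np

    def offset_accuracy(questions, vectors):
        # `vectors`: word -> unit-normalized numpy array (assumed layout).
        words = list(vectors)
        index = {w: i for i, w in enumerate(words)}
        matrix = np.stack([vectors[w] for w in words])      # (vocab, dim)
        correct = 0
        for a, b, c, d in questions:
            target = vectors[c] + vectors[b] - vectors[a]
            target /= np.linalg.norm(target)
            sims = matrix @ target                          # cosine similarity to c + b - a
            sims[[index[a], index[b], index[c]]] = -np.inf  # exclude the question words
            correct += words[int(np.argmax(sims))] == d
        return correct / len(questions)

A question counts as correct only when d is the top-ranked remaining word, matching the accuracy criterion described in the excerpt.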
“…Word embeddings provide a mathematical model for inferring semantic meaning from words [46][47][48][49][50]. Word embeddings are a key component of NLP models; they rely on the distributional hypothesis, which states that "a word is characterized by the company it keeps".…”
Section: Analysis Of Data
confidence: 99%
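As an illustration of that hypothesis in practice, a short sketch using gensim's KeyedVectors (the embedding file path and the example word pairs are placeholders): words that keep similar company receive nearby vectors, so cosine similarity tracks relatedness.

    from gensim.models import KeyedVectors

    # Load pretrained word2vec-format embeddings (path is a placeholder).
    wv = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

    # Words appearing in similar contexts get similar vectors.
    print(wv.similarity("coffee", "tea"))       # relatively high
    print(wv.similarity("coffee", "algebra"))   # relatively low

    # The same geometry underlies analogy completion by vector arithmetic.
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))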
“…Text Normalization: Lemmatization and stemming are standard normalization techniques employed to mitigate the noise produced by grammatical inflections in a variety of natural language processing tasks. However, in continuous vector space models, this type of normalization can lead to information loss, as inflections may capture relational analogies, e.g., nominal plural analogies such as "dog is to dogs what horse is to horses" [40]. Inflection phenomena are not equally pervasive in programming scripts; still, source code identifiers do incorporate aspects of inflection, e.g., a class named "Node" versus a collection named "nodes" that holds instances of "Node" objects.…”
Section: Data Collection and Preprocessing
confidence: 99%
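A small NLTK sketch of the trade-off described above: stemming and lemmatization collapse exactly the singular/plural distinction that the "dog : dogs :: horse : horses" offset depends on. Outputs shown in the comments are typical, and the wordnet corpus must be downloaded first:

    from nltk.stem import PorterStemmer, WordNetLemmatizer
    # Requires: import nltk; nltk.download('wordnet') for the lemmatizer.

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    # Normalization maps singular and plural to one form, discarding the
    # inflection that encodes the plural analogy relation.
    for word in ["dogs", "horses"]:
        print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word))
    # e.g. dogs -> dog / dog, horses -> hors / horse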