2015
DOI: 10.1162/coli_a_00237
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation

Abstract: We present SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar (Freud, psychology) have a low rating. We show that, via this focus on similarity, SimLex-999 incentivizes the development of models with a differ…

Cited by 997 publications (1,068 citation statements)
References 54 publications
“…1 This bag-of-words approach, however, comes with a cost. As recently shown by Hill et al. (2014), despite the impressive results VSMs that take this approach obtain on modeling word association, they are much less successful in modeling word similarity. Indeed, when evaluating these VSMs with datasets such as wordsim353 (Finkelstein et al., 2001), where the word pair scores reflect association rather than similarity (and therefore the (cup, coffee) pair is scored higher than the (car, train) pair), the Spearman correlation between their scores and the human scores often crosses the 0.7 level.…”
Section: Introduction
confidence: 99%
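The evaluation protocol these citing papers describe is a rank correlation: score each word pair with the model, then compute Spearman's rho against the human ratings. Below is a minimal self-contained sketch of that computation; the word-pair ratings and model scores are illustrative placeholders, not real WordSim-353 or SimLex-999 entries.

```python
# Minimal sketch of the Spearman-correlation evaluation protocol: rank the
# human similarity ratings and the model scores, then take the Pearson
# correlation of the ranks. Pure Python, no external dependencies.

def rank(values):
    """Assign average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rho as the Pearson correlation of the two rank vectors."""
    rx, ry = rank(xs), rank(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical gold ratings and model cosine scores for five word pairs.
human = [9.2, 8.5, 6.3, 3.1, 1.0]       # annotator similarity ratings
model = [0.81, 0.86, 0.55, 0.40, 0.12]  # model cosine similarities

print(round(spearman(human, model), 2))  # → 0.9
```

In practice, libraries such as `scipy.stats.spearmanr` do the same computation; the point of the sketch is that the evaluation compares rankings, so a model need only order the pairs like the annotators do, not reproduce the rating scale.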
“…Indeed, when evaluating these VSMs with datasets such as wordsim353 (Finkelstein et al., 2001), where the word pair scores reflect association rather than similarity (and therefore the (cup, coffee) pair is scored higher than the (car, train) pair), the Spearman correlation between their scores and the human scores often crosses the 0.7 level. However, when evaluating with datasets such as SimLex999 (Hill et al., 2014), where the pair scores reflect similarity, the correlation of these models with human judgment is below 0.5 (Section 6).…”
Section: Introduction
confidence: 99%
“…For word similarity evaluations, we use the WordSim-353 Similarity (WS-Sim) and Relatedness (WS-Rel) (Finkelstein et al., 2001) and SimLex-999 (SimLex) (Hill et al., 2015) datasets, and the Rare Word (RW) (Luong et al., 2013) dataset to verify if subword information improves rare word representation. Relationships are measured using the Google semantic (GSem) and syntactic (GSyn) analogies (Mikolov et al., 2013a) and the Microsoft syntactic analogies (MSR) dataset (Mikolov et al., 2013b).…”
Section: Methods
confidence: 99%
“…They are generally used to evaluate computational models of similarity (Faruqui and Dyer 2014). However, since most of these collections are limited to pairs of nouns, with the exception of Hill et al. (2015), Yang and Powers (2006) and Baker et al. (2014), the coverage of verbs (and other categories) is reduced. For Spanish there is a general scarcity of these resources: there are only ratings for nouns, either collected from native speakers (Moldovan 2015), or translated from English (Finkelstein et al. 2001; Camacho-Collados et al. 2015).…”
Section: Background on Similarity
confidence: 99%