Proceedings of the 10th International Conference on World Wide Web 2001
DOI: 10.1145/371920.372094
Placing search in context

Cited by 604 publications (80 citation statements)
References 6 publications
“…First we consider a group of word-level similarity datasets that are commonly used as benchmarks in previous research: WS-353-SIM (Finkelstein et al., 2001), YP-130 (Yang and Powers, 2005), SIMLEX-999 (Hill et al., 2015), SimVerb-3500 (Gerz et al., 2016), and RW-STANFORD (Luong et al., 2013). Table 1: Spearman's ρ on word similarity tasks for combinations of word vectors and the following similarity metrics: cosine similarity (COS), Pearson's r (PRS), Spearman's ρ (SPR), and Kendall's τ (KEN). N indicates the proportion of sentence vectors in a task for which the null hypothesis of normality in a Shapiro-Wilk test was not rejected at α = 0.05.…”
Section: Methods
confidence: 99%
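The four metrics compared in that statement can be sketched in pure Python; the toy vectors below are illustrative and not drawn from any of the cited benchmarks:

```python
from math import sqrt

def cosine(x, y):
    """Cosine similarity (COS) between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def pearson(x, y):
    """Pearson's r (PRS): covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's ρ (SPR): Pearson's r computed on ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's τ (KEN): concordant minus discordant pairs, normalized."""
    n = len(x)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (n * (n - 1) / 2)

# Two illustrative word vectors whose dimensions share the same rank ordering.
v1 = [0.2, 0.1, 0.4]
v2 = [0.3, 0.2, 0.5]
print(cosine(v1, v2), pearson(v1, v2), spearman(v1, v2), kendall(v1, v2))
```

Because PRS, SPR, and KEN are correlation-based, they are invariant to shifting or (positively) scaling a vector, whereas COS is only scale-invariant; this is one reason the metrics can rank the same embeddings differently.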
“…The correlation between the human ratings and similarity scores computed using word embeddings for pairs of words has been used as a measure of the quality of the word embeddings (Mikolov et al., 2013d). We compute cosine similarity between word embeddings and measure Spearman correlation against human ratings for the word-pairs in the following benchmark datasets: Word Similarity 353 dataset (WS) (Finkelstein et al., 2001), Rubenstein-Goodenough dataset (RG) (Rubenstein and Goodenough, 1965), MTurk (Halawi et al., 2012), rare words dataset (RW) (Luong et al., 2013), MEN dataset (Bruni et al., 2012) and SimLex dataset (Hill et al., 2015). Unfortunately, existing benchmark datasets for semantic similarity were not created considering gender-biases and contain many stereotypical examples.…”
Section: Semantic Similarity Measurement
confidence: 99%
“…Indeed, using such unsupervised evaluations based on cosine similarity and Pearson correlation (which we denote UCP) may be misleading, as pointed out by Lu et al. (2015). When they normalized word embeddings, their WS353 semantic similarity (Finkelstein et al., 2001) scores using UCP increased by almost 20 percentage points (pp). Since normalization is a simple operation that could easily be learned by a machine learning model, this indicates that UCP scores may yield unreliable conclusions regarding the quality of the underlying embeddings.…”
Section: Sentence Encoder
confidence: 99%
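The caution raised in that statement is easy to reproduce: cosine-based scores are not invariant under simple, learnable transformations of the embeddings. Below is a minimal sketch using mean-centering as the example transformation — an assumption for illustration, not the specific normalization applied by Lu et al. (2015):

```python
from math import sqrt

def cosine(x, y):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

# Two illustrative 2-d word vectors.
a, b = [2.0, 1.0], [1.0, 2.0]

# Mean-center the vocabulary (here just these two vectors) -- a trivial
# linear operation that a downstream model could easily learn.
m = [(ai + bi) / 2 for ai, bi in zip(a, b)]
ac = [ai - mi for ai, mi in zip(a, m)]
bc = [bi - mi for bi, mi in zip(b, m)]

print(cosine(a, b))    # ≈ 0.8 before centering
print(cosine(ac, bc))  # ≈ -1.0 after centering
```

The cosine of the same pair swings from strongly positive to strongly negative under a one-line transformation, which illustrates why a jump in UCP scores after normalization says little about the intrinsic quality of the embeddings.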