WordRank: Learning Word Embeddings via Robust Ranking

Ji, Shihao; Yun, Hyokun; Yanardag, Pinar; Matsushima, Shin; Vishwanathan, S. V. N.

doi:10.18653/v1/d16-1063

Cited by 26 publications

(12 citation statements)

References 16 publications

“…
• CBOW-a [Chen et al, 2017]: a CBOW variant which adaptively sample negative words by ranking scores.
• WordRank [Ji et al, 2015]: a ranking model that puts more weights on positive words by rank values.
• OptRank: our ranking model with the optimization in both positive word ranking and negative word sampling.…”

Section: Comparison Methods (mentioning)

confidence: 99%

“…We set ε as 0.5 in five subsets and 1.0 in Wiki2017(14G). For the WordRank model, we adopt the settings given by [Ji et al, 2015]: logarithm as the objective function, initial value of scale parameter is α = 100 and offset parameter β = 99. The dimension of word vectors is also set to 300.…”

Section: Parameter Settings (mentioning)

confidence: 99%

“…An adaptive sampler [Chen et al, 2017] has been proposed to roughly select the negative words which have larger inner products with contextual words than positive words, but it does not take care of the issue of positive word weighing. The WordRank model [Ji et al, 2015] proposes to treat the word embedding as a ranking problem. The similarity is computed between the contextual words and a positive word and then fed into a ranking function, the result of which is adopted as the weights of positive words.…”

Section: Introduction (mentioning)

confidence: 99%
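The rank-then-weight idea described in the statements above can be sketched as follows. This is a minimal illustration, not the WordRank implementation: the helper names, the toy vectors, and the exact form log2(1 + (rank + β)/α) are assumptions. The quoted statements give only the logarithm, α = 100 and β = 99.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_of_positive(context_vec, word_vecs, positive):
    """Count the other words that score at least as high as the positive word."""
    pos_score = dot(context_vec, word_vecs[positive])
    return sum(
        1
        for word, vec in word_vecs.items()
        if word != positive and dot(context_vec, vec) >= pos_score
    )

def rank_weight(rank, alpha=100.0, beta=99.0):
    """Assumed logarithmic transform of the shifted, scaled rank."""
    return math.log2(1.0 + (rank + beta) / alpha)

# Toy 2-d vectors, invented for illustration only.
word_vecs = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
context_vec = [1.0, 0.2]

rank = rank_of_positive(context_vec, word_vecs, "dog")  # only "cat" scores higher
weight = rank_weight(rank)
```

With these toy values only "cat" outscores "dog", so the rank is 1 and the weight is log2(2) = 1. A worse-ranked positive word receives a larger weight under this assumed form.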

Approximating Word Ranking and Negative Sampling for Word Embedding

Guo

Ouyang

Yuan

et al. 2018

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence

CBOW (Continuous Bag-Of-Words) is one of the most commonly used techniques to generate word embeddings in various NLP tasks. However, it fails to reach the optimal performance due to uniform involvements of positive words and a simple sampling distribution of negative words. To resolve these issues, we propose OptRank to optimize word ranking and approximate negative sampling for bettering word embedding. Specifically, we first formalize word embedding as a ranking problem. Then, we weigh the positive words by their ranks such that highly ranked words have more importance, and adopt a dynamic sampling strategy to select informative negative words. In addition, an approximation method is designed to efficiently compute word ranks. Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset with different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank.

“…The most recent word embedding, named BERT (1032 citations), had not been made available by any freely accessible library at the time this contribution was written and is therefore not suitable for benchmarking. [10], [13], [16], [17], [18], [19], [20], [5], [6], [14] Figure…”

Section: Relevance of Various Word Embeddings (unclassified)

Word-Embedding Benchmarking

Behnen

Kruse

Gómez

2020

WI2020 Zentrale Tracks

The automated processing of natural language offers great potential for extracting new knowledge from unstructured data. Word embedding methods have proven themselves for this challenge in recent years because they can capture semantic similarities. Intensive research continually produces new methods. Depending on the task, however, it is not always clear which method is advisable in a given case. To address this problem, this contribution provides a word embedding benchmark that considers the NLP tasks similarity, analogy, and (multiclass/multilabel) classification for uniformly trained word embeddings and, in addition to accuracy, also addresses criteria such as training time and hardware resources. Furthermore, the default parameter settings of the word embeddings are modified to validate the results. The word embeddings FastText and ELMo delivered the best results. This benchmark supports data scientists in selecting a word embedding for use in corresponding NLP tasks.

“…In this subsection, we provide an intuitive example to explain the merits of popularity oversampling from ranking perspective. The reason is that training word embedding can also be naturally viewed as a ranking task that ranks an observed context word c_P higher than any non-observed context word c_N [14]. To illustrate this, we give a schematic of a ranked list for a target word w as below, where +1 and -1 denote an observed and non-observed context word respectively.…”

Section: Learning Optimal Ranking For Embeddings (mentioning)

confidence: 99%
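The ranked list of +1 and -1 labels mentioned in this statement can be illustrated with a small sketch. The scores, the context words, and both helper functions are hypothetical. They only show what it means for an observed context word to rank above a non-observed one.

```python
def ranked_labels(scores, observed):
    """Sort context words by descending score; label observed +1, others -1."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [(word, 1 if word in observed else -1) for word in ordered]

def misordered_pairs(labels):
    """Count pairs where a non-observed word ranks above an observed word."""
    negatives_seen = 0
    violations = 0
    for _, label in labels:
        if label == -1:
            negatives_seen += 1
        else:
            violations += negatives_seen
    return violations

# Invented scores of candidate context words for one target word w.
scores = {"bark": 2.1, "engine": 1.4, "leash": 0.9, "piano": -0.3}
observed = {"bark", "leash"}

ranking = ranked_labels(scores, observed)
violations = misordered_pairs(ranking)
```

Here the list reads +1, -1, +1, -1, and the single violation is "engine" ranking above "leash". A perfect ranking would place every +1 before every -1 and give zero violations.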

Improving Negative Sampling for Word Representation using Self-embedded Features

Chen

Yuan

Jose

et al. 2018

Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining

Although the word-popularity based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we start from an investigation of the gradient vanishing issue in the skip-gram model without a proper negative sampler. By performing an insightful analysis from the stochastic gradient descent (SGD) learning perspective, we demonstrate that, both theoretically and intuitively, negative samples with larger inner product scores are more informative than those with lower scores for the SGD learner in terms of both convergence rate and accuracy. Understanding this, we propose an alternative sampling algorithm that dynamically selects informative negative samples during each SGD update. More importantly, the proposed sampler accounts for multi-dimensional self-embedded features during the sampling process, which essentially makes it more effective than the original popularity-based (one-dimensional) sampler. Empirical experiments further verify our observations, and show that our fine-grained samplers gain significant improvement over the existing ones without increasing computational complexity.
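The claim that negative samples with larger inner product scores are more informative suggests a simple selection rule, sketched below. This is not the sampler proposed in the paper: the function name, the uniform candidate draw, and the toy vectors are assumptions made for illustration.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def pick_informative_negative(target_vec, context_vecs, observed,
                              num_candidates=5, rng=random):
    """Draw candidates uniformly from non-observed words and keep the one
    with the largest inner product against the target vector.
    Assumes at least one non-observed word exists."""
    pool = [word for word in context_vecs if word not in observed]
    candidates = rng.sample(pool, min(num_candidates, len(pool)))
    return max(candidates, key=lambda word: dot(target_vec, context_vecs[word]))

# Toy 2-d vectors, invented for illustration only.
target_vec = [1.0, 0.0]
context_vecs = {
    "bark": [1.0, 0.0],
    "leash": [0.9, 0.0],
    "engine": [0.8, 0.1],
    "piano": [-1.0, 0.0],
}
observed = {"bark", "leash"}

negative = pick_informative_negative(
    target_vec, context_vecs, observed, num_candidates=2
)
```

Because both non-observed words are drawn as candidates here, the function returns "engine", whose score of 0.8 exceeds that of "piano" at -1.0. In a real SGD loop the scores change after every update, which is what makes such a selection dynamic.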
