2019
DOI: 10.1609/aaai.v33i01.33016562

Word Embedding as Maximum A Posteriori Estimation

Abstract: The GloVe word embedding model relies on solving a global optimization problem, which can be reformulated as a maximum likelihood estimation problem. In this paper, we propose to generalize this approach to word embedding by considering parametrized variants of the GloVe model and incorporating priors on these parameters. To demonstrate the usefulness of this approach, we consider a word embedding model in which each context word is associated with a corresponding variance, intuitively encoding how informative…
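To make the abstract's framing more concrete, the following is a minimal sketch of how a GloVe-style squared-error term can be read as a Gaussian negative log-likelihood, with a per-context-word variance encoding how informative each context word is. It is not the authors' code; the names and the exact parameterization are assumptions, since the abstract above is truncated.

```python
# A minimal sketch, assuming a GloVe-style factorization of log co-occurrence
# counts; this is NOT the authors' implementation, and the variable names and
# exact parameterization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                            # vocabulary size, embedding dimension
W = rng.normal(scale=0.1, size=(V, d))     # target word vectors
C = rng.normal(scale=0.1, size=(V, d))     # context word vectors
b_w = np.zeros(V)                          # target-word biases
b_c = np.zeros(V)                          # context-word biases
sigma2 = np.ones(V)                        # per-context-word variance (assumed parameter)

def neg_log_likelihood(i, j, x_ij):
    """Gaussian negative log-likelihood of one co-occurrence count x_ij.

    With sigma2[j] == 1 this is, up to constants and GloVe's weighting
    function, the squared-error term of the GloVe objective; letting
    sigma2[j] vary encodes how informative context word j is.
    """
    mu = W[i] @ C[j] + b_w[i] + b_c[j]     # model's prediction of log x_ij
    resid = np.log(x_ij) - mu
    return 0.5 * resid ** 2 / sigma2[j] + 0.5 * np.log(sigma2[j])

# Example: loss contribution of a single observed co-occurrence count.
print(neg_log_likelihood(i=3, j=7, x_ij=12.0))
```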

Cited by 5 publications (10 citation statements)
References 28 publications

“…In this section, we describe the word embedding results, where we directly compare our model with the following baselines: GloVe (Pennington et al, 2014), Skipgram (Mikolov et al, 2013b) (denoted as SG), Continuous Bag of Words (Mikolov et al, 2013b) (denoted as CBOW), and the recently proposed WeMAP model (Jameel et al, 2019). We have used the Wikipedia dataset which was shared by Jameel et al (2019), using the same vocabulary and preprocessing strategy. We report results for 300-dimensional word vectors and we use K = 3000 mixture components for our model.…”
Section: Word Embedding Results
confidence: 99%
“…(sHDP), 7) GloVe (Pennington et al, 2014), 8) WeMAP (Jameel et al, 2019), 9) Skipgram (SG) and Continuous Bag-of-Words (Mikolov et al, 2013b) models. In the case of the word embedding models, we create document vectors in the same way as we do for our model, by simply replacing the role of target word vectors with document word vectors.…”
Section: Document Embedding Results
confidence: 99%
“…It plays an essential role in various practical scenarios where there exist hidden variables or uncertainty. Some applications include image processing [3], [4], text analysis [5]-[7], recommender systems [8], and protein design and protein side-chain prediction problems [9], [10]. Adding the prior probability information reduces the overdependence on the observed data for parameter estimation; MAP estimation can be seen as a regularization of Maximum Likelihood Estimation (MLE), and MAP can deal well with limited training data.…”
Section: Introduction
confidence: 99%
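To illustrate the relationship described in that excerpt, here is a minimal sketch (purely illustrative, not taken from the cited paper) of the standard Gaussian example: with a Gaussian prior on a mean, the MAP estimate is the MLE shrunk toward the prior, which is exactly an L2-regularized MLE and behaves better when training data are scarce.

```python
# Minimal sketch of MAP as regularized MLE (illustrative only; not from the
# cited paper). Estimating a mean from a small sample with a Gaussian prior.
import numpy as np

data = np.array([2.1, 1.9, 2.4])   # small sample (low-data regime)
sigma2 = 1.0                       # assumed known observation variance
prior_mean, tau2 = 0.0, 0.5        # Gaussian prior N(prior_mean, tau2) on the mean
n = len(data)

# MLE: maximize the likelihood alone -> the sample average.
mu_mle = data.mean()

# MAP: maximize likelihood * prior. For a Gaussian likelihood with a Gaussian
# prior this has a closed form, equivalent to minimizing squared error plus an
# L2 penalty pulling the estimate toward prior_mean.
mu_map = (tau2 * data.sum() + sigma2 * prior_mean) / (n * tau2 + sigma2)

print(f"MLE estimate: {mu_mle:.3f}")
print(f"MAP estimate (shrunk toward the prior): {mu_map:.3f}")
```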