2019
DOI: 10.1609/aaai.v33i01.33016562

Word Embedding as Maximum A Posteriori Estimation

Abstract: The GloVe word embedding model relies on solving a global optimization problem, which can be reformulated as a maximum likelihood estimation problem. In this paper, we propose to generalize this approach to word embedding by considering parametrized variants of the GloVe model and incorporating priors on these parameters. To demonstrate the usefulness of this approach, we consider a word embedding model in which each context word is associated with a corresponding variance, intuitively encoding how informative…
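To make the abstract's framing more concrete, the following is a minimal sketch of how a GloVe-style squared-error term can be read as a Gaussian negative log-likelihood, with a per-context-word variance encoding how informative each context word is. It is not the authors' code; the names and the exact parameterization are assumptions, since the abstract above is truncated.

```python
# A minimal sketch, assuming a GloVe-style factorization of log co-occurrence
# counts; this is NOT the authors' implementation, and the variable names and
# exact parameterization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                            # vocabulary size, embedding dimension
W = rng.normal(scale=0.1, size=(V, d))     # target word vectors
C = rng.normal(scale=0.1, size=(V, d))     # context word vectors
b_w = np.zeros(V)                          # target-word biases
b_c = np.zeros(V)                          # context-word biases
sigma2 = np.ones(V)                        # per-context-word variance (assumed parameter)

def neg_log_likelihood(i, j, x_ij):
    """Gaussian negative log-likelihood of one co-occurrence count x_ij.

    With sigma2[j] == 1 this is, up to constants and GloVe's weighting
    function, the squared-error term of the GloVe objective; letting
    sigma2[j] vary encodes how informative context word j is.
    """
    mu = W[i] @ C[j] + b_w[i] + b_c[j]     # model's prediction of log x_ij
    resid = np.log(x_ij) - mu
    return 0.5 * resid ** 2 / sigma2[j] + 0.5 * np.log(sigma2[j])

# Example: loss contribution of a single observed co-occurrence count.
print(neg_log_likelihood(i=3, j=7, x_ij=12.0))
```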

Cited by 5 publications (10 citation statements)
References 28 publications

“…In this section, we describe the word embedding results, where we directly compare our model with the following baselines: GloVe (Pennington et al, 2014), Skipgram (Mikolov et al, 2013b) (denoted as SG), Continuous Bag of Words (Mikolov et al, 2013b) (denoted as CBOW), and the recently proposed WeMAP model (Jameel et al, 2019). We have used the Wikipedia dataset which was shared by Jameel et al (2019), using the same vocabulary and preprocessing strategy. We report results for 300-dimensional word vectors and we use K = 3000 mixture components for our model.…”
Section: Word Embedding Results
confidence: 99%
“…(sHDP), 7) GloVe (Pennington et al, 2014), 8) WeMAP (Jameel et al, 2019), 9) Skipgram (SG) and Continuous Bag-of-Words (Mikolov et al, 2013b) models. In the case of the word embedding models, we create document vectors in the same way as we do for our model, by simply replacing the role of target word vectors with document word vectors.…”
Section: Document Embedding Results
confidence: 99%
“…It plays an essential role in various practical scenarios where there exist hidden variables or uncertainty. Some applications include image processing [3], [4], text analysis [5]-[7], recommender systems [8], and protein design and protein side-chain prediction problems [9], [10]. Adding the prior probability information reduces the overdependence on the observed data for parameter estimation; MAP estimation can be seen as a regularization of Maximum Likelihood Estimation (MLE), and MAP can deal well with limited training data.…”
Section: Introduction
confidence: 99%
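To illustrate the relationship described in that excerpt, here is a minimal sketch (purely illustrative, not taken from the cited paper) of the standard Gaussian example: with a Gaussian prior on a mean, the MAP estimate is the MLE shrunk toward the prior, which is exactly an L2-regularized MLE and behaves better when training data are scarce.

```python
# Minimal sketch of MAP as regularized MLE (illustrative only; not from the
# cited paper). Estimating a mean from a small sample with a Gaussian prior.
import numpy as np

data = np.array([2.1, 1.9, 2.4])   # small sample (low-data regime)
sigma2 = 1.0                       # assumed known observation variance
prior_mean, tau2 = 0.0, 0.5        # Gaussian prior N(prior_mean, tau2) on the mean
n = len(data)

# MLE: maximize the likelihood alone -> the sample average.
mu_mle = data.mean()

# MAP: maximize likelihood * prior. For a Gaussian likelihood with a Gaussian
# prior this has a closed form, equivalent to minimizing squared error plus an
# L2 penalty pulling the estimate toward prior_mean.
mu_map = (tau2 * data.sum() + sigma2 * prior_mean) / (n * tau2 + sigma2)

print(f"MLE estimate: {mu_mle:.3f}")
print(f"MAP estimate (shrunk toward the prior): {mu_map:.3f}")
```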