Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1172

Batch IS NOT Heavy: Learning Word Representations From All Samples

Abstract: Stochastic Gradient Descent (SGD) with negative sampling is the most prevalent approach to learning word representations. However, sampling methods are known to be biased, especially when the sampling distribution deviates from the true data distribution. Moreover, SGD suffers from dramatic fluctuation due to its one-sample learning scheme. In this work, we propose AllVec, which uses batch gradient learning to generate word representations from all training samples. Remarkably, the time complexity of AllVec remai…
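To make the abstract's contrast concrete, here is a minimal sketch (not the paper's released code) of the two update schemes it compares: a one-sample SGD step with k sampled negatives versus a full-batch step over all word-context pairs, where unobserved cells act as down-weighted negatives. The toy co-occurrence matrix, embedding dimension, learning rate, and negative weight are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                              # vocabulary size, embedding dim (assumed)
X = rng.poisson(0.3, size=(V, V))         # toy word-context co-occurrence counts
W = rng.normal(0, 0.1, size=(V, d))       # target word embeddings
C = rng.normal(0, 0.1, size=(V, d))       # context embeddings
lr = 0.05

def sgd_neg_sampling_step(w, c, k=5):
    """One SGD step on a single positive pair (w, c) plus k sampled negatives."""
    pairs = [(c, 1.0)] + [(rng.integers(V), 0.0) for _ in range(k)]
    for cj, label in pairs:
        score = 1.0 / (1.0 + np.exp(-W[w] @ C[cj]))  # sigmoid score
        g = score - label                             # log-loss gradient
        W[w], C[cj] = W[w] - lr * g * C[cj], C[cj] - lr * g * W[w]

def full_batch_step(neg_weight=0.1):
    """One batch step over ALL V*V pairs: co-occurring cells are positives,
    zero cells are down-weighted negatives -- no sampling, hence no sampling bias."""
    P = (X > 0).astype(float)                 # binary positive labels
    weights = np.where(X > 0, X, neg_weight)  # per-cell confidence weights
    E = weights * (W @ C.T - P)               # weighted squared-loss residuals
    gW, gC = E @ C, E.T @ W                   # full-batch gradients
    W[:] -= lr * gW / V
    C[:] -= lr * gC / V

sgd_neg_sampling_step(w=0, c=1)
full_batch_step()
```

The batch step touches every pair each update, so its gradient is exact and deterministic, at the cost of the all-pairs computation that the paper's complexity analysis addresses.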

Cited by 22 publications (20 citation statements) | References 28 publications

“…For example, in Table 2, the performance of ExpoMF is better than that of BPR, and our EHCF outperforms all the baselines. This is consistent with previous work (Yuan et al. 2018; Xin et al. 2018), which indicates that regardless of what sampler is utilized or how many updates are taken, sampling is still a biased approach. 3) Our EHCF significantly outperforms the state-of-the-art CF methods in both traditional (single-behavior) and heterogeneous scenarios.…”
Section: Performance Comparison (supporting)
confidence: 92%
“…Previous studies (He et al. 2017; Wang et al. 2018; Chen et al. 2019b) have shown that the performance of NS is not robust, as it is highly sensitive to the sampling distribution and the number of negative samples. Essentially, sampling is biased, making it difficult to converge to the optimal ranking performance regardless of how many update steps have been taken (Xin et al. 2018). Besides, to leverage heterogeneous user behavior, the NS strategy needs to sample a negative instance for every observed interaction (regardless of the behavior type), which introduces very large randomness in total (about K times that of the single-behavior scenario, where K is the number of behavior types).…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, in Table 4, the performances of WMF and ExpoMF are better than that of BPR, and our ENMF outperforms BPR, GMF, NCF, and ConvNCF. This is consistent with previous work [4,57,60], which indicates that regardless of what sampler is utilized or how many updates are taken, sampling is still a biased approach. (3) The results show that neural methods generally perform better than traditional collaborative filtering methods, which verifies the advantages of neural networks over traditional models in representation learning.…”
Section: Performance Comparison (supporting)
confidence: 92%
“…With negative sampling, the number of negative instances is greatly reduced, so the overall time complexity is controllable [23]. However, the downside is that sampling-based methods usually have a slower convergence rate, and their performance is highly dependent on the design of the sampler [25,57]. The whole-data-based strategy [11,26,30,31] treats all the missing data as negative.…”
Section: Model Learning in Recommendation (mentioning)
confidence: 99%
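The "controllable" complexity of whole-data learning mentioned in the excerpt above typically comes from a decomposition trick used by eALS-style learners (and, per the abstract, central to AllVec): the loss over all user-item pairs splits into a term over observed interactions plus a global term computable from small Gram matrices. The sketch below is an illustrative assumption, not any cited paper's implementation; the function name all_pair_loss and the uniform negative weight w0 are hypothetical.

```python
import numpy as np

def all_pair_loss(U, V, observed, w0=0.1):
    """Whole-data weighted squared loss:
         L = sum_{(u,i) observed} (U[u].V[i] - 1)^2
           + w0 * sum_{(u,i) unobserved} (U[u].V[i])^2
    computed WITHOUT iterating over the unobserved pairs."""
    # Rewrite the unobserved sum as (all pairs) - (observed pairs):
    # each observed pair contributes its positive loss minus the w0
    # term that the global sum below double-counts.
    pos = sum((U[u] @ V[i] - 1.0) ** 2 - w0 * (U[u] @ V[i]) ** 2
              for u, i in observed)
    # Sum over ALL pairs of (U[u].V[i])^2 equals trace((U^T U)(V^T V)),
    # costing O((|U| + |I|) d^2) instead of O(|U| * |I| * d).
    neg = w0 * np.trace((U.T @ U) @ (V.T @ V))
    return pos + neg

# Toy usage with assumed shapes: 100 users, 200 items, dimension 8.
rng = np.random.default_rng(1)
U = rng.normal(0, 0.1, size=(100, 8))
V = rng.normal(0, 0.1, size=(200, 8))
observed = [(0, 3), (5, 17), (42, 199)]
print(all_pair_loss(U, V, observed))
```

Under this decomposition, the cost of a full-data evaluation scales with the number of observed interactions plus a Gram-matrix term, which is why treating every missing entry as a negative need not be "heavy".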