2019
DOI: 10.48550/arxiv.1905.10630
Preprint

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Abstract: In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). …
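To make the mechanism concrete, below is a minimal sketch (not the authors' code) of the SSE-SE variant in PyTorch: during training, each input index is replaced by a uniformly random index with a small probability before the embedding lookup, so parameter vectors are stochastically shared across examples. The module name, the `swap_prob` hyperparameter, and its default value are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StochasticSharedEmbedding(nn.Module):
    """Sketch of SSE-SE style regularization (illustrative, not the reference code).

    During training, each input index is replaced with a uniformly random
    index with probability `swap_prob` before the embedding lookup; at
    evaluation time the lookup is unchanged.
    """

    def __init__(self, num_embeddings: int, embedding_dim: int, swap_prob: float = 0.01):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.num_embeddings = num_embeddings
        self.swap_prob = swap_prob  # assumed hyperparameter name and default

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        if self.training and self.swap_prob > 0:
            # Bernoulli mask: True where an index should stochastically transition.
            swap = torch.rand(indices.shape, device=indices.device) < self.swap_prob
            random_idx = torch.randint_like(indices, self.num_embeddings)
            indices = torch.where(swap, random_idx, indices)
        return self.embedding(indices)
```

At evaluation time `self.training` is False, so the module reduces to a plain embedding lookup; the swap probability plays a role loosely analogous to a dropout rate and would typically be tuned on held-out data.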

Cited by 3 publications (19 citation statements)
References 19 publications
“…[16] found that adding additional personalized embeddings did not improve the performance of their Transformer model, and postulate that this is because they already use the user history and the embeddings only contribute to overfitting. Although introducing user embeddings into the model is indeed difficult with existing regularization techniques for embeddings, we show that personalization can greatly improve ranking performance with a recent regularization technique called Stochastic Shared Embeddings (SSE) [41]. The personalized Transformer (SSE-PT) model with SSE regularization works well on all 5 real-world datasets we consider, outperforming the previous state-of-the-art algorithm SASRec by almost 5% in terms of NDCG@10.…”
Section: Introduction
confidence: 90%
“…There are many other regularization techniques, including parameter sharing [5], max-norm regularization [31], gradient clipping [25], etc. Very recently, a new regularization technique called Stochastic Shared Embeddings (SSE) [41] was proposed as a means of regularizing embedding layers. [41] develops two versions of SSE, SSE-Graph and SSE-SE.…”
Section: Regularization Techniques
confidence: 99%
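For contrast with SSE-SE's uniform swaps, the sketch below illustrates the SSE-Graph idea under the assumption that related entities are supplied as a neighbor list: an index transitions to one of its graph neighbors with a small probability, so the sharing follows the graph structure. The function name, the `transition_prob` parameter, and the toy graph are hypothetical and for illustration only.

```python
import random

def sse_graph_transition(index, neighbors, transition_prob=0.05):
    """Illustrative SSE-Graph style transition (not the authors' implementation).

    With probability `transition_prob`, `index` jumps to a uniformly chosen
    neighbor in the supplied graph; indices without neighbors never move.
    """
    if neighbors.get(index) and random.random() < transition_prob:
        return random.choice(neighbors[index])
    return index

# Toy graph of related items: 0 <-> 1, 0 <-> 2; item 3 has no known neighbors.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
batch = [0, 3, 2, 1]
perturbed = [sse_graph_transition(i, neighbors) for i in batch]  # applied per SGD step
```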