A Case Study on Sampling Strategies for Evaluating Neural Sequential Item Recommendation Models

Dallmann, Alexander; Zöller, Daniel; Hotho, Andreas

doi:10.1145/3460231.3475943

“…2, we note that general magnitudes of the reported effectiveness results are smaller than those reported in [42]; indeed, as stated in Section 5.4, in contrast to [42], we follow recent advice [5,20] to avoid sampled metrics, instead preferring the more accurate unsampled metrics. The magnitudes of effectiveness reported for MovieLens-20M are in line with those reported by [9] (e.g. a Recall@10 of 0.137 for SASRec-vanilla is reported in [9] when also using a Leave-One-Out evaluation scheme and unsampled metrics).…”

Section: Data Splitting and Evaluation Measures (supporting)

confidence: 81%

Effective and Efficient Training for Sequential Recommendation using Recency Sampling

Petrov, Macdonald

2022

Proceedings of the 16th ACM Conference on Recommender Systems


Many modern sequential recommender systems use deep neural networks, which can effectively estimate the relevance of items but require a lot of time to train. Slow training increases expenses, hinders product development timescales and prevents the model from being regularly updated to adapt to changing user preferences. Training such sequential models involves appropriately sampling past user interactions to create a realistic training objective. The existing training objectives have limitations. For instance, next item prediction never uses the beginning of the sequence as a learning target, thereby potentially discarding valuable data. On the other hand, the item masking used by BERT4Rec is only weakly related to the goal of sequential recommendation; therefore, it requires much more time to obtain an effective model. Hence, we propose a novel Recency-based Sampling of Sequences training objective that addresses both limitations. We apply our method to various recent and state-of-the-art model architectures, such as GRU4Rec, Caser, and SASRec. We show that the models enhanced with our method can achieve performances exceeding or very close to state-of-the-art BERT4Rec, but with much less training time.
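The abstract does not spell out the sampling rule, so the following is only a rough sketch of what a recency-weighted choice of training targets could look like. It assumes an exponential weighting with a hypothetical decay parameter `alpha`; the function names and the weighting itself are illustrative assumptions, not the authors' implementation.

```python
import random

def recency_weights(seq_len, alpha=0.8):
    """Assumed weighting: position k gets alpha ** (distance from the end)."""
    # The last position has weight 1.0; earlier positions decay geometrically.
    return [alpha ** (seq_len - 1 - k) for k in range(seq_len)]

def sample_targets(sequence, num_targets, alpha=0.8, rng=None):
    """Split a sequence into model inputs and recency-biased targets."""
    rng = rng or random.Random()
    weights = recency_weights(len(sequence), alpha)
    # Draw with replacement, then deduplicate, so at most num_targets are kept.
    drawn = rng.choices(range(len(sequence)), weights=weights, k=num_targets)
    target_positions = set(drawn)
    targets = [item for p, item in enumerate(sequence) if p in target_positions]
    inputs = [item for p, item in enumerate(sequence) if p not in target_positions]
    return inputs, targets

inputs, targets = sample_targets(list("abcdef"), num_targets=3, rng=random.Random(1))
print("inputs:", inputs, "targets:", targets)
```

Under this assumed weighting, any position can become a target, including early ones, while recent interactions are chosen more often; that is the intuition the abstract describes as sitting between next item prediction and item masking.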


“…On first inspection of Table 2, we note that general magnitudes of the reported effectiveness results are smaller than those reported in [42]; indeed, as stated in Section 5.4, in contrast to [42], we follow recent advice [5,20] to avoid sampled metrics, instead preferring the more accurate unsampled metrics. The magnitudes of effectiveness reported for MovieLens-20M are in line with those reported by [9] (e.g. a Recall@10 of 0.137 for SASRec-vanilla is reported in [9] when also using a Leave-One-Out evaluation scheme and unsampled metrics).…”

Section: RQ1 Benefit of Recency Sampling (supporting)

confidence: 81%
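
To make the sampled versus unsampled distinction in the statement above concrete, here is a minimal sketch that is not taken from either paper. It assumes a toy catalogue of 1,000 items with hand-built scores in which the held-out Leave-One-Out target sits at rank 51; the function names and the choice of 100 sampled negatives are illustrative assumptions.

```python
import random

def recall_at_k(scores, target, k, candidates=None):
    """Return 1.0 if `target` ranks in the top k among `candidates`, else 0.0."""
    if candidates is None:
        candidates = range(len(scores))  # unsampled: rank against every item
    # Count the candidates that strictly outscore the held-out target.
    better = sum(1 for i in candidates if i != target and scores[i] > scores[target])
    return 1.0 if better < k else 0.0

def sample_negatives(num_items, target, num_negatives, rng):
    """Draw negatives uniformly at random, excluding the target."""
    pool = [i for i in range(num_items) if i != target]
    return rng.sample(pool, num_negatives)

NUM_ITEMS = 1000
# Toy scores: item i has rank i + 1, so item 50 is ranked 51st overall.
scores = [1.0 - i / NUM_ITEMS for i in range(NUM_ITEMS)]
target = 50  # the held-out last interaction of one hypothetical user

unsampled = recall_at_k(scores, target, k=10)

rng = random.Random(0)
draws = 200
sampled_mean = sum(
    recall_at_k(scores, target, k=10,
                candidates=sample_negatives(NUM_ITEMS, target, 100, rng))
    for _ in range(draws)
) / draws

print(f"unsampled Recall@10: {unsampled}")
print(f"mean sampled Recall@10: {sampled_mean:.2f}")
```

In this toy setup the unsampled metric records a miss, while ranking the same target against 100 random negatives usually records a hit, which is one way sampled metrics can report larger magnitudes than unsampled ones.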

“…The magnitudes of effectiveness reported for MovieLens-20M are in line with those reported by [9] (e.g. a Recall@10 of 0.137 for SASRec-vanilla is reported in [9] when also using a Leave-One-Out evaluation scheme and unsampled metrics).…”

Section: RQ1 Benefit of Recency Sampling (supporting)

confidence: 81%

Effective and Efficient Training for Sequential Recommendation using Recency Sampling

Petrov, Macdonald

2022

Preprint


“…We use an adaptation of the original code for this model. For SASRec we set sequence length to 50, embedding size to 50 and use 2 transformer blocks; according to the experiments conducted by Kang et al. [24], these parameters are within the range where SASRec shows reasonable performance.…”

Section: Models (mentioning)

confidence: 99%
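
The three SASRec settings quoted above can be written down as a small configuration object. This is a sketch only: the class name, field names, and the left-padding helper are hypothetical and do not come from the cited code; only the values 50, 50 and 2 are taken from the quoted statement.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SASRecConfig:  # hypothetical name, not from the cited implementation
    max_sequence_length: int = 50    # "sequence length to 50"
    embedding_size: int = 50         # "embedding size to 50"
    num_transformer_blocks: int = 2  # "2 transformer blocks"

def fit_to_length(item_ids, config, pad_id=0):
    """Keep the most recent interactions and left-pad to a fixed length.

    Left-padding with pad_id=0 is an assumption made for this sketch.
    """
    recent = item_ids[-config.max_sequence_length:]
    padding = [pad_id] * (config.max_sequence_length - len(recent))
    return padding + recent

config = SASRecConfig()
print(len(fit_to_length([3, 7, 9], config)))
```

A history longer than 50 interactions is truncated to its 50 most recent items, while a shorter one is padded at the front so every input has the same length.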

A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation

Petrov, Macdonald

2022

Proceedings of the 16th ACM Conference on Recommender Systems


BERT4Rec is an effective model for sequential recommendation based on the Transformer architecture. In the original publication, BERT4Rec claimed superiority over other available sequential recommendation approaches (e.g. SASRec), and it is now frequently being used as a state-of-the-art baseline for sequential recommendation. However, not all subsequent publications confirmed its superiority and have proposed other models that were shown to outperform BERT4Rec in effectiveness. In this paper we systematically review all publications that compare BERT4Rec with another popular Transformer-based model, namely SASRec, and show that BERT4Rec results are not consistent within these publications. To understand the reasons behind this inconsistency, we analyse the available implementations of BERT4Rec and show that we fail to reproduce results of the original BERT4Rec publication when using their default configuration parameters. However, we are able to replicate the reported results with the original code if training for a much longer amount of time (up to 30x) compared to the default configuration. We also propose our own implementation of BERT4Rec based on the HuggingFace Transformers library, which we demonstrate replicates the originally reported results on 3 out of 4 datasets, while requiring up to 95% less training time to converge. Overall, from our systematic review and detailed experiments, we conclude that BERT4Rec does indeed exhibit state-of-the-art effectiveness for sequential recommendation, but only when trained for a sufficient amount of time. Additionally, we show that our implementation can further benefit from adapting other Transformer architectures that are available in the HuggingFace Transformers library (e.g. using disentangled attention, as provided by DeBERTa, or larger hidden layer size cf. ALBERT). For example, on the MovieLens-1M dataset, we demonstrate that both these models can improve BERT4Rec performance by up to 9%. Moreover, we show that an ALBERT-based BERT4Rec model achieves better performance on that dataset than state-of-the-art results reported in the most recent publications.


A Case Study on Sampling Strategies for Evaluating Neural Sequential Item Recommendation Models

Cited by 39 publications

References 31 publications
