Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data 2019
DOI: 10.1145/3326937.3341261

Behavior sequence transformer for e-commerce recommendation in Alibaba

Abstract: Deep learning based methods have been widely used in industrial recommendation systems (RSs). Previous works adopt an Embedding&MLP paradigm: raw features are embedded into low-dimensional vectors, which are then fed into MLP for final recommendations. However, most of these works just concatenate different features, ignoring the sequential nature of users' behaviors. In this paper, we propose to use the powerful Transformer model to capture the sequential signals underlying users' behavior sequences for reco…
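To make the abstract's contrast concrete, here is a minimal PyTorch sketch of the Embedding&MLP paradigm extended with a Transformer encoder over the behavior sequence, roughly in the spirit of the paper. The class name, dimensions, single encoder layer, and mean pooling are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch (PyTorch): embed behavior ids, model the sequence with
# self-attention, then feed a pooled user vector plus the candidate item to an MLP.
import torch
import torch.nn as nn

class BSTStyleModel(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, num_items=10000, dim=64, seq_len=20):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)   # raw feature -> low-dim vector
        self.pos_emb = nn.Embedding(seq_len, dim)      # position of each behavior
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.mlp = nn.Sequential(                      # "MLP for final recommendations"
            nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, behavior_seq, target_item):
        # behavior_seq: (batch, seq_len) item ids; target_item: (batch,) candidate id
        pos = torch.arange(behavior_seq.size(1), device=behavior_seq.device)
        seq = self.encoder(self.item_emb(behavior_seq) + self.pos_emb(pos))
        user_vec = seq.mean(dim=1)                     # simple pooling (an assumption)
        x = torch.cat([user_vec, self.item_emb(target_item)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # click-through probability

model = BSTStyleModel()
score = model(torch.randint(0, 10000, (2, 20)), torch.randint(0, 10000, (2,)))
```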

Cited by 304 publications (173 citation statements)
References 14 publications
Citation statements (ordered by relevance):
“…Our second approach to implementing temporal dynamics is DL-based. We make use of the Transformer architecture [4] in this model for two prominent tasks: to capture the sequential reading behavior of a news reader and to perform next-click prediction. We build separate reader and news components in our proposed framework.…”
Section: Preliminary Results (mentioning)
Confidence: 99%
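As a hedged sketch of the setup this statement describes, the snippet below encodes a reader's click sequence with a Transformer encoder and scores all items for the next click. The cited framework's separate reader and news components are collapsed into a single item embedding here, and every name and size is an assumption.

```python
# Illustrative next-click prediction (PyTorch): encode the click sequence with
# self-attention, then produce logits over all candidate items. A causal mask
# would normally restrict attention to past clicks; it is omitted for brevity.
import torch
import torch.nn as nn

class NextClickTransformer(nn.Module):  # hypothetical name
    def __init__(self, num_items=5000, dim=32, seq_len=10):
        super().__init__()
        self.emb = nn.Embedding(num_items, dim)
        self.pos = nn.Embedding(seq_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.out = nn.Linear(dim, num_items)     # scores for every candidate item

    def forward(self, clicks):                   # clicks: (batch, seq_len) item ids
        pos = torch.arange(clicks.size(1), device=clicks.device)
        h = self.encoder(self.emb(clicks) + self.pos(pos))
        return self.out(h[:, -1])                # next-click logits from last position

logits = NextClickTransformer()(torch.randint(0, 5000, (4, 10)))  # (4, 5000)
```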
“…Zhu et al. [17] proposed an improved long short-term memory (LSTM) method to learn the correlations between users' adjacent behaviors, which can predict users' short-term and long-term interests. Chen et al. [18] proposed a Transformer model based on the attention mechanism to extract features from user behavior sequences, which can be used to predict user preferences for products. Zhong et al. [19] proposed multiple-aspect attentive graph neural networks to extract user social-network features, which can be used to generate user geographic-information tags.…”
Section: Related Work (mentioning)
Confidence: 99%
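For the LSTM-based approach summarized above, a minimal sketch of encoding a behavior sequence with an LSTM: taking the final hidden state as a short-term interest vector and a mean over all states as a rough long-term one. This split is an illustrative assumption, not the actual method of [17].

```python
# Illustrative LSTM encoding of a behavior sequence (PyTorch).
# Final hidden state ~ short-term interest; mean of all states ~ long-term.
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 16)                 # toy vocabulary and dimension
lstm = nn.LSTM(input_size=16, hidden_size=16, batch_first=True)

behaviors = torch.randint(0, 1000, (4, 12))  # 4 users, 12 behaviors each
states, (h_n, _) = lstm(emb(behaviors))      # states: (4, 12, 16)
short_term = h_n[-1]                         # (4, 16): most recent context
long_term = states.mean(dim=1)               # (4, 16): averaged history
```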
“…Self-attention blocks. Self-attention [30], an attention mechanism that relates different positions of a single sequence in order to compute a new representation of the sequence, has achieved state-of-the-art performance for sequence modeling in many tasks [3,30]. An attention function maps a query and a set of key-value pairs to an output, which is a weighted sum of the values, where the weight assigned to each value is computed from the query and the corresponding key.…”
Section: Deep Interest (mentioning)
Confidence: 99%
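As a concrete illustration of the attention function described in that statement, here is a minimal NumPy sketch of scaled dot-product self-attention. The shapes and scaling follow the standard formulation of [30]; the toy dimensions and weight matrices are assumptions.

```python
# Minimal NumPy sketch of scaled dot-product self-attention:
# each output position is a weighted sum of values, weighted by query-key similarity.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project sequence into queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])    # query-key compatibility, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v                         # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                    # 5 positions, 8-dim embeddings (toy sizes)
w = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(x, *w)                    # (5, 8): new representation of the sequence
```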