Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
DOI: 10.1145/3219819.3219823
Deep Interest Network for Click-Through Rate Prediction

Abstract: Click-through rate prediction is an essential task in industrial applications such as online advertising. Recently, deep-learning-based models have been proposed that follow a similar Embedding&MLP paradigm. In these methods, large-scale sparse input features are first mapped into low-dimensional embedding vectors, then transformed into fixed-length vectors in a group-wise manner, and finally concatenated together and fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In t…
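The Embedding&MLP paradigm the abstract describes can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's model: the vocabulary size, layer widths, and random weights are all hypothetical, and sum-pooling stands in for the generic group-wise transform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration.
VOCAB, EMB_DIM, HIDDEN = 1000, 8, 16

# Embedding table: maps sparse feature IDs to dense vectors.
emb_table = rng.normal(size=(VOCAB, EMB_DIM))

def embed_and_pool(feature_ids):
    """Map a variable-length group of sparse IDs to one fixed-length
    vector by sum-pooling their embeddings (group-wise transform)."""
    return emb_table[feature_ids].sum(axis=0)

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU; sigmoid output is the predicted CTR."""
    h = np.maximum(0.0, x @ w1 + b1)
    logit = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))

# Two feature groups (e.g. user behaviors, candidate ad), each pooled to a
# fixed-length vector and then concatenated before the MLP.
user_ids, ad_ids = np.array([3, 17, 256]), np.array([42])
x = np.concatenate([embed_and_pool(user_ids), embed_and_pool(ad_ids)])

w1 = rng.normal(size=(2 * EMB_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
w2 = rng.normal(size=HIDDEN); b2 = 0.0
p_click = mlp(x, w1, b1, w2, b2)
```

Note the limitation the paper targets: pooling compresses all user behaviors into one fixed-length vector regardless of the candidate ad.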

Cited by 1,614 publications (1,137 citation statements)
References 21 publications
“…Ni et al. [20] adopt LSTM and the attention mechanism to model the user behavior sequence. Compared to sequence-independent approaches, these methods can significantly improve CTR prediction accuracy, and most of these techniques have been deployed in real-world applications [20,32,38,39].…”
Section: Context-Aware Personalization Model
confidence: 99%
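The target-conditioned attention over a behavior sequence that this citation refers to can be sketched as follows. This is a simplified dot-product variant for illustration, assuming all embeddings share one dimension; the cited papers use learned attention networks and, in some cases, LSTM-encoded states.

```python
import numpy as np

def target_attention(behaviors, target):
    """Weight each past behavior embedding by its relevance to the
    target item (dot-product scores, softmax-normalized), then pool
    into a single adaptive user representation."""
    scores = behaviors @ target                     # (T,)
    weights = np.exp(scores - scores.max())         # stable softmax
    weights /= weights.sum()
    return weights @ behaviors                      # (d,)

rng = np.random.default_rng(1)
behaviors = rng.normal(size=(5, 4))   # 5 past interactions, dim 4
target = rng.normal(size=4)           # candidate item embedding
user_vec = target_attention(behaviors, target)
```

Unlike fixed sum-pooling, the resulting user vector changes with the candidate item, which is what makes these sequence-aware methods more accurate.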
“…Utilizing multiple modality features is often effective in improving the performance of CTR tasks. A straightforward way [20,38,39] is to concatenate the multiple modality features, which is equivalent to giving a fixed importance weight to each modality regardless of different items. A conceivable improvement [14,31] is to dynamically distinguish the contributions of different modalities through an attention mechanism.…”
Section: Multimodal Attention Network
confidence: 99%
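The contrast this citation draws, fixed concatenation versus attention-weighted fusion, can be made concrete with a small numpy sketch. It is illustrative only: the bilinear scoring matrix `w` and the item-dependent `query` are assumptions, and all modalities are assumed to share one dimension.

```python
import numpy as np

def concat_fusion(modalities):
    """Fixed fusion: concatenation gives every modality the same,
    item-independent importance."""
    return np.concatenate(modalities)

def attention_fusion(modalities, query, w):
    """Dynamic fusion: score each modality against an item-dependent
    query (bilinear form) and take a softmax-weighted sum."""
    m = np.stack(modalities)                 # (M, d)
    scores = (m @ w) @ query                 # (M,)
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ m                             # (d,)

rng = np.random.default_rng(2)
mods = [rng.normal(size=6) for _ in range(3)]  # e.g. image/text/stats features
query = rng.normal(size=6)
w = rng.normal(size=(6, 6))
fused_fixed = concat_fusion(mods)
fused_dynamic = attention_fusion(mods, query, w)
```

With a different `query` (a different item), `attention_fusion` redistributes the modality weights, whereas `concat_fusion` never changes.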
“…And hereby we introduce our Interactive Attention Mechanism. Unlike the attention mechanism in [43] and [44], which uses the target item to query the interacted-items sequence, we utilize the information of both sequences simultaneously and interactively to weigh across different time slices. The attention value of each time slice β t is calculated as,…”
Section: Interactive Dual Sequence Modeling
confidence: 99%
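The quoted excerpt truncates before defining β_t, so the citing paper's exact formula is not available here. As one plausible reading, an interactive dual-sequence attention can score each time slice by a bilinear interaction of the two sequences' aligned states and normalize over time; the scoring matrix `w` is an assumption.

```python
import numpy as np

def interactive_attention(seq_a, seq_b, w):
    """Score each time slice t by the interaction of the two sequences'
    states at t (bilinear form a_t^T W b_t), then softmax over time to
    obtain the per-slice attention values beta_t."""
    scores = np.einsum('td,de,te->t', seq_a, w, seq_b)  # (T,)
    beta = np.exp(scores - scores.max())
    return beta / beta.sum()

rng = np.random.default_rng(3)
seq_a = rng.normal(size=(7, 5))   # first behavior sequence, T=7
seq_b = rng.normal(size=(7, 5))   # second, aligned sequence
w = rng.normal(size=(5, 5))
beta = interactive_attention(seq_a, seq_b, w)
```

In contrast to target-item attention, both sequences contribute to the query here, which is the "interactive" part of the mechanism.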
“…The inference procedure is illustrated in Figure 4. As for the loss function, we train end-to-end and introduce (i) the widely used cross-entropy loss L ce [25,43,44] over the whole training dataset and (ii) the parameter regularization L r . We utilize the Adam algorithm for optimization.…”
Section: Final Prediction and Loss Functions
confidence: 99%
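The loss described here, binary cross-entropy plus a parameter regularizer, is standard for CTR training and can be written directly. This sketch assumes an L2 regularizer with a hypothetical weight `lam`; the citing paper does not specify its form in this excerpt.

```python
import numpy as np

def total_loss(p, y, params, lam=1e-4):
    """Binary cross-entropy L_ce over the batch plus an L2 parameter
    regularizer L_r: L = L_ce + lam * sum ||w||^2."""
    eps = 1e-12  # guard against log(0)
    l_ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    l_r = lam * sum(np.sum(w ** 2) for w in params)
    return l_ce + l_r

rng = np.random.default_rng(4)
p = rng.uniform(0.01, 0.99, size=8)            # predicted CTRs
y = rng.integers(0, 2, size=8).astype(float)   # click labels
params = [rng.normal(size=(4, 4))]             # model weights to regularize
loss = total_loss(p, y, params)
```

In practice this scalar would be minimized with Adam, as the excerpt states.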
“…Secondly, we resort to multi-task learning with multi-modal data to handle the sparsity issue. It has become common practice in industrial applications to leverage useful information across related tasks to make up for the data sparsity in individual tasks [12,29,37]. In e-commerce, available data sources often include customer view, purchase, search, and substitution records, as well as product descriptions and hierarchical category information.…”
confidence: 99%
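The multi-task idea in this citation, related tasks sharing representations so data-rich tasks compensate for sparse ones, is often realized with a shared-bottom architecture. This is a generic sketch of that pattern, not the citing paper's model; the task heads and sizes are illustrative.

```python
import numpy as np

def shared_bottom(x, w_shared, task_heads):
    """Multi-task sketch: one shared encoder feeds several task-specific
    heads (e.g. view, purchase, search predictions). Gradients from all
    tasks update w_shared, so the shared representation is learned even
    where one task's labels are sparse."""
    h = np.maximum(0.0, x @ w_shared)          # shared representation
    return [h @ w_t for w_t in task_heads]     # one output per task

rng = np.random.default_rng(5)
x = rng.normal(size=12)                        # multi-modal input features
w_shared = rng.normal(size=(12, 8))
heads = [rng.normal(size=8) for _ in range(3)] # three related tasks
outputs = shared_bottom(x, w_shared, heads)
```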