Proceedings of The Web Conference 2020
DOI: 10.1145/3366423.3380163

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

Abstract: For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although extensive CTR prediction models have been proposed, learning good representation of items from multimodal features is still less investigated, considering an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the multiple modality features, that is equivalent to giving a fixed importance weight to each m…

Cited by 43 publications (23 citation statements)
References 27 publications (43 reference statements)
“…In the recommendation system, the platform calculates CTR through big data. The CTR of recommended products can be calculated through big data and compared with a typical click-through rate: if the CTR is bigger than 0.2%, the recommendation is more effective [22]. The platform can find personalized recommendations suitable for consumers through CTR value.…”
Section: 2 Description Of C-terminal
Citation type: mentioning
confidence: 99%
“…Although early fusion methods have low computational complexity, the existence of redundancy reduces the effectiveness of information. The other type of fusion is late fusion (score-level fusion) [10,11]. The extracted representations or unimodal results are fused at the late stage.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Most existing methods are adopted from the image-text embedding methods, which focus on the visual representation of videos. Some researchers [4,5,7,16,31,32,32,35,40,42,43] struggle to find a representative video frame, and then feed it into the image-text model for video-text retrieval. However, other rich information in the videos effective for video-text retrieval is ignored.…”
Section: Introduction
Citation type: mentioning
confidence: 99%