Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475709
Pre-training Graph Transformer with Multimodal Side Information for Recommendation

Abstract: Side information about items, e.g., images and text descriptions, has been shown to be effective in contributing to accurate recommendations. Inspired by the recent success of pre-training models on natural language and images, we propose a pre-training strategy to learn item representations by considering both item side information and the relationships among items. We relate items by common user activities, e.g., co-purchase, and construct a homogeneous item graph. This graph provides a unified view of item relations and their…
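The abstract's key construction, relating items through common user activities such as co-purchase and forming a homogeneous item graph, can be illustrated with a short sketch. The Python snippet below is a minimal, hypothetical illustration rather than the paper's actual pipeline: the function name build_item_graph and the min_co_count threshold are assumptions introduced here. It counts how many users co-purchased each item pair and keeps sufficiently frequent pairs as edges.

from collections import defaultdict
from itertools import combinations

def build_item_graph(purchases, min_co_count=2):
    """Build a homogeneous item graph from co-purchase records.

    purchases: iterable of (user_id, item_id) pairs.
    Returns a dict mapping (item_a, item_b) edges to co-purchase counts,
    keeping only pairs co-purchased by at least `min_co_count` users.
    (Illustrative sketch; not the paper's implementation.)
    """
    # Group each user's purchased items into a basket.
    baskets = defaultdict(set)
    for user, item in purchases:
        baskets[user].add(item)

    # Count how many distinct users bought each unordered item pair.
    co_counts = defaultdict(int)
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    return {edge: c for edge, c in co_counts.items() if c >= min_co_count}

# Example: i1 and i2 are co-purchased by two users, so they share an edge.
edges = build_item_graph([("u1", "i1"), ("u1", "i2"),
                          ("u2", "i1"), ("u2", "i2"),
                          ("u3", "i2"), ("u3", "i3")])
print(edges)  # {('i1', 'i2'): 2}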

Cited by 70 publications (47 citation statements)
References 37 publications
“…To verify the effectiveness of our proposed pre-training model, we compare it with the following representative baseline methods: Random, CLIP [17], MMGCN [29], GPT-GNN [8], Graph-BERT [31], PMGT [12], and GCN-P [14]. A short description of all baselines is given in Appendix B.…”
Section: Baseline Methods (mentioning, confidence: 99%)
“…However, it ignores masking operations on the nodes, which may limit its ability to aggregate the features of different nodes. To address this problem, PMGT [12] designs a masked node feature reconstruction task, which reconstructs the features of masked nodes from the non-masked nodes and thereby improves recommendation performance. Unlike these existing methods, our proposed multi-modal contrastive pre-training method aims to integrate multi-modal information on both the user side and the item side, capturing modality-specific features and aggregating cross-modality information from both users and items.…”
Section: Related Work (mentioning, confidence: 99%)
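To make the masked node feature reconstruction idea in the statement above concrete, here is a minimal PyTorch sketch. It is a hypothetical stand-in, not PMGT's actual architecture: a single linear encoder over mean-aggregated neighbor features replaces the graph transformer, and the class name MaskedNodeReconstruction and the mask_ratio parameter are illustrative assumptions. The objective mirrors the description: zero out a random subset of node features and train the model to recover them from the non-masked nodes.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedNodeReconstruction(nn.Module):
    # Toy masked-node feature-reconstruction objective in the spirit of the
    # citation statement above. The mean-over-neighbors linear encoder is an
    # assumption standing in for the graph transformer.
    def __init__(self, dim):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)  # placeholder encoder
        self.decoder = nn.Linear(dim, dim)  # maps embeddings back to features

    def forward(self, x, adj, mask_ratio=0.15):
        n = x.size(0)
        mask = torch.rand(n) < mask_ratio   # pick nodes to mask
        if not mask.any():                  # guarantee at least one masked node
            mask[torch.randint(0, n, (1,))] = True
        x_masked = x.clone()
        x_masked[mask] = 0.0                # hide the masked nodes' features

        # Aggregate neighbor features (row-normalized adjacency) and encode.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = self.encoder((adj @ x_masked) / deg)

        recon = self.decoder(h)
        # Reconstruct only the masked nodes from their non-masked neighbors.
        return F.mse_loss(recon[mask], x[mask])

# Usage on a toy 4-node graph with 8-dimensional multimodal item features.
x = torch.randn(4, 8)
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 0., 1.],
                    [1., 0., 0., 1.],
                    [0., 1., 1., 0.]])
loss = MaskedNodeReconstruction(8)(x, adj)
loss.backward()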