With the explosive growth of online information, users increasingly face information overload. Recommender systems have become an effective way to handle this problem: they analyze the characteristics of users and items to provide valuable information. One important type of information is an item’s side information. For example, in the Amazon dataset, side information mainly includes visual side information (e.g., images and videos), textual side information (e.g., titles and descriptions), and auxiliary side information (e.g., brand and category). To exploit these various types of side information, some studies model each type as a separate modality, which improves the performance of the recommender system. To capture deeper relationships between users and items, recent works also represent interactions with a graph structure. However, existing multi-modal recommender systems based on graph neural networks rely largely on interaction records, while little effort has been devoted to the relationships between interactions and the various types of side information. In this paper, we propose a novel multi-task learning model. We first construct an interaction graph for each modality to gather user and item representations, and then analyze the relationships between each modality’s representations and the corresponding side information based on their similarities. Specifically, we design a Multi-task Multi-modal Graph Neural Network (MTMM-GNN) framework built on attention-based message passing in graph neural networks, which generates user and item representations from interaction records and then analyzes the relationships between these GNN representations and the items’ side information. We conduct experiments on two public datasets, Amazon and MovieLens, and the results show that our model outperforms state-of-the-art methods.
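To make the attention-based message passing mentioned above concrete, the following is a minimal GAT-style sketch of how user and item representations could be aggregated over an interaction graph for one modality. It is an illustrative assumption, not the paper's actual MTMM-GNN implementation; the function name, tensor shapes, and parameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def attention_message_passing(node_feats, edge_index, W, a):
    """One layer of attention-weighted message passing (illustrative sketch).

    node_feats: (N, d_in) features of all nodes (users and items stacked).
    edge_index: (2, E) source/target node indices of interaction edges.
    W: (d_in, d_out) linear projection; a: (2 * d_out,) attention vector.
    """
    h = node_feats @ W                                  # project node features
    src, dst = edge_index                               # messages flow src -> dst
    # unnormalized attention logits for each edge
    e = F.leaky_relu(torch.cat([h[src], h[dst]], dim=-1) @ a)
    # softmax over the incoming edges of each destination node
    e = e - e.max()                                     # numerical stability
    num = torch.exp(e)
    denom = torch.zeros(h.size(0)).index_add_(0, dst, num) + 1e-16
    alpha = num / denom[dst]
    # aggregate attention-weighted neighbor messages per destination node
    out = torch.zeros_like(h).index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
    return out
```

Running such a layer once per modality graph yields modality-specific user and item representations, which could then be compared with the corresponding side-information embeddings (e.g., via cosine similarity) in the multi-task objective described in the abstract.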