Personalized rating prediction is an important research problem in recommender systems. Although the latent factor model (e.g., matrix factorization) achieves good accuracy in rating prediction, it suffers from many problems including cold-start, non-transparency, and suboptimal results for individual user-item pairs. In this paper, we exploit textual reviews and item images together with ratings to tackle these limitations. Specifically, we first apply a proposed multi-modal aspect-aware topic model (MATM) on text reviews and item images to model users' preferences and items' features from different aspects, and also estimate the aspect importance of a user towards an item. Then the aspect importance is integrated into a novel aspect-aware latent factor model (ALFM), which learns users' and items' latent factors based on ratings. In particular, ALFM introduces a weight matrix to associate those latent factors with the same set of aspects in MATM, such that the latent factors can be used to estimate aspect ratings. Finally, the overall rating is computed via a linear combination of the aspect ratings, which are weighted by the corresponding aspect importance. In this way, our model can alleviate the data sparsity problem and gain good interpretability for recommendation. Besides, every aspect rating is weighted by its aspect importance, which is dependent on the targeted user's preferences and the targeted item's features. Therefore, it is expected that the proposed method can model a user's preferences on an item more accurately for each user-item pair. Comprehensive experimental studies have been conducted on the Yelp 2017 Challenge dataset and Amazon product datasets.
Results show that (1) our method achieves significant improvement over strong baseline methods, especially for users with only a few ratings; (2) item visual features can improve the prediction performance, and the size of this improvement depends on how important visual features are for the items; and (3) our model can explicitly interpret the predicted results in great detail.
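The contrast between a plain latent factor model and an aspect-weighted prediction can be illustrated with a minimal sketch. It assumes that an aspect rating is the inner product of the user and item latent factors after both are reweighted by that aspect's row of the weight matrix; this form, the function names, the omission of bias terms, and the toy numbers are illustrative assumptions rather than the model's actual specification.

```python
import numpy as np


def mf_rating(p_u, q_i):
    """Plain matrix factorization: factor k always contributes p_u[k] * q_i[k]."""
    return float(p_u @ q_i)


def aspect_weighted_rating(p_u, q_i, W, rho_ui):
    """Overall rating as an importance-weighted sum of aspect ratings.

    p_u, q_i : (K,) latent factors of the user and the item
    W        : (A, K) weight matrix tying the K latent factors to A aspects
    rho_ui   : (A,) aspect importance of user u towards item i
    """
    # Assumed form of an aspect rating: inner product of the user and item
    # factors after each is reweighted by the aspect's row of W.
    aspect_ratings = ((W * p_u) * (W * q_i)).sum(axis=1)
    # Linear combination of the aspect ratings, weighted by aspect importance.
    return float(rho_ui @ aspect_ratings), aspect_ratings


# Toy example (hypothetical numbers): K = 2 latent factors, A = 2 aspects.
p_u = np.array([2.0, 1.0])
q_i = np.array([1.0, 3.0])
W = np.eye(2)  # each aspect uses exactly one latent factor

uniform, aspect_ratings = aspect_weighted_rating(p_u, q_i, W, np.array([0.5, 0.5]))
skewed, _ = aspect_weighted_rating(p_u, q_i, W, np.array([0.75, 0.25]))
baseline = mf_rating(p_u, q_i)
```

With the same latent factors, changing only the aspect importance moves the prediction from 2.5 to 2.25, whereas the plain inner product gives a single value regardless of which aspects matter for the particular user-item pair. The per-aspect ratings are also returned, which is what makes the prediction inspectable.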
restaurant; while for a cheap restaurant, the expectation on these two aspects would be low. Thus, the user will give higher weights to the aspects of "service" and "ambience" for the expensive restaurant than for the cheap one when rating two such restaurants. Therefore, for accurate prediction, it is important to accurately capture the importance of each latent factor for a user towards an item. At first glance, MF achieves the goal, as the influence of a factor (e.g., the $k$-th factor) is dependent on both $p_{u,k}$ and $q_{i,k}$ (i.e., $p_{u,k} \cdot q_{i,k}$). However, it models the importance of a factor by a fixed value for an item or a user. As a result, it treats each factor of an item as equally important to all users (i.e., $q_{i,k}$); and similarly, each factor of a user as equally important to all items (i.e., $p_{u,k}$) in rating prediction. This will lead to sub-optimal results for individual user-item pairs. In this work, we attempt t...