Tumblr is one of the most popular micro-blogging services worldwide on which users can share posts consisting of texts and images. This paper proposes a user-centric method of multimodal feature extraction for the personalized retrieval of Tumblr posts. To implement personalized retrieval, we formulate each user's preferences as a triplet loss by using Likes as metadata as well as the text-and image-related features of posts. Furthermore, we develop a personalized multivariational autoencoder (PMVAE) by introducing a triplet loss into multivariational autoencoder (MVAE), which is among the most effective methods of multimodal feature extraction. Previously proposed variants of MVAE can project multiple kinds of features into the single latent features. However, because the latent features do not reflect each user's preferences, retrieval performance when using the previous methods is limited. On the contrary, our PMVAE can extract relationships between text-and imagerelated features of posts by considering class-related information that represents whether a user prefers a given post. As a result, user-centric multimodal features, which separate a post that a user prefer and a post that a user does not prefer in the latent feature space, can be obtained. Because user-centric multimodal features have high discriminating power, the personalized retrieval of posts desired by each user becomes feasible by using them in such retrieval algorithms as the k-nearest neighbors and Annoy, which is a technique for approximate nearest neighbor search. We conduct experiments using 10 users and 150,947 contents, to verify the performance of k-NN and Annoy. The results show that our PMVAE increased normalized discounted cumulative gain (nDCG) compared with existing methods. The nDCG becomes 0.253 when using term frequency-inverse document frequency based text features and our end-to-end image features.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.