Image-based platforms have become popular in recent years. With the large number of images on these platforms, recommending images that suit each user's interests is a key problem for recommender systems. A simple idea is to adopt collaborative filtering for image recommendation, but this does not fully utilize the visual information and suffers from data sparsity. Recently, following the success of Convolutional Neural Networks (CNNs) in image analysis, some researchers have proposed leveraging image content information for recommendation. Specifically, Visual Bayesian Personalized Ranking (VBPR) (He and McAuley, in: The association for the advancement of artificial intelligence, 2016) is a state-of-the-art visual-based recommendation model that learns users' preferences for items from two spaces: a visual content space learned from CNNs, and a latent space learned from classical collaborative filtering models. VBPR and its variants show improved recommendation performance by modeling image content. In the real world, when browsing images, users care not only about the image content but also about how well the image style matches their preferences. Compared to image content, the role of visual style has been largely ignored in the image recommendation community. Therefore, in this paper, we study the problem of learning both visual content and style for image recommendation. We leverage advances in computer vision to learn visual content and style representations, and propose how to combine these visual signals with users' collaborative data. Finally, experimental results on a real-world dataset clearly show the effectiveness of our proposed model.
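
To make the two-space idea behind VBPR concrete, the following is a minimal sketch of a preference score that sums a latent collaborative term and a visual content term. It is an illustration under stated assumptions, not the model proposed in this paper: the function and variable names (`two_space_score`, `gamma_u`, `gamma_i`, `theta_u`, `E`, `f_i`) are hypothetical, the bias terms of VBPR are omitted, and the toy numbers are made up for the example. A style term, as studied in this paper, could in principle be added in the same additive way, but its exact form is not given here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(E: Matrix, f: Vector) -> Vector:
    """Map a high-dimensional CNN feature f into a low-dimensional visual space (E @ f)."""
    return [dot(row, f) for row in E]


def two_space_score(
    gamma_u: Vector,  # user factors in the latent (collaborative) space
    gamma_i: Vector,  # item factors in the latent (collaborative) space
    theta_u: Vector,  # user factors in the visual content space
    E: Matrix,        # learned projection from CNN features to the visual space
    f_i: Vector,      # CNN feature of the item image (assumed precomputed)
) -> float:
    """Preference score as latent term plus visual term (bias terms omitted)."""
    latent_term = dot(gamma_u, gamma_i)
    visual_term = dot(theta_u, project(E, f_i))
    return latent_term + visual_term


# Toy example with made-up numbers: 2 latent dims, 3-dim "CNN" feature, 2 visual dims.
score = two_space_score(
    gamma_u=[1.0, 0.5],
    gamma_i=[0.2, 0.4],
    theta_u=[1.0, -1.0],
    E=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    f_i=[0.5, 0.25, 2.0],
)
print(score)
```

In a ranking setting, scores of this kind would be compared across items for the same user, so that an item the user interacted with is ranked above one the user did not.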